What developers find surprising about Postgres transactions

When should I use a transaction in Postgres? And why would I lock?

Is a question I was asked the other day, and one I’ve answered a few times before.

When dealing with Postgres there are details of the answer that surprise people, and even senior developers with lots of experience can be caught off-guard.

When should I use a transaction in Postgres?

There is no common definition of a transaction, and in Postgres what your transaction offers will depend on the isolation level. This can be understood as an adjustable toggle that controls how strong the guarantees will be, usually at the cost of performance.

Postgres ships with a default isolation level of read committed, which I’ve never had reason to change. This level means transactions can only ever read data that has been committed to the database- ie, the user making the change did it outside of a transaction or has issued a COMMIT;.

It follows that when you begin a transaction and have not yet committed, any updates you make are only visible to you- they cannot be read by any other users in the database.

That’s not all, though- read committed means during your transaction, you will continue to read the most up-to-date committed changes in the database, even if they were committed during your transaction.

Imagine we run:

/* 1 */ begin;
/* 2 */ select * from organisations where id = 'bananas';
/* 3 */ select * from organisations where id = 'bananas';
/* 4 */ commit;

Under a common mental model of transactions, you might imagine that when you read the organisation in query (3), you’ll get the same result as in query (2).

That’s not the case in read committed- there is no guarantee that a different transaction didn’t update bananas after (2), but before (3), causing your version of bananas to change even while within your own transaction.

This phenomenon is called a non-repeatable read, when a transaction reading the same data can receive different results. In my experience, most developers believe transactions will prevent this, which can have nasty consequences.

So if read committed transactions don’t provide a stable snapshot of the world, what do we use them for?

Transactions should be used to make one-or-more changes that should be either accepted or rejected together (an atomic change) when it wouldn’t be appropriate for other users of the database to see any in-between state.

If you want repeatable reads in a read committed world, you’ll need to add some locking.

And when should I lock?

The nonrepeatable read (reading something, then reading it again and getting a different result) from our previous example could be an issue, if an application isn’t built to handle other users modifying the data until it’s done with it.

In read committed world, you can opt-into stronger defenses using an explicit row lock. Our example can be amended with a FOR UPDATE modifier:

/* 1 */ begin;
/* 2 */ select * from organisations where id = 'bananas' for update;
/* 3 */ select * from organisations where id = 'bananas';
/* 4 */ commit;

In this circumstance, query (2) has acquired an update lock on bananas which will block any other transaction from modifying the row until our transaction commits.

If no one is permitted to modify the data, it follows that it cannot change throughout our transaction, ensuring repeatable reads.

Surprised?

This might be old news to some, but I think it’s normal not to know this.

For one, MySQL ships by default with repeatable read, so anyone moving from MySQL to Postgres needs to be trained to adjust their expectations- even Staff engineers with decades of database experience could be using the wrong mental model.

My personal experience suggests most engineers aren’t aware, or haven’t thought about this. If your company uses Postgres and depends on data consistency, you may want to shout louder about this- just saying “you know your data might change during a transaction?” should be enough to get people’s attention!

This post was written in practical terms, trying to avoid the details of other isolation levels and how they are implemented. If you’d like to know more, take a look at:

Postgres Concurrency Control docs are a good database-specific reference, with examples at each isolation level
Brandur’s amazing How Postgres Makes Transactions Atomic to understand how Postgres implements isolation levels
Designing Data-Intensive Applications for an amazing foundation in isolation levels across a variety of technologies

Discuss this post on Hackernews. If you liked this post and want to see more, follow me at @lawrjones.