I want to better understand how isolation levels work, and here is my current understanding:
Isolation levels determine how a transaction is isolated from concurrent transactions. They are typically described in terms of “read” actions, such as “repeatable read” or “read uncommitted”. This suggests that isolation levels control whether your transaction can see uncommitted records from other transactions. In other words, it seems that isolation levels primarily govern how deeply a transaction can “read into” the state of other ongoing transactions.
However, my understanding is that the isolation level setting does not dictate whether the current transaction’s uncommitted records should be visible to other transactions, but rather whether the current transaction should be able to see uncommitted changes made by others.
If my understanding is correct (or at least partially correct), why do many database systems allow you to specify the isolation level at the statement level for operations like INSERT INTO, which do not read data from other transactions (assuming it is not an INSERT INTO … SELECT, where the isolation level of the SELECT would be relevant)?
Some examples and further considerations
In DB2 for i, you can execute these statements:
CREATE TABLE SCLAUZERO.ISO_LEV_TEST( C1 INT PRIMARY KEY, C2 VARCHAR(16));
INSERT INTO SCLAUZERO.ISO_LEV_TEST (C1, C2) VALUES (0, 'TEST1') WITH RR;
where WITH RR is the isolation-clause, which in this case sets the isolation level of this INSERT statement to Repeatable Read.
Brief research has led me to the following tentative conclusions:
- Oracle does not have this feature
- SQL Server has a concept of ‘hint’ which seems to differ from what I have described
- Postgres does not have this feature
- DB2 for LUW has this feature
Therefore, from this brief exploration, it appears that only the DB2 family offers statement-level isolation for DML statements.
I initially used INSERT INTO as an extreme case, where the absence of a read is most evident, but on closer reflection I would say the following:
In both DB2 for i and DB2 LUW, it is possible to specify an isolation level on all DML statements:
- INSERT … WITH
- UPDATE … WITH
- DELETE … WITH
In fact, aside from a “pure” INSERT (one without a SELECT or computed values such as subselects), it is clear that UPDATE and DELETE must read the table’s data and may encounter the classic issues of dirty reads, and so on. This may be the purpose of the isolation clause: to define the isolation level for the implicit reads inside DML commands.
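To make the “implicit read” concrete, here is a sketch against the ISO_LEV_TEST table above, using DB2’s isolation-clause syntax (the predicate values are illustrative):

```sql
-- The UPDATE must first locate the rows where C2 = 'TEST1'
-- (an implicit read) before it can modify them; WITH CS asks
-- for Cursor Stability on that read.
UPDATE SCLAUZERO.ISO_LEV_TEST
   SET C2 = 'TEST2'
 WHERE C2 = 'TEST1'
  WITH CS;

-- Similarly, a DELETE has to read (and evaluate the predicate
-- against) the rows it is about to remove.
DELETE FROM SCLAUZERO.ISO_LEV_TEST
 WHERE C1 > 100
  WITH RR;
```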
This DB2-family feature seems rather abstract and theoretical, because it is very likely that the database will use locking to prevent transaction B from modifying a record that is being modified by transaction A.
This would explain why most database systems do not consider this ‘implicit read’ in DML statements. When locking mechanisms are in place, they tend to prevent, for instance, a concurrent UPDATE from even reading the data, thus negating the need to specify what it should see.
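The locking intuition can be sketched as two concurrent sessions against the example table (the session labels are commentary, not SQL):

```sql
-- Session A:
UPDATE SCLAUZERO.ISO_LEV_TEST SET C2 = 'A' WHERE C1 = 0;
-- the row is now exclusively locked by A's open transaction

-- Session B (concurrently):
UPDATE SCLAUZERO.ISO_LEV_TEST SET C2 = 'B' WHERE C1 = 0;
-- B blocks waiting on A's lock: B's implicit read never gets the
-- chance to observe A's uncommitted value, regardless of which
-- isolation level B requested.
```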
As you can see, there is a lot of speculation here, and it is frustrating, but I need a conceptual framework before designing tests to verify whether these assumptions are correct.
However, my understanding is that the isolation level setting does not dictate whether the current transaction’s uncommitted records should be visible to other transactions, but rather whether the current transaction should be able to see uncommitted changes made by others.
I think the way the ANSI standard is drafted, this is true, that isolation levels do not “dictate” what is seen by other transactions.
However, nothing precludes influence on other transactions in general (provided that influence does not introduce the excluded “anomalies” for those other transactions). In practice, the placement of locks does influence the execution of other transactions: it slows them down, alters the timing and sequencing of their execution, affects their execution plans, changes the risk of deadlocks, and implicitly alters which anomalies may be seen (or how frequently).
The standard also specifies only what anomalies each isolation level must strictly prevent. It does not specify what anomalies must strictly remain. In other words, the “read uncommitted” level means it is permissible for the transaction to read uncommitted changes – it does not actually require that the transaction be able to see all uncommitted changes by other transactions.
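PostgreSQL is a concrete example of this permissiveness: it accepts a request for READ UNCOMMITTED but executes it as READ COMMITTED, which is conformant precisely because the standard only forbids anomalies and never mandates them. A sketch:

```sql
BEGIN TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Accepted by PostgreSQL, but silently runs as READ COMMITTED.
-- Dirty reads are *permitted* at this level, not *required*,
-- so an engine that never exhibits them is still conformant.
SELECT C1, C2 FROM SCLAUZERO.ISO_LEV_TEST;  -- illustrative query
COMMIT;
```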
It’s also worth remembering that SQL supports a number of different modes of execution, including cursors, and including an interactive mode. Interactive mode is where statements within a transaction are submitted one-by-one by the user, so that the database engine cannot analyse all the statements of a single transaction as a pre-planned set, and so must ensure a mechanism of isolation that is capable of coping correctly with any additional statement that may be submitted later whilst the transaction remains open.
The ANSI SQL standard is extremely loose, since the first version in 1986 was designed to tie together the behaviour of a number of then pre-existing engine brands without requiring major mechanical changes amongst any of them.
Infamously, Oracle has a “serializable” level which complies with the letter of the wording of the standard, but does not comply with the basic definition of what serializable was supposed to mean (which was that concurrent transactions execute with effects as-if they occur in some serial order).
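The classic demonstration of this gap is write skew, an anomaly that snapshot isolation (which is what Oracle’s “serializable” actually provides) permits but true serializability forbids. A sketch with a hypothetical doctors(name, on_call) table:

```sql
-- Invariant the application wants: at least one doctor on call.
-- T1 and T2 run concurrently under Oracle SERIALIZABLE (snapshot
-- isolation); each sees a snapshot in which both doctors are on call.

-- T1:
SELECT COUNT(*) FROM doctors WHERE on_call = 1;   -- sees 2
UPDATE doctors SET on_call = 0 WHERE name = 'alice';

-- T2 (before T1 commits):
SELECT COUNT(*) FROM doctors WHERE on_call = 1;   -- also sees 2
UPDATE doctors SET on_call = 0 WHERE name = 'bob';

-- Both commit without conflict (they wrote different rows), leaving
-- zero doctors on call: an outcome no serial order of T1 and T2
-- could produce, so the execution was not serializable.
```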
I’m not a DB2 user, so don’t quote me on this (it comes from brief research), but apparently DB2’s “repeatable read” level (the highest available in DB2) is equivalent to ANSI “serializable”, and DB2’s “read stability” is equivalent to ANSI “repeatable read”. That gives you a flavour of how confused everything is in this area.
Informed opinion nowadays is that the ANSI standard, as it applies to isolation levels, is basically half-baked nonsense. To make any sense at all of the standard is an exercise in historical study, and a practitioner cannot learn anything useful from reading the standard alone but must be familiar with the workings of the specific engine they use.
Probably nobody in the world knows all the details of how even a single engine works under all theoretical conditions, let alone how several different brands each work and differ from each other. I’ve noticed in the past on SE that even world experts express surprise or annoyance occasionally about how their brand of engine works in a particular case, and subtle bugs in engines are discovered year after year.
So just to summarise, DB2 does have a statement-level isolation clause, and the availability of this clause on INSERT and UPDATE statements is likely because of the effect it can have on the locking behaviour of the engine, and therefore the effect it might have on subsequent work (SELECTs, INSERTs, UPDATEs, cursors, etc.) in the same transaction.
I can’t be sure of the exact intended use in DB2, but a standard idiom for inserting-then-reading is recovering an auto-numbered ID. Many engines now have an OUTPUT clause for this, but DB2’s statement-level isolation clause seems to be ancient.
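For comparison, modern DB2 has a data-change-table-reference for exactly this insert-then-read idiom. A sketch against a hypothetical orders table with an identity column:

```sql
-- Hypothetical table with an auto-numbered key:
-- CREATE TABLE orders (
--   id   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
--   item VARCHAR(16)
-- );

-- Insert and read back the generated ID in a single statement:
SELECT id
  FROM FINAL TABLE (
       INSERT INTO orders (item) VALUES ('widget')
  );

-- Older two-step idiom: insert, then ask for the most recent
-- identity value generated in this session:
INSERT INTO orders (item) VALUES ('gadget');
VALUES IDENTITY_VAL_LOCAL();
```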
There might also be cases where a record is inserted and the space and keys allocated, then the same record is further updated in the same transaction, so that you want to take and keep holding the locks across multiple statements (which are all nominally writes, not reads), but you don’t necessarily want the same isolation level (or locking behaviour) for all work in that transaction. Seems like an uncommon pattern nowadays, but maybe more important in the past when memory was scarcer and processing was slower.
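In DB2’s syntax, that pattern might look like the following sketch (how long each statement’s locks are retained is engine-specific; this only illustrates the shape of the idiom):

```sql
-- Insert the row and take locks at Repeatable Read strength, so the
-- row (and its key) stays protected for the rest of the transaction:
INSERT INTO SCLAUZERO.ISO_LEV_TEST (C1, C2)
VALUES (1, 'DRAFT') WITH RR;

-- Later in the same transaction, finish populating the record; this
-- statement's implicit read can run at a weaker level because the
-- earlier statement's locks are still held:
UPDATE SCLAUZERO.ISO_LEV_TEST
   SET C2 = 'FINAL'
 WHERE C1 = 1
  WITH CS;

COMMIT;
```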
Also, DB2’s internal architecture may be such that, although these usages seem unusual or even fanciful, the difficulty of implementing them at the time, for consistency and completeness of coverage, was minimal.