Given a system which provides:
- Optimistic concurrency control / versioning per object (using CAS – Check-and-Set)
- Transactions that never need to span more than a single object.
- Snapshot Isolation
Is this system considered serializable?
From Snapshot Isolation
In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur “first”, and be visible to the other. In contrast, snapshot isolation permits write skew anomalies.
As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2.
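The example above can be simulated in a few lines. This is an illustrative sketch, not a real database engine: each transaction reads from a private snapshot, buffers its write, and commit succeeds because the two write sets are disjoint.

```python
# Write skew under snapshot isolation (toy simulation).
balances = {"V1": 100, "V2": 100}

def withdraw(snapshot, account, amount):
    """Return the write a transaction would commit, or None if the
    invariant V1 + V2 >= 0 appears violated *in the snapshot*."""
    if snapshot["V1"] + snapshot["V2"] - amount >= 0:
        return {account: snapshot[account] - amount}
    return None

# T1 and T2 each take a snapshot *before* either commits.
snap_t1 = dict(balances)
snap_t2 = dict(balances)

w1 = withdraw(snap_t1, "V1", 200)  # check passes: 100 + 100 - 200 >= 0
w2 = withdraw(snap_t2, "V2", 200)  # same check passes on the stale snapshot

# Disjoint write sets, so first-committer-wins detects no conflict:
assert w1.keys().isdisjoint(w2.keys())
balances.update(w1)
balances.update(w2)

print(balances, "total:", sum(balances.values()))
# total is -200: the invariant is broken even though each transaction
# individually verified it against its own snapshot.
```

Each transaction's invariant check was correct for the snapshot it saw; the error only exists in the combined outcome.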
Based on this, it seems that the potential for write skew is the only reason a system that guarantees Snapshot Isolation is not Serializable.
However, in a system which doesn’t allow a transaction to span multiple objects (in the above example V1 and V2) it seems impossible for write skew to occur.
Therefore, the system described above is serializable. Is this correct?
No, I don’t think that your stipulations result in a system that we ought to consider serializable.
Snapshot isolation is a technique that ensures a transaction sees the same dataset throughout the whole transaction. Snapshot isolation provides some guarantees but doesn’t define all the characteristics necessary to understand how transactions really work (unless we choose to conflate snapshot isolation and MVCC).
Snapshot isolation is most commonly implemented using MVCC, Multiversion Concurrency Control. MVCC defines more detail of transactions in the context of their snapshots: it says transactions require isolation only when their writes conflict (on locations only, or on locations + values, depending on the implementation). MVCC provides a relaxed consistency model, and suffers from write skew.
Relaxed consistency models are hard to understand because they are like a hybrid between no isolation and full isolation.
So, let's start with a strict concurrency model first. Two transactions must be isolated from each other if one writes to any data that the other either reads or writes (and vice versa…).
When we don’t know why a transaction reads data, we must presume that a different value read may alter the behavior of the client involved, which is why the condition of overlapping transactions indicates isolation. Without isolation, a read of stale data in a snapshot can easily exhibit relaxed consistency, another term for which is inconsistency, which is to say, error.
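The strict rule above can be stated as a small predicate. This is a minimal sketch with illustrative names: two transactions require isolation iff the write set of either one intersects the read set or write set of the other.

```python
def must_isolate(t1_reads, t1_writes, t2_reads, t2_writes):
    """Strict conflict rule: isolation is required when either
    transaction writes data the other reads or writes."""
    return bool(
        t1_writes & (t2_reads | t2_writes) or
        t2_writes & (t1_reads | t1_writes)
    )

# The write-skew pair: each reads both balances but writes only one.
# Under the strict rule they conflict...
print(must_isolate({"V1", "V2"}, {"V1"}, {"V1", "V2"}, {"V2"}))  # True
# ...whereas comparing write sets alone (MVCC style) sees no conflict:
print(bool({"V1"} & {"V2"}))  # False
```

The gap between those two answers is exactly where write skew lives.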
We need only consider the exact data read or written by the transactions; any data outside that set need not be considered. However, it is critical to realize that the data read by a transaction necessarily includes any and all “meta” data (and the data and metadata read by meta operations, such as constraint checking). Examples of metadata and meta operations: identifying a unique new primary key; summing an entire column; searching for something and not finding it; range searches or range sums. This goes to @Matthew’s comment about preventing duplicates, as well as @Tersosauros’s answer, in which he considers state.
For example, this means that two transactions overlap (require isolation) when they both insert a row (assuming a unique primary key constraint) because checking the key constraint is synonymous with reading the entire primary key column. As another example, searching for something and finding it is like reading that one value, however, not finding it is like looking at every value in the column.
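Here is a toy illustration (names are made up) of why the uniqueness check is a read of the whole key column. Two transactions each verify, against their own snapshot, that a key is free; the rows they insert are distinct objects, so a write-set-only validator lets both commit.

```python
rows = [{"id": 1}, {"id": 2}]

def key_is_free(snapshot_rows, key):
    # "Not finding it" logically reads every existing key.
    return all(r["id"] != key for r in snapshot_rows)

snap_a = list(rows)  # T_a's snapshot
snap_b = list(rows)  # T_b's snapshot, taken before either commits

assert key_is_free(snap_a, 3)
assert key_is_free(snap_b, 3)

# The write sets are the two new (distinct) row objects -- disjoint,
# so neither transaction is aborted:
rows.append({"id": 3, "who": "T_a"})
rows.append({"id": 3, "who": "T_b"})

dup = [r["id"] for r in rows].count(3)
print(dup)  # 2: the unique-key constraint is violated
```

If the validator had treated the absence check as a read of the key column, the second committer would have been aborted instead.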
MVCC protects only against overlapping or conflicting writes; it does not guard reads (unless the data read is also written by the same transaction). Thus, to obtain a consistency error in MVCC, all we need do is have one transaction read something that a second transaction changes (where the second transaction starts after the first’s snapshot is taken, but commits first), while the first continues using the stale value and makes some decision differently than it would have made with up-to-date data. It’s easier to cause than you’d think.
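The recipe reduces to a few lines. Below is a toy first-committer-wins validator (illustrative only, not a real MVCC engine): commit succeeds iff no overlapping, already-committed transaction wrote anything in our *write* set. Reads are never validated, which is exactly the gap described above.

```python
committed_writes = []  # write sets of transactions that committed first

def try_commit(write_set):
    for earlier in committed_writes:
        if earlier & write_set:
            return False           # write-write conflict: abort
    committed_writes.append(write_set)
    return True

# T2 writes "limit". T1 *read* "limit" from its snapshot (to decide how
# much to allow) and wrote only "balance".
assert try_commit({"limit"})       # T2 commits first
assert try_commit({"balance"})     # T1 also commits: its stale read of
                                   # "limit" is never checked
```

T1 made its decision using a value of "limit" that was already obsolete at commit time, yet nothing aborted it.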
Relaxed consistency is another way of saying potentially inconsistent, or error prone. (Relaxed consistency should not be confused with eventual consistency, which is a different, popular form of error-proneness from the “NoSQL” world.)
On your question: when you say that transactions need never span more than one object, this must apply to reads, writes, and metadata (and meta operations), including constraint checking, whole-column aggregates, absence checks, range searches, etc.: if so, then so far so good.
However…
I take it from your question that you are using snapshot isolation (MVCC) on individual objects (rather than, say, object locking). (You also mention CAS; I’ve heard of compare-and-swap and test-and-set, but not check-and-set, though I assume it is similar.)
Your question write up suggests to me that “objects” have more than one field, otherwise the question’s stipulations would be unnecessary.
Therefore, because your snapshotted/MVCC-handled objects have more than one field, you are prone to write skew within single objects. If two transactions update the same object at the same time, one may read a field of the object made stale by the other concurrent transaction, and continue without knowing — thus, potential inconsistency (aka error).
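This single-object write skew can be sketched the same way as the two-account example, under the assumption that write conflicts are detected per *field* rather than per whole-object version (with a whole-object CAS version, one of the two transactions would abort instead). Field names here are illustrative.

```python
account = {"checking": 100, "savings": 100}  # ONE object, two fields

snap_t1 = dict(account)
snap_t2 = dict(account)

# Each transaction checks the object-wide invariant in its snapshot,
# then writes only one field.
assert snap_t1["checking"] + snap_t1["savings"] - 200 >= 0
assert snap_t2["checking"] + snap_t2["savings"] - 200 >= 0

write_t1 = {"checking": snap_t1["checking"] - 200}
write_t2 = {"savings":  snap_t2["savings"]  - 200}

# Field-level write sets are disjoint, so both commits are accepted:
assert write_t1.keys().isdisjoint(write_t2.keys())
account.update(write_t1)
account.update(write_t2)
print(sum(account.values()))  # -200: invariant broken inside one object
```

Nothing about the anomaly required two objects; it only required two fields whose writes were compared independently.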
You could use object locking instead, which would prevent two transactions (for update) from even looking at the same object if another transaction already was in the process of doing so.
I believe that an alternate form of snapshot isolation can be done without using MVCC’s broken write-set-only comparison model. That is, you can (algorithmically) promote the comparison set from write-only to also include the read set. Then two transactions updating the same object would not be able to cause write skew (because the one that attempts to commit later would be aborted). I think this may be the appropriate solution to the problem you’re describing, because by precluding multi-object transactions you’re already getting most of the benefit that MVCC would buy us.
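The promoted comparison set can be sketched as follows (illustrative only, in the spirit of backward-validation OCC / serializable snapshot isolation): at commit we validate the read set as well as the write set, so the later committer in a write-skew pair aborts.

```python
committed = []  # write sets of transactions that committed before us

def try_commit(read_set, write_set):
    for earlier_writes in committed:
        if earlier_writes & (read_set | write_set):
            return False          # someone changed what we read: abort
    committed.append(write_set)
    return True

# The write-skew pair: both read {V1, V2}; their writes are disjoint.
assert try_commit({"V1", "V2"}, {"V1"}) is True   # T1 commits
assert try_commit({"V1", "V2"}, {"V2"}) is False  # T2 aborts: its read
                                                  # of V1 is now stale
```

The aborted transaction simply retries against a fresh snapshot, sees the new balance, and its invariant check correctly fails.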
(We only need to consider the exact and specific items/fields read or written, but to prevent write skew (i.e. error) we must include those read as metadata, potentially during meta operations. If we either remove the read set from the comparison set or fail to consider metadata (potentially used by meta operations), then we have a model that will allow error.)
I think, as @pjc50 stated, (Emphasis added:) “if the transactions are limited to read/write a single object” then “the answer is yes”.
However, I think where this falls down is in the idea of a single object.
In the example taken from Snapshot isolation, T1 and T2 do not share any value. However, there is still the potential for write skew because “neither [has] seen the update performed by the other” (source).
Therefore it is the ability of a transaction to see all other updates before committing that makes a transaction truly serializable.
From serializability:
Serializability of a schedule means equivalence (in the outcome, the database state, data values) to a serial schedule (i.e., sequential with no transaction overlap in time) with the same transactions.
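To make the definition concrete, here is a small sketch of the two possible serial schedules of the earlier withdrawal example: in either order, the second withdrawal sees the first one’s update, fails its invariant check, and is rejected, so no serial schedule can produce the write-skew outcome.

```python
def run_serially(order):
    """Run the two $200 withdrawals one after the other, in the given
    order of target accounts."""
    balances = {"V1": 100, "V2": 100}
    for account in order:  # each transaction runs to completion
        if balances["V1"] + balances["V2"] - 200 >= 0:
            balances[account] -= 200  # invariant check passed
        # else: withdrawal rejected
    return balances

for order in (["V1", "V2"], ["V2", "V1"]):
    result = run_serially(order)
    assert sum(result.values()) >= 0  # invariant holds either way
print("both serial orders keep V1 + V2 >= 0")
```

The concurrent snapshot-isolated execution produces a total of −$200, a state no serial order can reach — which is precisely why it is not serializable.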
Unfortunately, because of this (and as @Matthew Mark Miller points out), we must also consider state as well as values.
With this consideration, two transactions using OCC would have the potential for write skew whenever any database state is written.