Serializability and Linearizability

July 24, 2024

TLDR

Serializability: Concerns the correctness of transactions. It ensures that the result of executing transactions T1, T2, and T3 concurrently is the same as if they had been executed in some serial order, say T1 -> T2 -> T3, even if their actual execution overlaps.

Linearizability: Concerns the correctness of individual reads and writes. Specifically, it ensures that if a read operation R follows a write operation W in real time, then R will see the effect of W or of some later write, maintaining the real-time order of operations.


Serializability

In the context of database systems, serializability is the property that the concurrent execution of a set of transactions produces the same result as some serial execution of those transactions, one after the other.

Levels of isolation and consistency, from weakest to strongest:

1. Eventual Consistency

Definition: Ensures that, given enough time, all replicas of the database will converge to the same state.

Properties: Allows all types of anomalies in the short term but guarantees consistency in the long run.

Implementation: Used in distributed systems and NoSQL databases where availability and partition tolerance are prioritized over immediate consistency.

2. Causal Consistency

Definition: Ensures that operations that are causally related are seen by all processes in the same order, while concurrent operations may be seen in different orders.

Properties: Prevents causality violations but allows other forms of anomalies like write skew.

Implementation: Often used in distributed databases where maintaining strict serializability across all nodes is impractical.

3. Snapshot Isolation (SI)

Definition: Transactions operate on a snapshot of the database taken at the start of the transaction.

Properties: Prevents dirty reads and non-repeatable reads but still allows certain anomalies, most notably write skew.

Implementation: Commonly implemented using multi-version concurrency control (MVCC).
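To make the write-skew anomaly concrete, here is a minimal Python sketch of snapshot isolation over a toy in-memory store (class and variable names are illustrative, not a real database API). Each transaction reads from a private snapshot taken at its start and buffers its writes; because the two transactions write disjoint keys, plain first-committer-wins validation lets both commit, and the invariant "at least one doctor stays on call" is silently violated.

    # Toy multi-version store: each key maps to its committed value.
    # Each transaction reads from a snapshot taken at start and buffers its writes.
    class SnapshotTxn:
        def __init__(self, store):
            self.snapshot = dict(store)   # committed state as of transaction start
            self.writes = {}              # private write buffer
            self.store = store

        def read(self, key):
            return self.writes.get(key, self.snapshot[key])

        def write(self, key, value):
            self.writes[key] = value

        def commit(self):
            # Plain SI checks only write-write conflicts, so two transactions
            # that write disjoint keys both commit without complaint.
            self.store.update(self.writes)


    store = {"alice_on_call": True, "bob_on_call": True}

    t1 = SnapshotTxn(store)   # Alice tries to go off call
    t2 = SnapshotTxn(store)   # Bob tries to go off call

    # Each transaction's snapshot still shows the other doctor on call.
    if t1.read("bob_on_call"):
        t1.write("alice_on_call", False)
    if t2.read("alice_on_call"):
        t2.write("bob_on_call", False)

    t1.commit()
    t2.commit()

    # Write skew: the invariant "at least one doctor on call" is now violated.
    print(store)  # {'alice_on_call': False, 'bob_on_call': False}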

4. Read Uncommitted

Definition: The lowest level of isolation, where transactions are allowed to read uncommitted changes made by other transactions.

Properties: Allows dirty reads, non-repeatable reads, and phantom reads.

Implementation: No strict locking mechanisms are enforced, allowing for maximum concurrency but at the cost of potential data anomalies.
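A tiny sketch of the dirty-read anomaly this level permits (toy code, not a real engine): one transaction reads a value that another transaction has written but not committed, and the writer then rolls back.

    # Shared state that transactions mutate in place -- no isolation at all.
    account = {"balance": 100}

    # T1 writes an uncommitted update directly into shared state.
    t1_undo = account["balance"]
    account["balance"] = 500          # uncommitted change

    # T2 reads under READ UNCOMMITTED and sees the dirty value.
    t2_sees = account["balance"]
    print("T2 read:", t2_sees)        # 500 -- a dirty read

    # T1 aborts and rolls back; T2 acted on data that never existed.
    account["balance"] = t1_undo
    print("after rollback:", account["balance"])  # 100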

5. Read Committed

Definition: Ensures that any data read by a transaction is committed at the moment it is read. Thus, it prevents dirty reads.

Properties: Prevents dirty reads but allows non-repeatable reads and phantom reads.

Implementation: Achieved either with short-lived read locks that are released as soon as each read completes, or, in MVCC systems, by having each statement read the latest committed version of the data.
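A sketch of the non-repeatable read that read committed still allows (illustrative toy code): a transaction reads the same item twice, and another transaction commits an update in between, so the two reads disagree.

    # Committed state; under READ COMMITTED each read returns the latest committed value.
    committed = {"price": 10}

    def read_committed(key):
        return committed[key]

    # T1 reads the price once.
    first = read_committed("price")

    # T2 updates the price and commits between T1's two reads.
    committed["price"] = 12

    # T1 reads again inside the same transaction and gets a different answer.
    second = read_committed("price")
    print(first, second)  # 10 12 -- a non-repeatable read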

6. Repeatable Read

Definition: Ensures that if a transaction reads a data item, it will see the same value if it reads that data item again, even if other transactions concurrently modify the data.

Properties: Prevents dirty reads and non-repeatable reads, but phantom reads can still occur.

Implementation: Achieved through locking mechanisms that prevent other transactions from modifying data that has been read by the current transaction until it completes.
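A sketch of the phantom-read anomaly that repeatable read still allows (toy code): rows already read keep their values, but repeating a predicate query can pick up rows that another transaction inserted and committed in the meantime.

    # Committed table of orders; repeatable read protects rows already read,
    # but not rows that newly match a predicate (phantoms).
    orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]

    def query_over_100():
        return [o["id"] for o in orders if o["amount"] > 100]

    # T1 runs a predicate query.
    first = query_over_100()          # [2]

    # T2 inserts a new matching row and commits.
    orders.append({"id": 3, "amount": 200})

    # T1 repeats the same query in the same transaction and sees a phantom row.
    second = query_over_100()         # [2, 3]
    print(first, second)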

7. Serializable

Definition: The highest level of isolation. It ensures that the outcome of executing transactions concurrently is the same as if they were executed serially in some order.

Properties: No anomalies like dirty reads, non-repeatable reads, or phantom reads occur.

Implementation: Achieved using strict two-phase locking (2PL), timestamp ordering, or Serializable Snapshot Isolation (SSI).
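As one concrete mechanism, here is a minimal sketch of strict two-phase locking (illustrative Python only, using exclusive locks for both reads and writes to keep it short): a transaction only acquires locks while it runs and releases them all at commit, so any interleaving of committed transactions is equivalent to some serial order.

    import threading

    # One lock per data item; strict 2PL = acquire before access, release only at commit.
    locks = {"x": threading.Lock(), "y": threading.Lock()}
    data = {"x": 0, "y": 0}

    class Strict2PLTxn:
        def __init__(self):
            self.held = []

        def _lock(self, key):
            if key not in [k for k, _ in self.held]:
                lk = locks[key]
                lk.acquire()             # growing phase: only acquire, never release
                self.held.append((key, lk))

        def read(self, key):
            self._lock(key)
            return data[key]

        def write(self, key, value):
            self._lock(key)
            data[key] = value

        def commit(self):
            # Shrinking phase happens all at once at commit (the "strict" part),
            # so no other transaction ever observes this transaction's partial effects.
            for _, lk in self.held:
                lk.release()
            self.held.clear()


    t = Strict2PLTxn()
    t.write("x", t.read("y") + 1)
    t.commit()
    print(data)   # {'x': 1, 'y': 0}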

These levels of isolation and consistency allow database systems to balance between the strictness of consistency and the need for concurrency and performance. Different applications and systems may choose different levels based on their specific requirements and constraints.

Linearizability

Linearizability is a binary property: an operation or system is either linearizable or it is not. However, in practice, systems may offer different consistency models that approximate linearizability or provide weaker guarantees under certain conditions.

Consistency models approaching linearizability, from weakest to strongest:

1. Eventual Consistency

Definition: Ensures that, given enough time, all replicas of the data will converge to the same value, but does not guarantee immediate consistency.

Properties: The weakest consistency model, prioritizing availability and partition tolerance over immediate consistency.

Example: In a distributed database, updates to a data item will eventually propagate to all replicas, but reads may return stale data until convergence.
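A toy sketch of eventual consistency with lazy propagation and a last-write-wins merge (all names are illustrative): a write lands on one replica, a read from the other replica is allowed to be stale, and an anti-entropy pass later brings both replicas to the same value.

    # Two replicas of one key; writes go to a single replica and propagate lazily.
    replica_a = {"value": "v1", "ts": 1}
    replica_b = {"value": "v1", "ts": 1}

    def write(replica, value, ts):
        replica.update(value=value, ts=ts)

    def anti_entropy(r1, r2):
        # Last-write-wins merge: both replicas keep the value with the newest timestamp.
        newest = max(r1, r2, key=lambda r: r["ts"])
        r1.update(newest)
        r2.update(newest)

    write(replica_a, "v2", ts=2)         # update applied at replica A only
    print(replica_b["value"])            # 'v1' -- a stale read is allowed

    anti_entropy(replica_a, replica_b)   # background sync runs "eventually"
    print(replica_a["value"], replica_b["value"])  # 'v2' 'v2' -- converged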

2. Monotonic Writes Consistency

Definition: Ensures that write operations by a process are seen in the order they were issued.

Properties: Writes from a single process are applied everywhere in that process's issue order; writes from different processes may still interleave.

Example: If a process writes values A and then B, all processes will see A before B.
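A small sketch of how a replica can enforce monotonic writes (illustrative only): writes from a process carry sequence numbers, and a write that arrives ahead of its predecessors is buffered until they have been applied.

    # Each replica applies a process's writes strictly in the order they were issued,
    # buffering any write that arrives ahead of its predecessors.
    class MonotonicWriteReplica:
        def __init__(self):
            self.applied_seq = 0      # highest sequence number applied so far
            self.buffer = {}          # out-of-order writes waiting for predecessors
            self.log = []

        def receive(self, seq, value):
            self.buffer[seq] = value
            # Apply a write only once all earlier writes from the process are applied.
            while self.applied_seq + 1 in self.buffer:
                self.applied_seq += 1
                self.log.append(self.buffer.pop(self.applied_seq))


    r = MonotonicWriteReplica()
    r.receive(2, "B")          # arrives early: held back
    print(r.log)               # []
    r.receive(1, "A")          # now both can be applied, in issue order
    print(r.log)               # ['A', 'B']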

3. Monotonic Reads Consistency

Definition: Ensures that if a process has seen a particular value for a data item, it will never see an older value of that data item.

Properties: Prevents reads from moving backward in time, ensuring a form of temporal consistency.

Example: If a user reads an email at time T1 and then again at time T2, the user will see the same or more recent data at T2.
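A sketch of one way to provide monotonic reads (illustrative names, not a real client library): the session remembers the newest version it has observed and only accepts reads from replicas that are at least that fresh.

    import random

    # A client session remembers the newest version it has seen and only accepts
    # reads from replicas that are at least that new.
    replicas = [{"version": 3, "value": "inbox@v3"},
                {"version": 5, "value": "inbox@v5"}]

    class Session:
        def __init__(self):
            self.last_seen = 0

        def monotonic_read(self):
            # Only replicas at least as fresh as what this session has already seen.
            fresh = [r for r in replicas if r["version"] >= self.last_seen]
            choice = random.choice(fresh)
            self.last_seen = choice["version"]
            return choice["value"]


    s = Session()
    print(s.monotonic_read())   # inbox@v3 or inbox@v5
    print(s.monotonic_read())   # never older than whatever the first read returned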

4. Read-Your-Writes Consistency

Definition: Ensures that a process always observes its own writes. If a process writes a value and then reads the same item, it will see its own update (or a newer one).

Properties: Ensures a form of local consistency but does not guarantee global ordering.

Example: After a user updates their profile information, any subsequent read of the profile by the same user will reflect the update.
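A sketch of read-your-writes via session tokens (illustrative code): the session remembers the version assigned to its own write and only reads from replicas that have applied at least that version.

    # Two replicas of a profile; the session tracks the version of its own last write.
    replicas = {"r1": {"version": 7, "profile": "old bio"},
                "r2": {"version": 7, "profile": "old bio"}}

    class Session:
        def __init__(self):
            self.min_version = 0

        def write_profile(self, text):
            # The write lands on one replica, which assigns the next version.
            replicas["r2"].update(version=8, profile=text)
            self.min_version = 8                      # remember our own write

        def read_profile(self):
            # Only read from replicas that already include this session's writes.
            ok = [r for r in replicas.values() if r["version"] >= self.min_version]
            return ok[0]["profile"]


    s = Session()
    s.write_profile("new bio")
    print(s.read_profile())   # 'new bio' -- never served from the stale replica r1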

5. Causal Consistency

Definition: Ensures that operations that are causally related are seen by all processes in the same order, while concurrent operations may be seen in different orders.

Properties: Weaker than sequential consistency, allows more concurrency but maintains causality.

Example: If operation A causally precedes operation B, then any process that sees B will also see A.
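A sketch of how causality is commonly tracked with vector clocks (illustrative values): a reply written after seeing a post must be delivered after the post everywhere, while a causally unrelated write may be ordered either way.

    # Vector clocks capture causality: A causally precedes B iff A's clock is
    # <= B's clock in every entry and the clocks are not identical.
    def happens_before(vc_a, vc_b):
        return all(vc_a[p] <= vc_b[p] for p in vc_a) and vc_a != vc_b

    post      = {"p1": 1, "p2": 0}   # process p1 writes a post
    reply     = {"p1": 1, "p2": 1}   # p2 saw the post, then wrote a reply
    unrelated = {"p1": 0, "p2": 1}   # a write that never saw the post

    print(happens_before(post, reply))      # True  -- must be delivered in this order
    print(happens_before(post, unrelated))  # False -- concurrent, either order is fine
    print(happens_before(unrelated, post))  # False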

6. Sequential Consistency

Definition: Ensures that the results of operations are consistent with a single total order that all processes agree on and that respects each process's program order, but not necessarily the real-time order of the operations.

Properties: Weaker than linearizability as it allows more concurrency, but each operation appears in a consistent global order.

Example: If two operations, A and B, are issued by two different processes, all processes will see A and B in the same order, but this order may not respect the real-time order in which the operations were initiated.
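A toy sketch of sequential consistency via a single sequencer (illustrative only): every replica replays the same global log, so all processes agree on one order, even when that order differs from the real-time order in which clients issued their writes.

    # A toy sequencer: every write gets a slot in one global order, and every
    # replica applies writes in exactly that order.
    global_log = []

    def sequence(op):
        global_log.append(op)          # the sequencer's order is the only order

    def replica_state():
        state = {}
        for key, value in global_log:  # every replica replays the same log
            state[key] = value
        return state

    # Suppose client 1 issued ("x", 1) slightly before client 2 issued ("x", 2)
    # in real time, but the sequencer happened to order them the other way around.
    sequence(("x", 2))
    sequence(("x", 1))

    # All replicas agree on the final state, even though it violates real-time order.
    print(replica_state())   # {'x': 1} on every replica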

7. Linearizability (Atomic Consistency)

Definition: Ensures that each operation appears to take effect instantaneously at some point between its invocation and its response. Operations are applied in a strict real-time order.

Properties: The strongest of these models; reads never return stale values, and no reordering anomalies are allowed.

Implementation: Typically achieved using consensus algorithms like Paxos or Raft, or through strong synchronization mechanisms.
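A single-node sketch of the guarantee being described (illustrative Python; a real distributed system would need consensus such as Paxos or Raft to provide the same behaviour across replicas): every operation takes effect atomically at one instant, inside the lock, between its invocation and its response.

    import threading

    # A register where every operation takes effect atomically inside the lock,
    # i.e., at one instant between its invocation and its response.
    class LinearizableRegister:
        def __init__(self):
            self._lock = threading.Lock()
            self._value = None

        def write(self, value):
            with self._lock:           # linearization point of the write
                self._value = value

        def read(self):
            with self._lock:           # linearization point of the read
                return self._value


    reg = LinearizableRegister()
    reg.write("v1")
    # Any read that starts after the write above has responded must return "v1"
    # (or the value of some later write) -- never an older value.
    print(reg.read())   # 'v1'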

Linearizability vs Serializability

Serializability

  • Focuses on the correctness of transaction schedules in databases, ensuring that the interleaved execution of transactions is equivalent to some serial execution.
  • It ensures that the result of executing transactions T1, T2, and T3 concurrently is the same as if they were executed in some serial order, say T1 -> T2 -> T3, even if the actual execution overlaps.
  • This term is used in database-system circles when discussing correct transaction processing.

Linearizability

  • Concerned with the real-time ordering of operations in a distributed system, ensuring that operations appear instantaneous and respect real-time constraints.
  • It ensures that if a read operation R follows a write operation W in real time, then R will see the effect of W or some later write, maintaining the real-time order of operations.
  • This term is used in distributed-systems circles when describing the consistency of reads and writes, replicated data, and the correctness of concurrent operations.


