Design Google Docs | Operational Transformation | CRDT
How to design Google docs. Explore 3 different approaches on concurrent text modification
Operational Transformation (OT) is a key algorithm used in collaborative editing systems like Google Docs to enable real-time, concurrent editing of shared documents by multiple users while maintaining consistency. The primary goal of OT is to ensure that edits made by different users in different orders result in the same final document state.
The core concept behind operational transformation revolves around adapting an operation to accommodate other concurrent operations that have been executed or are currently in progress. When a user initiates an edit or operation on a shared document, this action is communicated to other users within the system. If additional users have made alterations since the original operation was transmitted, these modifications must be considered when applying the operation. To facilitate this, operational transformation relies on essential components such as:
For instance, consider this scenario: two users, X and Y, access a shared document containing the string "ABCD." User X inserts the character "A" at index 0 (before B), and simultaneously, user Y adds the character "D" at index 2 (after C). Both of their modifications are reflected in their respective local copies. However, when these operations are dispatched to the server, the server attempts to resolve the issue as follows:
In the absence of Operational Transformation utilization, sharing operations across users leads to inconsistencies.
By employing Operational Transformation, the insert(D,2) operation is transformed into insert(D,3), taking into account the other operation. Consequently, both documents achieve a consistent state in the end.
Understanding Conflict-Free Replicated Data Types (CRDTs)
Beyond implementation complexities, a significant drawback of Operational Transformation lies in its dependence on a centralized server to synchronize changes. This becomes problematic when multiple devices need data synchronization. CRDTs emerged as an effort to simplify Operational Transformation and eliminate the necessity for a central server.
CRDT, a distributed data structure, is designed to enable simultaneous updates on identical data across diverse replicas or nodes within a distributed system, all without the need for conflict resolution. CRDTs tackle this challenge by introducing commutative and associative properties into operations:
CRDTs utilize operations that are commutative (order-independent) and associative (grouping-independent).
Regardless of the sequence in which operations are executed, they must yield identical outcomes, ensuring uniform data across all replicas.
Consider the following example: Imagine the indices of the content in an entire document lie between the number line 0 and 1. As elements are inserted, they are allocated fractional indices, incorporating the originating node/user's ID.
CRDTs are categorized into two primary types, as demonstrated above:
Operation-Based CRDTs:
These CRDTs represent data using operations (e.g., insert, delete) and apply these operations in a commutative and idempotent manner.
Each node generates its updates locally and spread them out to other replicas, which then implement the operations in a specified order to achieve convergence.
State-Based CRDTs:
These CRDTs represent data using the state.
Each replica independently updates its state and exchanges its complete state with other replicas to attain convergence.
During state merging, replicas either unify or select the maximum value of conflicting elements, contingent on the CRDT type.
Both CRDTs and Operational Transformation remain applicable methodologies, contingent on the system design's requirements, for addressing conflicts in real-time collaborative tools.
Here's a more detailed explanation 👇