Operational Transformation Part 2: Operations

This is the second post in a series I am doing on the real-time collaboration algorithm Operational Transformation (OT). The first post was an introduction to OT and this series. There are two repositiories on GitHub to follow along with:

operational-transformation - This is an Operational Transformation library for plain text which assumes nothing about how the storage backend for your documents works, what kind of frontend you will be using, etc. This is where most of the ideas discussed in this series will be implemented.
operational-transformation-example - This repository contains an example application which takes advantage of the OT library.

However, before we can start transforming operations and providing a real-time experience for our users, we have to decide how to represent and generate operations. That is the topic for this post.

Representing Operations

We will be representing operations as a stream of simple edits. As Daniel Spiewak notes in Understanding and Applying Operational Transformation, this is perfect for the type of automated processing we will be doing inside the xform function. Our operations will be made up of the following three types of edits:

retain(n) - Copy n characters from the old version to the new one.
insert(subsequence) - Insert subsequence in to the new document, without changing where we are indexed in the old document.
delete(subsequence) - Advance the position we are indexed in the old document by the length of subsequence, without copying subsequence in to the new document. The reason we pass a subsequence instead of a number is soley so that we can assert we are deleting the proper subsequence and throw an error if we are not.

To create the document "Hello, World!" from the document "hello world", you would apply the following operation:

[ delete('h'),
  insert('H'),
  retain(4),
  insert(','),
  retain(1),
  delete('w'),
  insert('W'),
  retain(4),
  insert('!') ]

Note that the edits start at the beginning of the old document and finish at the end of it. If an operation doesn't affect the whole document, the last edit in the operation should be retain(length of remaining document).

Calculating Operations

First off: you don't want to add listeners to keyboard and mouse events! Imagine all of the edge cases and browser incompatibilities: would you successfully handle Ctrl-a or Shift-up-arrow followed by backspace? Yikes, I don't even want to think about all of the common key combinations (not to mention the differences between key combinations across OSs).

No, instead you want to take two versions of the document and programmatically generate a list of operations which transform one version to the other. Fortunately, comparing sequences and determining their similarity or difference is pretty much a solved problem in computer science.

Levenshtein Distance

Levenshtein distance (sometimes also referred to as edit distance) is defined as the number of edits required to transform one sequence in to another sequence. Classicly, these edits are either an insertion, deletion, or substitution of a single item in the sequence. Comparing sequences of characters (or, as we know them, plain text documents) is a classic use case for Levenshtein distance. Variations of Levenshtein distance are also often used in bioinformatics to compare sequences of DNA.

The Levenshtein distance between "WUTANG" and "MUTATION" is five, because it would take three substitutions and two insertions to transform "WUTANG" in to "MUTATION":

"WUTANG" -> "MUTANG" (substitution)
"MUTANG" -> "MUTATG" (substitution)
"MUTATG" -> "MUTATI" (substitution)
"MUTATI" -> "MUTATIO" (insertion)
"MUTATIO" -> "MUTATION" (insertion)

The algorithm for calculating Levenshtein distance runs in O(m * n). Here it is presented in psuedo code:

LevenshteinDistance( s[1..m], t[1..n] ):
  d = matrix[0..m, 0..n]

  # The distance between the empty string and another
  # string is equal to the length of the other string.
  for i = 0 to m:
    d[i, 0] = i
  for j = 0 to n:
    d[0, j] = j

  for j = 1 to n:
    for i = 1 to m:
      if s[i] is equal to t[j]:
        d[i, j] = d[i-1, j-1]              # No edit
      else:
        d[i. j] = minimum( d[i-1, j] + 1,  # Deletion
                           d[i, j-1] + 1,  # Insertion
                           d[i-1, j-1] + 1 # Substitution
                         )

  return d[m, n]

However, the number of edits is immaterial in our application; we care about what the edits are. Modifying the algorithm to return a list of edits instead of the number of edits is trivial: just store lists of edits in the d matrix rather than numbers.

Here is my JavaScript implementation of calculating an operation on GitHub.

Possible Improvements in Calculating Operations

Currently, every insert, delete, and retain only handles one character at a time. We could save some bandwidth if [retain(1), retain(1), retain(1)] was simplified to [retain(3)] and [insert('c'), insert('a'), insert('t')] to [insert('cat')], etc. I chose not to do this simply because it has been easier this way, and I haven't ran in to any problems so far.
In practice, usually only a small part of the document has changed since the last comparison. Humans don't randomly change documents in random places; we add new sentences and delete old paragraphs. Knowing this, we can cut down on the size of our problem domain by reducing the area of the strings we are looking for edits in.
```
PrefixPostfixOptimizedOperations(s[1..m], t[1..n]):
  prefix = []
  i = 1
  while s[i] == t[i] and i <= m and i <= n:
    push(prefix, retain(1))
    i = i + 1


  postfix = []
  j = 0
  while s[m-j] == t[n-j] and j < m and j < n:
    push(postfix, retain(1))
    j = j + 1


  return concatenate(prefix,
                     Operations(s[i..m-j], t[i..n-j]),
                     postfix)
```
I haven't implemented this optimization yet, but I probably will have to eventually. In my preliminary testing, O(m * n) is just too slow for large values of m and n, and causes the browser's UI to be unresponsive while the computation is performing.
My representation of the individual edits in the stream of edits which make up an operation could be optimized. Currently, each edit is represented in an array of two elements where the first is the type of edit and the second is the associated data. For example, insert('toto') is represented as ['insert', 'toto'] and retain(4) is ['retain', 4]. This isn't the most memory efficient, bandwidth efficient, or "JavaScripty" solution; it's just the first I thought up and implemented. Eventually, I will probably update it so that each edit is a string and the first character is the operation type, while the rest of the string is the associated data. So, insert('toto') would become 'itoto' and retain(4) would become 'r4'.

Update: I have now implemented the string representation.

Applying Operations To a Document

Applying operations to a document is straight forward. See the apply.js file in the operational-transformation GitHub repo.

Next Up

In the next part of this series, I will describe the xform function, and how it transforms a pair of operations so that two documents can be kept in sync.

References

Levenshtein Distance on Wikipedia.
Understanding and Applying Operational Transformation by Daniel Spiewak.
Introduction To Algorithms section 15.4 on the Longest Common Subsequence algorithm by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. (Levenshtein distance and Longest Common Subsequence are very similar to the point of being two faces of the same problem).

Update: If you don't have the book (or just want more), MIT has provided the video recorded lecture and notes by Charles E. Leiserson.