Building a Peer-to-Peer Whiteboarding App for iPad

Our approach to the technical challenges of real-time collaboration.

Pixelboard is a collaborative whiteboarding app. Up to 10 users can join a virtual whiteboarding session and work together on a shared drawing in real time. We want Pixelboard to be as robust as it is easy to use. Some of our most important goals are:

Everybody should see the same drawing. The final rendering of a board should not differ from one device to the next.

Everybody should be able to make edits at the same time. You shouldn’t have to take turns editing, forcing everyone else to idly observe while one collaborator draws. And any potentially conflicting edits made on different devices at the same time should be automatically resolved so that we don’t sacrifice the first goal, that everybody sees the same drawing.

Drawing changes should appear in real time. Based on our own internal use cases, we expect that folks are going to want to draw together while on conference calls. If drawings update infrequently or after periodic delays, a lot of magic will be lost.

Technical Challenges to Overcome

Our design goals presented technical challenges during development:

Everybody should see the same drawing. In order to display an identical drawing on all devices, the complete model has to be present on all devices. The board renderer receives a model struct called Drawing, which represents the current state of a whiteboard. A Drawing is essentially an array of marker strokes. In load testing with a quorum of collaborators, we found that it was not difficult to produce a Drawing containing tens of thousands of strokes. Broadcasting a complete Drawing every time a new stroke gets added wouldn’t be practical. We needed to communicate only the minimum amount of data that was necessary.

Everybody should be able to make edits at the same time. Edits to the shared model cannot use locking or turn-taking. Each collaborator must be able to make edits on the drawing as it appears on their device, even if they haven’t yet received updates from someone drawing at the same time on another device. This also makes broadcasting complete Drawings impractical, as it would be impossible to adequately resolve conflicts between competing edits due to information loss in the model.

Drawing changes should appear in real time. We can’t wait on slow networks to send huge amounts of data, which is another reason to transfer only the minimum amount of data necessary as changes occur. Another part of the solution to real-time collaboration is that devices must communicate directly with one another using peer-to-peer networking, not via an intermediating server. No centralized server means there’s no arbiter of truth to resolve conflicts between simultaneous edits on different devices. We need the data model to be inherently conflict free, so each peer can independently achieve a valid state.

Conflict-Free Replicated Data Types

To overcome all of these challenges, we turned to the concept of a conflict-free replicated data type, or CRDT. From Wikipedia:

A conflict-free replicated data type (CRDT) is a data structure which can be replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between the replicas, and where it is always mathematically possible to resolve inconsistencies which might result.

Instead of using a Drawing as the model, we demoted Drawing¹ to just a “view” of the model:

struct Drawing { let strokes: [Stroke]}

The actual underlying model is comprised of instances of an Edit data type. Every possible user action in a session corresponds to a case on the Edit enum, for example:

enum Edit { case addStroke(Stroke) case undo(Edit) case redo(Edit)}

Edits are “conflict free” in that Pixelboard’s Edit integration algorithm guarantees that any Edit can be applied to a Drawing in any order and will always produce a valid result.

For storage and transmission, each Edit is wrapped in an EditRecord struct:

struct EditRecord: Hashable, Comparable, Codable { let edit: Edit}

EditRecord has Equatable, Hashable, and Comparable conformances which eliminate inconsistencies between devices that would otherwise be caused by duplication or incomplete ordering. The backing store for a whiteboard is a complete set of EditRecords, one each for all of the Edits across all devices that occur during a session.

The combination of Edit and EditRecord forms Pixelboard’s CRDT and is at the heart of our collaboration engine.

Replicating EditRecords Across All Devices

Our peer-to-peer networking layer ensures that the set of all EditRecords is replicated across all devices in a session. It’s an append-only model, where no EditRecord is deleted or mutated. Each user action generates an additional EditRecord which gets appended to the shared set. Records generated on a given device are integrated locally and then broadcast to all other devices in the session. As incoming EditRecords are integrated, an updated Drawing is computed and passed to a renderer which updates the displayed whiteboard.

Each device receives every edit record made by other devices. Sketched in Pixelboard.

Networks can be slow and unreliable. EditRecords cannot be guaranteed to be received in the same order in which they were created, nor can we guarantee that a device won’t receive the same EditRecord twice. Thus, sets of EditRecords must exhibit the following properties:

Associativity: (A∪B)∪C=A∪(B∪C) The order of the union operations must not matter, so that multiple sets of EditRecords from other devices in a session can be integrated in whatever order the network conditions allow.

Commutativity: A∪B=B∪A The order of the arguments to a union operation of two EditRecord sets must not affect the resulting value.

Idempotence: A∪A=A Duplicate EditRecords must be able to be integrated more than once without leading to duplication of those records in the replicated set of all records, nor duplicated information in the computed Drawing.

Guaranteeing Strict Total Order

Because EditRecord conforms to the Equatable and Hashable protocols, sets of EditRecords naturally exhibit all three of the above required properties. But there’s one additional property we need from sets of EditRecords in order to produce a consistent drawing across all devices: a capacity for total order.

Given identical sets of EditRecords, two devices should be able to independently sort those EditRecords into an identical order, so that the Edit each EditRecord contains can be “replayed” during computation of the Drawing model, resulting in an identical Drawing on both devices. We took a page from Helftone (the developers of Clear.app) and added several properties to EditRecord which together enable a capability of achieving total order.

Edit records have three properties which make a strict total order possible. Sketched in Pixelboard.

First, for each whiteboard there is a logical timestamp, which in Pixelboard’s case is literally just an integer. It marks the passing of time on a “logical clock,” making it possible to determine whether a given EditRecord happened-before another record. Each EditRecord has a logical timestamp whose value is one integer higher than the highest logical timestamp among all the existing EditRecords visible locally to the device which created the EditRecord at the time of creation:

To see how logical timestamps are incremented, let’s follow an example sequence of events:

Three devices join a session: A, B, and C.

Device A draws a stroke, generating the first EditRecord. It has a logical timestamp of 1.

Devices B and C receive and integrate that EditRecord.

Next, and at the same moment, devices B and C each draw a stroke, generating two additional EditRecords. The logical timestamps for both of these records is 2, one more than the highest logical timestamp visible to each device (1) at the time each EditRecord was created.

Device A receives and integrates the two new EditRecords.

Device A adds another stroke, generating another EditRecord. This EditRecord’s logical timestamp is 3, one higher than the highest logical timestamp visible to it at the time (2). Note that it doesn’t matter that there is more than one EditRecord visible with a logical timestamp of 2. The highest value found is 2.

The logical timestamp provides the backbone for most of the ordering of a set of EditRecords, but, by itself, it’s incapable of yielding a total order. We have just seen that multiple EditRecords can share the same logical timestamp value. To achieve a total order, adding two additional properties to EditRecord does the trick:

In this Comparable implementation, we first attempt to sort by comparing the logicalTimestamp of each record. If the logical timestamps are different, then this is the only comparison required, as it’s possible to determine whether either record happened before the other. If the logical timestamps are equal (as occurs in Step 4 from our example sequence above), then we compare the temporalTimestamp (a Unix timestamp in seconds) from each record to order them by the actual time they occurred.

You may wonder why this comparison isn’t the first step. This is because a temporal comparison is not as meaningful as a logical one. Networks aren’t fast or reliable enough to guarantee that all devices will receive incoming records at the same precise moment. The logical timestamp of an EditRecord is thus a more accurate reflection of the user-visible state of the drawing at the time the user created the record than a Unix time is. The temporal timestamp is merely a tiebreaker.

Lastly, if both the logical and temporal timestamps are equal (which is not impossible), we sort by collaboratorId, the ID of the collaborator who authored each EditRecord. Collaborator IDs are assigned by the peer-to-peer networking layer when each device joins the session and are guaranteed to be unique in that session. With this last tie-breaker in place, our Comparable implementation can yield a strict total order.

Putting It All Together

When Pixelboard integrates new EditRecords, it sorts the set of all EditRecords into an array and then iterates through that array, replaying each EditRecord’s edit on a blank Drawing, filling the Drawing’s strokes array with all the strokes in the correct z-order, from back to front, excluding those that were undone or wiped from the board.

Data flowing through the Pixelboard system. Sketched in Pixelboard.

This algorithm guarantees that a valid Drawing can be produced from any combination of Edits in any order. The computed Drawing is passed up to the user interface tier of the application where a renderer object updates the displayed whiteboard. In combination with our peer-to-peer networking layer which guarantees completeness, this integration process ensures that every device in a session will eventually display an identical whiteboard to each user, while still allowing each device to make independent edits, all in real time.

Performance Optimizations

One of the challenges that came up during implementation is that sorting a set of EditRecords is an O(N) operation, growing linearly with the number of records. In a busy session with several people all drawing at once, it was possible to create drawings with tens of thousands of strokes. Sorting a set of a few hundred records is relatively quick. Sorting 20,000 records is not, even on the latest iPad hardware.

We optimized the integration process by baselining the history of a busy board periodically. When certain threshold criteria are met, a portion of EditRecords deemed unlikely to be affected by future user actions are flattened into a baselined history which is not ever sorted again. The next time new EditRecords must be integrated, only the recent, un-baselined records are sorted. The edits from these recent records are replayed on top of the baselined history. This enforces a reasonable cap on the number of records sorted at any given time.

Baselining was labor intensive to develop and test, but led to 25x performance gains. The baselining system not only reduced the time it took to integrate EditRecords into the model, but it also unlocked some optimizations in board rendering as we were able to cache bitmaps of baselined portions of a drawing for faster redraws.

Concluding Thoughts

Even an app as deceptively simple as Pixelboard poses quite a challenge if it involves real-time collaboration in a peer-to-peer system. Pixelboard would not have been possible without guiding concepts like conflict-free replicated data types. This is a fascinating domain to explore with a rich history of research and development. We certainly have not exhausted these riches and look forward to bringing further performance optimizations and features to our users.