Controlling Time: TeaTime

Many of us remember where we were when we saw some seminal event unfold on TV. We may have been doing different things, but we shared a common experience through the live broadcast. Parts of each person’s experience were not shared. The shared experience is the part that came off the screen and out the speakers.

For a Web-based sales presentation the shared experience imposed on us is one common view of a complete application, driven by someone else, and often even this is hard to get right. If we don’t all see the slide change at the right time with the audio, the salesman’s pitch may confuse us rather than persuade us. Now imagine that we are not merely watching a live broadcast, but that we each are meant to interact with what is being shown. Several people interact with one or more side-by-side applications, all as part of the same shared experience. You type a key and I see the letter appear on my screen, as you are speaking. At the same time, someone else moves a project deliverable from a pending column to a done column. How can these activities all be coordinated so that we feel like we are all sharing the same experience at the same time?

One way to achieve this might be to have everything running within one giant program on a server, with the display broadcast to each of us as a movie. But this would create problems:

We don’t all want to hear the same thing. In particular, we don’t want to hear an echo of ourselves, and so we each need a different audio mix with our own sound removed. Additionally, telepresence and dedicated conferencing systems have overwhelmingly demonstrated that voice discussion among remote participants can be devastatingly confusing unless the sound of each voice is spatialized to the geometric relationship among the users, which requires different mixing for each participant.

We don’t actually want to see the same thing. Maybe I want to zoom in on one app of interest while you are comparing two other documents, even as we are both in the same meeting. This requires a different video feed for each user.

We change during the course of the activity. People come and go, and we want to know about these changes. In Teleplace, each user’s 3D position and direction define that user’s focus and communicate it to others and to the system. It is easy to change position by clicking on a chair or panel or another person, or by walking around like in a video game. As a result, what each of us hears and sees changes all the time. However, the changing location and activity of each user is part of the shared experience.

For each of us to see a clear sharp image, the video would have to be very detailed, and that takes a lot of bandwidth.

Producing all of this for each of the huge number of potential groups would require enormous computation and electric power.

Another way might be to have each user’s computer produce its own results, but how would the shared experience be coordinated? Each computer could tell the others about its results, but what if data conflicts? One computer could be the arbitrator, but what happens if that computer is slow or temporarily unavailable? And how do we communicate the results? It would be terribly complex to figure out how to communicate just the parts of the shared experience that have changed, and to merge that into a user’s current state. The results would be incredibly buggy.

Instead, Teleplace is based on the idea that computers are computational robots: if you give two robots the same instructions at the same time, they always do the same thing. Around the time the Teleplace technology was first being developed, my daughter played the computer game Sims with a friend of hers in another city. The Sims program was not networked, but each girl sat at their own separate computer and spoke via telephone. “Let’s make a character named Howard.” “Let’s make him hungry.” Both separate computers were running the same version of the Sims game, and both responded in the same way to these instructions. Teleplace does exactly the same thing, but there are some special problems, and they have magical solutions:

1. How do you have the machines produce the same result on different processors and operating systems? We run a fast Virtual Machine and all parts of the shared experience are computed within it. The virtual machine is completely bit identical on each and every computer we run on. We’ve gone through a lot of effort to make sure that each floating point operation produces the same results under all circumstances. Random numbers are generated in the same repeatable patterns. In fact, given the same starting point and the same sequence of events, every single part of the shared experience will play back exactly the same way on the Teleplace virtual machine, regardless of operating system and hardware differences. By contrast, things that are different for each user can be handled separately outside of the virtual machine, including, e.g., the screen display from an individual user’s own point of view.

2. How do we arrange for everything to happen at the same “time” so that things stay deterministically in sync? There isn’t any good way to precisely coordinate time on the network. Every computer has its own clock, and it takes an indeterminate amount of time to send everyone a message like “at the time this message was sent, the time was 11:53:21.1234567.” If there’s any difference at all between clocks, even a virtual machine can give different results for what is otherwise the same instruction.

In Teleplace, each collaboration has a message router as a data stream on our overlay network. When a user types a key or moves the mouse, the input is sent to the router. The router puts its own time stamp on the message and sends it out to everyone. That’s all the router does: no computation, no decoding. Each receiving virtual machine processes the input messages based on the message time stamp, which is called TeaTime. If nothing else is happening, the router also sends a heartbeat to move time along. As far as the shared experience is concerned, time is not a continuous thing in which something can always happen between any other two arbitrary points in time. Instead, everything in the shared experience – including animations of things falling or clouds moving – is computed based on the router’s discrete TeaTime. During the wall-clock time in which a single message is being acted upon, TeaTime does not progress. Time stands still. The collaboration code is not even allowed to see the local computer’s clock, or some answer might depend on that and be different for different users. Code can cause things to happen later by using a simple mechanism we have for sending a message into the future. (If the message originates outside the shared experience, the system automatically sends the message to the router just like any other user input. If the message is sent from computation within the shared experience, then every computer will send itself the same message because they are all doing the same thing at the same (Tea) time, and so the system does not even have to bother putting the message on the network to the router.)

3. How do we get each machine to start with the right state?

As each user joins a collaboration in progress, it gets a snapshot of virtual machine memory, and this is used to define the initial conditions. This snapshot trick works because of the other two tricks:

We can take a snapshot of memory that works on each user platform because it’s running on identical Virtual Machines. This snapshot can be supplied by any of the connected machines, selected at random or using the fastest. (By the way, we don’t take a snapshot of the whole VM, but just that part which is associated with the shared experience. A user can be part of any number of groups simultaneously, and they each effectively have their own separate memory. This separation allows me to be in groups A, B, and D, while at the same time you go in and out of group A and C separately.)

Remember that the flow of time is defined entirely by the router’s time stamp of messages. Time on the simulation stops in between messages, and the heartbeat messages are used to keep things moving. Thus Teleplace’s simulation-time is only loosely coupled to wall-clock time. At computer speeds, it’s close enough for people not to notice, but the fact that it is allowed to vary non-linearly with wall-clock time is what allows the simulations to stay correct. When we connect, we immediately start getting the same series of time stamped messages that everyone else gets, but we don’t execute them yet because we don’t yet have a snapshot to run them in. When we do get the snapshot, we throw away all the messages before the snapshot’s time stamp, and immediately execute our queue of all following saved messages, each just as fast as we can. Now we’re caught up and just like everyone else.
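To make the router and the late-join procedure concrete, here is a minimal sketch in Python. It is not Teleplace's actual code: the `Router` and `Replica` classes, their method names, and the toy "append typed characters" state are all assumptions made for illustration. The router only stamps and forwards messages; a joining replica queues messages until its snapshot arrives, discards those the snapshot already reflects, and then executes the rest as fast as it can.

```python
import copy


class Router:
    """Hypothetical router: stamps each message and fans it out, nothing more."""

    def __init__(self):
        self.now = 0            # TeaTime; advances only when a message is sent
        self.subscribers = []

    def subscribe(self, replica):
        self.subscribers.append(replica)

    def send(self, payload, elapsed=1):
        # The router's stamp is the only clock the shared experience ever sees.
        self.now += elapsed
        for replica in self.subscribers:
            replica.receive((self.now, payload))

    def heartbeat(self, elapsed=1):
        # An empty message whose only job is to move time along.
        self.send(None, elapsed)


class Replica:
    """Hypothetical participant holding one copy of the shared state."""

    def __init__(self, state=None):
        self.state = state      # None until a snapshot has been loaded
        self.time = 0
        self.pending = []       # messages received before the snapshot

    def receive(self, message):
        if self.state is None:
            self.pending.append(message)   # connected, but nothing to run them in yet
        else:
            self._execute(message)

    def _execute(self, message):
        stamp, payload = message
        self.time = stamp
        if payload is not None:            # heartbeats carry no input
            self.state["text"] += payload

    def snapshot(self):
        return copy.deepcopy({"time": self.time, "state": self.state})

    def load_snapshot(self, snap):
        self.time = snap["time"]
        self.state = copy.deepcopy(snap["state"])
        queued, self.pending = self.pending, []
        for message in queued:
            if message[0] > self.time:     # older messages are already in the snapshot
                self._execute(message)


router = Router()
alice = Replica({"text": ""})
router.subscribe(alice)
router.send("h")
router.send("i")

bob = Replica()                 # Bob connects and starts queueing messages
router.subscribe(bob)
router.send("!")
snap = alice.snapshot()         # any connected replica could supply this
router.heartbeat()
router.send("?")
bob.load_snapshot(snap)         # Bob catches up from his queue
```

After the last line, both replicas hold the same text and the same TeaTime, even though Bob never saw the first two keystrokes as messages.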

The first two issues and their solution had been covered as early as the 1970s by David Reed’s MIT PhD thesis. The programming that became Teleplace started when he and Alan Kay and David Smith realized that computers and networks were now fast enough to execute applications built on this concept of message time. Teleplace became practical when Smith and Andreas Raab developed the third idea and a simple programming model to keep shared-experience messages separate from everything else.

There are several immediate and practical consequences to all of this:

The initial joining snapshot is the only time we have to send everyone a lot of data. Each message is very short and represents a single relatively infrequent event such as typing a character. This means that the network traffic associated with the shared experience is very very low, and never includes large state updates. The shared experience includes such things as “Movie xyz is started at timestamp 1234”, but it does not include the actual movie traffic, which is handled separately on the Teleplace Overlay Network. While there may be a lot of separate voice, media, and app update traffic, the coordinating data is incredibly small.

There is no server load at all for coordinating shared experiences, and only a tiny router that does no computation is needed for each group. We only need a router running for a group that has people in it, and we can easily run hundreds of these on a single computer. To the Teleplace Overlay Network, the router traffic is just like any other media data to be coordinated and distributed with the same efficiency.

If a person cannot keep up because they have an unhealthy computer or network, it does not affect the results of others. In fact, even if a person temporarily falls behind because of something happening on their computer or network, their own version of the shared experience will still execute correctly in TeaTime, but slower than the expected real time. Everything will still be correct, and will automatically catch up to everyone just as soon as the computer and network are able to do so. This is crucial when people are on separate variable networks, including mobile networks.

The coordinating code is easy to write, without dealing with distributed systems issues. Even complex animations of interacting objects are written entirely as though there was just one user. In practice, very nearly all of our development and bug fixing time is non-TeaTime code for media or user interface, outside of the common shared experience code.
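As a rough illustration of what "written as though there was just one user" can look like, the following Python sketch models a falling object that reschedules itself with a future message. Everything here is an assumption for the sake of the example: the `SharedWorld` class, its `future` and `advance_to` methods, and the use of Python's seeded `random.Random` as a stand-in for the virtual machine's repeatable random numbers. The animation code never reads a local clock; it only reacts to time stamps handed to it.

```python
import heapq
import random


class SharedWorld:
    """Hypothetical shared simulation driven only by router time stamps."""

    def __init__(self, seed):
        self.now = 0
        self.rng = random.Random(seed)   # same seed on every replica
        self.height = 100
        self._future = []                # heap of (due_time, sequence, method_name)
        self._seq = 0

    def future(self, delay, method_name):
        # Send a message to ourselves, to be run `delay` TeaTime units from now.
        self._seq += 1
        heapq.heappush(self._future, (self.now + delay, self._seq, method_name))

    def advance_to(self, stamp):
        # Run every internal message that is due at or before the router's stamp.
        while self._future and self._future[0][0] <= stamp:
            due, _, method_name = heapq.heappop(self._future)
            self.now = due
            getattr(self, method_name)()
        self.now = stamp

    def fall(self):
        # Ordinary single-user animation code: no networking, no local clock.
        self.height = max(0, self.height - self.rng.randint(1, 3))
        if self.height > 0:
            self.future(10, "fall")


stamps = [25, 50, 130, 200]              # time stamps as they arrive from the router
a = SharedWorld(seed=42)
b = SharedWorld(seed=42)
for world in (a, b):
    world.future(10, "fall")
    for stamp in stamps:
        world.advance_to(stamp)
```

Both worlds run twenty `fall` steps and end at the same height, because the internal messages and the random draws depend only on the stamps and the seed.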

Even media and application code that was never written for sharing can be easily integrated into the system. There is no need to rewrite browser or office apps for cloud use or for sharing use. If there is one source for the data — such as one server running the application — then there is no question of producing multiple disparate results. All that is necessary is to use TeaTime to coordinate inputs such as “start movie” or “mouse down on pixel 123, 45”, and then allow the Teleplace Overlay Network to distribute the output to everyone.

While the bit-identical virtual machine is only necessary for the shared experience, it is very convenient to have as part of each Teleplace client. Porting to a new hardware or operating system platform consists of nothing more than making the virtual machine run correctly on the new platform. To the extent we get that correct, we also only have to do cross-platform testing of the non-VM parts of the app.

It is so easy to program a shared 3D experience with separate rendering for each user, that we started with that part, adding separate media types and office application platforms as we went. Thus Teleplace has always been easy for users while also exciting and engaging. It would not necessarily have been any easier to have started with just 2D app sharing such as WebEx, and then we would have been trying to figure out how to make it clear for users to tell who is doing what, or how to have multiple people gesturing at multiple apps, or how users and software can turn on and off parts that they are not using to reduce traffic, or how to sex it up, or how to make it meaningful on a 3D display.

And yet we have barely scratched the surface. We have not yet begun to explore the long-term consequences of TeaTime.

We could build 2D or 2.5D interfaces for use on boxes with weak graphic cards, and these could even be designed to work in the same meeting as people with the normal 3D interface.

Or we could go the other way and explore the possibilities of touch-screen tablet user interfaces, or 3D data gloves, WII, and XBox Kinect.

There is yet more context to be exploited. For example, application and media streams near us can have higher priority than those farther away.

Because the Teleplace Overlay Network supports embedded routers at sites, we could create simple appliances to enable incredibly low-latency interactions even while long term storage and perhaps apps would continue to be cloud hosted. While edge-caches like Akamai were hugely valuable for broadcast content distribution networks, such appliances would be the communication network equivalent for realtime unified-communications.

Because internal messages do not go through the router, the TeaTime router is only handling messages about what people are actually doing. For auditing or research, it may be much easier to determine intent and significant actions from this message stream than from a sea of video data or internal computations.

Since the entire shared experience is reduced to a small series of short input messages, these messages can be played back as machinima. An entire TeaTime “movie” would take practically no bandwidth compared with video, and could be viewed from any angle within the movie and played back at any changing speed and changing resolution with perfect fidelity. It could be played back step by step to explore such questions as “when did this meeting-or-training-or-simulation-or-operation go off the rails”?
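A small Python sketch of this kind of playback, under stated assumptions: the shared state is a toy task board, each logged message is a `(time stamp, (column, card))` pair, and the functions `apply_message`, `replay`, and `first_time` are hypothetical names invented here. Because each update depends only on the previous state and the message, the log can be replayed up to any time stamp or stepped through to find when a condition first held.

```python
def apply_message(state, payload):
    """Pure update: move a card to the named column and return a new state."""
    column, card = payload
    new_state = {name: list(cards) for name, cards in state.items()}
    for cards in new_state.values():
        if card in cards:
            cards.remove(card)
    new_state[column].append(card)
    return new_state


def replay(initial, log, until=None):
    """Rebuild the state from the log, optionally stopping at a time stamp."""
    state = initial
    for stamp, payload in log:
        if until is not None and stamp > until:
            break
        state = apply_message(state, payload)
    return state


def first_time(initial, log, predicate):
    """Step through the log and return the first stamp at which predicate holds."""
    state = initial
    for stamp, payload in log:
        state = apply_message(state, payload)
        if predicate(state):
            return stamp
    return None


initial = {"pending": ["spec", "demo"], "done": []}
log = [
    (10, ("done", "spec")),
    (25, ("done", "demo")),
    (40, ("pending", "spec")),   # someone moves the spec back
]

midway = replay(initial, log, until=25)
final = replay(initial, log)
emptied_at = first_time(initial, log, lambda s: not s["pending"])
```

Here `midway` shows both cards done, `final` shows the spec back in the pending column, and `emptied_at` is the time stamp at which the pending column first became empty.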

Teleplace users interact with only one 3D TeaTime space at a time. We could overlay multiple spaces with different subgroups of people able to see and manipulate different sets of additional objects, such as annotations.

People are already creating augmented reality applications with information layered over real-world scenes. TeaTime could be used to coordinate the layered virtual space among users.

This technology works now, and has great possibilities for the future. Who knows what potential there is now that we have freed distributed computing from the tyranny of time?

About Stearns

Howard Stearns works at High Fidelity, Inc., creating the metaverse.
Mr. Stearns has a quarter century of experience in systems engineering, applications consulting, and management of advanced software technologies. He was the technical lead of University of Wisconsin's Croquet project, an ambitious project convened by computing pioneer Alan Kay to transform collaboration through 3D graphics and real-time, persistent shared spaces. The CAD integration products Mr. Stearns created for expert system pioneer ICAD set the market standard through IPO and acquisition by Oracle. The embedded systems he wrote helped transform the industrial diamond market. In the early 2000s, Mr. Stearns was named Technology Strategist for Curl, the only startup founded by WWW pioneer Tim Berners-Lee. An expert on programming languages and operating systems, Mr. Stearns created the Eclipse commercial Common Lisp programming implementation.
Mr. Stearns has two degrees from M.I.T., and has directed family businesses in early childhood education and publishing.

Howard,
Awesome posts!
Just wondering: is there any difference between the current TeaTime implementation of OpenCroquet used in Teleplace and the OpenCroquet (Hedgehog) release that has been publicly available since 2006 with no official updates?

The deeper differences in application in Qwaq/Teleplace are:
* The TeaTime router is now just one of the media streams traveling over the overlay network.
* Media (voice, Webcam, RFB, movies) are overlay network streams, interfacing to a generalization of Hedgehog’s embedded apps. (E.g., there’s a tea-time part that signals events to any listening non-tea-time part, and the (visual or audio) rendering is in the non-tea-time part. The joining and teleporting process has some hooks for wiring up the non-tea-time parts.)
* Rendering more-or-less works the same way (e.g., with meshes and even primitives like cubes and spheres outside of tea-time, but getting their cues from tea-time frames).

I haven’t looked at Hedgehog in a very long time, but as I recall, the fetching and caching of resources such as textures and meshes was always “pluggable”, with the CroquetCollaborative KAT working a little differently than some of the other built-in demos. (The KAT fetched stuff from peers, with the idea that such content would eventually come from a DHT and thus not need a server.) Qwaq/Teleplace has always defined a set of server operations, including getting/putting content. Also, I think the KAT pickled worlds for offline-persistence using the same binary memory capture used for participants joining late. By contrast, Qwaq/Teleplace uses the XML format we published years ago, so that worlds could be used across versions (either updates or different implementations).

Of course, the big bulk of code differences is in features and hardening. For example, there’s tons of code — not all of it well written — to make sure that participants can survive a network disconnection with little interruption. Recording sessions and playing back imported video is a big deal. The RFB apps have always been implemented differently than in, e.g. the KAT. But all that is just shallow coding. The magic is still the same stuff we’ve always written about and summarized in this series.