Prerequisites

Principles

The use of WebSockets requires some tweaks to the traditional way of thinking about an application. The typical goal is to have each of your application instances (dynos) be completely stateless and to store any state in a location they can all access. This way, client requests can be load balanced across all dynos, and the data a client needs will be present regardless of which dyno a particular request hits.

This design is only partially valid when using WebSockets. It’s still best to keep all application data in a shared location. However, a client will now connect and keep affinity to a single dyno: all communication with that client will happen from that dyno until a reconnection occurs. Furthermore, that client will likely expect real-time updates to shared data, so each dyno must be able to be notified when something changes.

This change creates a need to share global state among dynos (application data) while also maintaining local state on each dyno (which clients are connected and who they are).

The sample application

The sample application is a modification of the WebSocket chat sample that ships with Play 2.x distributions. Out of the box, the chat sample only works as a single-instance application. The changes below describe how it can be modified to share state across multiple nodes.

Architecture

In its original form the chat application deals with two pieces of data: a list of members in the room and chat messages. The member list is stored on the actor that handles the room. It is a map of member names to handles on the output stream of the WebSocket each member is connected through. The chat messages aren’t stored at all; as soon as one comes in, it is simply sent to every connected member in the member list.

When deployed and scaled as-is, this design causes each dyno to act as its own chat room rather than having all dynos work together to present a single shared room. In order to scale, data needs to be shared among dynos. The two pieces of data that must be shared are the global list of members in the room and the chat messages. Both will be stored on a Redis instance that all the dynos can share and communicate through.

There is also a third piece of data that must be kept. Each dyno must keep track of which members are connected to it and hold a handle to each of those members’ WebSockets. This data will be local to the dyno; there is no need to keep a global picture of all dynos and where each member is connected.

This diagram illustrates the design:

Connecting

The application is a simple Play 2.2 app. The interesting entry point is the WebSocket that is opened once a user has signed into the chat room. On the server side, this is a controller method in Application.java that returns a WebSocket.
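A minimal sketch of such a controller method, assuming Play 2.2's Java WebSocket API (the chat method name and username parameter are illustrative):

```java
public static WebSocket<JsonNode> chat(final String username) {
    return new WebSocket<JsonNode>() {
        // Called when the WebSocket handshake has completed.
        public void onReady(WebSocket.In<JsonNode> in, WebSocket.Out<JsonNode> out) {
            try {
                // Hand the socket to the chat room, which registers the member
                // and wires up the onMessage/onClose callbacks described below.
                ChatRoom.join(username, in, out);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    };
}
```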

Here, the connection to the WebSocket is opened and receiveEvent is registered to be called whenever a message comes through the socket. The code in receiveEvent displays the message in the chat room. Each message also includes the member list in its payload so that the roster can be kept up to date.

Sending Messages

ChatRoom is a singleton actor that handles all of the details of the chat interaction. There are three events that happen in the chat room: join, quit, and chat message. The simplest of the three is sending a chat message.
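A sketch of the actor's skeleton, assuming the defaultRoom and members names used throughout this article; the channel name, the Redis connection pool, and its configuration are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

import akka.actor.ActorRef;
import akka.actor.Props;
import akka.actor.UntypedActor;
import com.fasterxml.jackson.databind.JsonNode;
import play.libs.Akka;
import play.mvc.WebSocket;
import redis.clients.jedis.JedisPool;

public class ChatRoom extends UntypedActor {

    // The pub/sub channel every dyno publishes to and subscribes on (name is illustrative).
    public static final String CHANNEL = "messages";

    // The singleton actor that handles this dyno's side of the room.
    static ActorRef defaultRoom = Akka.system().actorOf(Props.create(ChatRoom.class));

    // A pool of Redis connections shared by the code below. In production this
    // would be configured from your Redis add-on's connection URL.
    static JedisPool jedisPool = new JedisPool("localhost", 6379);

    // Local state: the members connected to *this* dyno and their WebSocket
    // output channels. The global roster lives in Redis, not here.
    Map<String, WebSocket.Out<JsonNode>> members =
            new HashMap<String, WebSocket.Out<JsonNode>>();

    @Override
    public void onReceive(Object message) throws Exception {
        // Handles Talk, RosterNotification, Join, and Quit messages;
        // the individual branches are shown in the sections below.
    }
}
```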

Chat messages are received over each user’s WebSocket connection. When the user is added to the room, the onMessage callback on the input stream of their WebSocket is set.
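A sketch of that callback, assuming the ChatRoom fields above and a Talk class with username and text fields (an assumption):

```java
in.onMessage(new Callback<JsonNode>() {
    public void invoke(JsonNode event) {
        // Wrap the incoming text in a Talk and publish it to Redis; every dyno
        // subscribed to the channel (including this one) will deliver it.
        Talk talk = new Talk(username, event.get("text").asText());

        Jedis j = jedisPool.getResource();
        try {
            j.publish(ChatRoom.CHANNEL, Json.stringify(Json.toJson(talk)));
        } finally {
            jedisPool.returnResource(j);
        }
    }
});
```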

Each time a message is received over the WebSocket, this method will create a new Talk object and put a JSON representation of it onto a Redis pub/sub channel. All messages go over this channel and all instances of the chat application subscribe to it. This is how message state is shared across dynos.

You can use the Akka scheduler to subscribe to the Redis channel on another thread (the subscribe call blocks for as long as the subscription is active):
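A sketch, assuming a JedisPubSub subclass named MyListener (the class name and the 10-millisecond delay are illustrative); this could run when the ChatRoom starts up:

```java
Akka.system().scheduler().scheduleOnce(
        Duration.create(10, TimeUnit.MILLISECONDS),
        new Runnable() {
            public void run() {
                // subscribe() blocks for the life of the subscription, which is
                // why it is pushed onto another thread here.
                Jedis j = jedisPool.getResource();
                try {
                    j.subscribe(new MyListener(), ChatRoom.CHANNEL);
                } finally {
                    jedisPool.returnResource(j);
                }
            }
        },
        Akka.system().dispatcher()
);
```

The listener extends Jedis's JedisPubSub. A sketch of it, assuming the published JSON carries a type field to discriminate between message classes:

```java
public class MyListener extends JedisPubSub {

    @Override
    public void onMessage(String channel, String messageBody) {
        // Rebuild the original message object from its JSON form and hand it
        // to the singleton room actor.
        JsonNode parsed = Json.parse(messageBody);
        String type = parsed.get("type").asText();

        Object message;
        if ("talk".equals(type)) {
            message = Json.fromJson(parsed, Talk.class);
        } else {
            message = Json.fromJson(parsed, RosterNotification.class);
        }

        ChatRoom.defaultRoom.tell(message, null);
    }

    // The remaining JedisPubSub callbacks are not needed here.
    @Override public void onPMessage(String pattern, String channel, String message) { }
    @Override public void onSubscribe(String channel, int subscribedChannels) { }
    @Override public void onUnsubscribe(String channel, int subscribedChannels) { }
    @Override public void onPSubscribe(String pattern, int subscribedChannels) { }
    @Override public void onPUnsubscribe(String pattern, int subscribedChannels) { }
}
```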

This onMessage method will read each message that comes over the pub/sub channel, create the appropriate object from it, and tell the singleton actor defaultRoom about it. The message processing logic is in the onReceive method of the actor. Here the message type is determined and the message logic is executed:
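A sketch of that method, assuming the notifyAll helper shown in the next snippet and a direction field on RosterNotification (both are assumptions):

```java
@Override
public void onReceive(Object message) throws Exception {
    if (message instanceof Talk) {
        // A chat message read off the pub/sub channel: relay it, along with
        // the current global roster, to every member connected to this dyno.
        Talk talk = (Talk) message;
        notifyAll("talk", talk.username, talk.text);
    } else if (message instanceof RosterNotification) {
        // Someone joined or quit on some dyno; announce it locally.
        RosterNotification roster = (RosterNotification) message;
        if ("join".equals(roster.direction)) {
            notifyAll("join", roster.username, "has entered the room");
        } else {
            notifyAll("quit", roster.username, "has left the room");
        }
    } else if (message instanceof Join || message instanceof Quit) {
        // Covered in "Joining and leaving" below.
    } else {
        unhandled(message);
    }
}
```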

When each dyno reads a message off of the pub/sub channel, it loops through the list of members connected to it and sends the message to each of them. Each message also contains the list of all members in the room.

An important thing to note here is that each dyno deals with two lists of members. The instance variable members contains the members that are connected to this dyno, along with their WebSocket output streams. However, the list of all members in the room that is sent down with the message comes from Redis. This is how a group of dynos is able to present a global picture of who’s in the room while each dyno only communicates with the subset of users connected to it.
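A sketch of the notifyAll helper; the "members" Redis set used for the global roster and the event field names are assumptions:

```java
// Send an event to every member connected to this dyno. The member list that
// rides along with the event is the global roster stored in Redis.
void notifyAll(String kind, String user, String text) {
    // Fetch the shared roster from Redis once per event.
    Set<String> roster;
    Jedis j = jedisPool.getResource();
    try {
        roster = j.smembers("members");
    } finally {
        jedisPool.returnResource(j);
    }

    // Push the event to each locally connected member's WebSocket.
    for (WebSocket.Out<JsonNode> channel : members.values()) {
        ObjectNode event = Json.newObject();
        event.put("kind", kind);
        event.put("user", user);
        event.put("message", text);

        ArrayNode memberList = event.putArray("members");
        for (String member : roster) {
            memberList.add(member);
        }

        channel.write(event);
    }
}
```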

Joining and leaving

When the user connects, they’re added as a member of the room by calling ChatRoom.join.

Joining does two things: it sends a message to the defaultRoom actor, which adds the member to the room, and it sets up the appropriate callbacks on the user’s WebSocket. The onMessage callback was explained above. The onClose callback is where the quit message is sent to the actor. Messages sent to the defaultRoom actor are handled by its onReceive method.
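A sketch of ChatRoom.join; the Join and Quit message classes and the one-second ask timeout are illustrative:

```java
public static void join(final String username, WebSocket.In<JsonNode> in,
                        WebSocket.Out<JsonNode> out) throws Exception {

    // Ask the room actor to add the member; it updates both the local map and
    // the shared roster in Redis, then replies "OK" (or an error message).
    String result = (String) Await.result(
            Patterns.ask(defaultRoom, new Join(username, out), 1000),
            Duration.create(1, TimeUnit.SECONDS));

    if ("OK".equals(result)) {
        // Incoming chat messages are published to the Redis channel,
        // as shown in "Sending Messages" above.
        in.onMessage(new Callback<JsonNode>() {
            public void invoke(JsonNode event) {
                // Publish a Talk to ChatRoom.CHANNEL here.
            }
        });

        // When the socket closes, tell the room actor so it can remove the
        // member from both the local map and the global roster.
        in.onClose(new Callback0() {
            public void invoke() {
                defaultRoom.tell(new Quit(username), null);
            }
        });
    } else {
        // Reject the connection, for example because the username is taken.
        ObjectNode error = Json.newObject();
        error.put("error", result);
        out.write(error);
    }
}
```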

The portion of onReceive that handles Talk events was already covered above and has been omitted here. Joining and leaving each have to manage the state of who is in the room and who is connected to this dyno. Each of them will manipulate the local and global member lists accordingly.
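Continuing the onReceive sketch from above, the Join and Quit branches might look like this; the "members" Redis set and the RosterNotification constructor are assumptions:

```java
} else if (message instanceof Join) {
    Join join = (Join) message;
    if (members.containsKey(join.username)) {
        getSender().tell("This username is already in use", getSelf());
    } else {
        // Local state: remember this member's WebSocket output channel.
        members.put(join.username, join.channel);

        // Global state: add the member to the shared roster and announce the join.
        Jedis j = jedisPool.getResource();
        try {
            j.sadd("members", join.username);
            RosterNotification notice = new RosterNotification(join.username, "join");
            j.publish(CHANNEL, Json.stringify(Json.toJson(notice)));
        } finally {
            jedisPool.returnResource(j);
        }
        getSender().tell("OK", getSelf());
    }
} else if (message instanceof Quit) {
    Quit quit = (Quit) message;

    // Remove the member from this dyno's map and from the shared roster, then
    // announce the quit to every dyno over the pub/sub channel.
    members.remove(quit.username);
    Jedis j = jedisPool.getResource();
    try {
        j.srem("members", quit.username);
        RosterNotification notice = new RosterNotification(quit.username, "quit");
        j.publish(CHANNEL, Json.stringify(Json.toJson(notice)));
    } finally {
        jedisPool.returnResource(j);
    }
}
```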

Joining and leaving also generate a special kind of chat message called a RosterNotification. This is the message that notifies all users when someone joins or quits. When a join or quit occurs, the RosterNotification is put onto the pub/sub channel so that all dynos can handle it the same way. The message is handled much like a normal Talk message: it is sent to all members connected to each dyno via the notifyAll method.

Keeping the connection open

The last important piece of logic is a keep-alive for the WebSocket connection. The request timeout window on Heroku still applies to WebSocket connections: an application must send some data across the connection at least once every 55 seconds or it will be closed by the router.

The original chat sample included a robot that would send a message to the room once every 30 seconds. This fulfills the keep-alive requirement.
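A sketch of how such a robot could be scheduled with Akka, publishing through the same Redis channel so the ping reaches every dyno; the robot name and message text are illustrative:

```java
// Publish a chat message from a "Robot" member every 30 seconds, keeping
// every open WebSocket inside Heroku's 55-second activity window.
Akka.system().scheduler().schedule(
        Duration.create(30, TimeUnit.SECONDS),   // initial delay
        Duration.create(30, TimeUnit.SECONDS),   // repeat interval
        new Runnable() {
            public void run() {
                Jedis j = jedisPool.getResource();
                try {
                    Talk ping = new Talk("Robot", "I'm still alive");
                    j.publish(ChatRoom.CHANNEL, Json.stringify(Json.toJson(ping)));
                } finally {
                    jedisPool.returnResource(j);
                }
            }
        },
        Akka.system().dispatcher()
);
```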