Refactoring the Jungle Disk Gateway Service

As a newly independent company with newly allocated development resources at the start of 2016, we dedicated a portion of our work to refactoring some of our back-end services. I spent most of the year on these refactors and would like to share a particularly interesting issue we overcame.

Translating from one written language to the next isn’t perfect in all cases; it may take multiple words in one language to describe the meaning of a word in the other, for example. Refactoring from one programming language to the next can be like this as well - sometimes due to differences in primitives or the standard library.

This time, we’re going to look into the challenges associated with the refactor of our Gateway service.

Gate-what?

The Jungle Disk Gateway service is what allows live, indirect communication between our Server Edition Management Client (an app for remotely managing your server backup configuration) and the Server Edition Agent (installed on your servers). Because each of these programs makes an outgoing connection to our Gateway service, you don’t have to open incoming ports in your firewalls to remotely control your server’s backup configuration. It was also originally used for file change notifications (for our Workgroup/Desktop Network Drive features), but that responsibility was moved to its own service several years ago.

Receive Connection

When either of the Jungle Disk Server Edition programs connects to our gateway service, it authenticates and establishes what we call a “Receive Connection” (simply an always-open HTTPS connection for receiving communication data). Some firewalls will close connections on the client’s end if it hasn’t received any transmissions for an extended period of time, though. To take care of this, we send an empty message out to the client once it reaches a minute or two of idle time. Some adjustments, messaging, and cleanup take place immediately following a client disconnection from the Gateway service.

Send Connection

When a client program wants to send some information (for example, if you’re starting a backup manually or changing settings), it’ll open a single-use Send Connection and transmit its message to our Gateways while the Receive Connection is still open.

Why Keep this Approach?

While there are some benefits with always-open connections, the concept of Polling is more standard today and is what most of our other services use. Unfortunately, switching to Polling was not an option since it would require some significant changes in our Server Edition client/agent programs and this would extend the timeline too much.

Refactoring

To keep the Receive Connection open, our original .NET service used Sockets and generally followed a Comet-style approach. When a message is received from Client A for Client B, that message is forwarded to Client B’s already-open Receive Connection. Client B may respond to Client A, but at that point, it’s just treated as a new message for Client A and the process is repeated. If either client disconnects, the remaining client is informed and knows that the conversation is over.

1. Always-Open Connections

So in Go, we needed to keep the connection open at all times and feed message data (bytes) into it as we receive them. We accomplished this with Go’s connection hijacking feature (the standard library’s http.Hijacker interface).

2. Trouble immediately detecting client disconnection

Later in testing, we encountered another issue: writing to a dead connection could appear to succeed, without returning an error, for some period of time (and the number of writes before failure was inconsistent). In other words, when the client software stopped, was disconnected, or the client computer shut down, our server didn’t know until after forwarding a few messages (which meant silently losing those messages instead of holding them in case the client reconnected shortly afterwards).

This was more complicated and we had trouble tracking down information on the topic, but we eventually stumbled upon a Stack Overflow post explaining that a Read on a connection can detect the loss of connection much more quickly. Even though we never have anything to read from our Receive Connection (since from the server’s perspective, we only write to Receive Connections), running a regular Read request on this connection in the background would reliably tell us within one second if the connection was severed. We used that information to signal to the Receive (writing) routine that the connection is dead and we should stop waiting.

Here’s a portion of our code showing how it all fit together:

```go
... // Monitor heartbeat of connection
var (
	heartbeatDone = make(chan bool)
	connected     = true
	listener      = client.Listener()
)
go func() {
	// Monitor connection and trigger disconnect/shutdown of client when it stops responding
	for primer := make([]byte, 1); connected; time.Sleep(1 * time.Second) {
		if _, err := conn.Read(primer); err != nil {
			context.Debug("heartbeat of connection no longer detected")
			postmaster.DeactivateClient(client) // stops message pulling; remove from locally connected clients map
			close(listener)                     // stops message pushing; close listener
			connected = false                   // stops heartbeat monitoring
		}
	}
	heartbeatDone <- true
}()
// NOTE: We want to share "connected", so we aren't passing it in as a parameter
```