Sockets and protocols: exchanging messages. Using bytes.

When two applications wish to communicate over an (inter)network, sockets quickly come into play. Nowadays almost every operating system provides sockets, in the form of an API that enables the developer to easily transfer data through a local network or over the internet to another machine.

TCP sockets

To instantiate a socket you have to combine an IP address and a port number into a so-called socket address, after which a connection (when using TCP) or a data flow (in the case of UDP) can commence. The socket can then be bound to a specific network interface using an IP address assigned to that interface, or to the ‘magic’ any address, 0.0.0.0 (or :: when using IPv6). You can also bind it to the loopback address (127.0.0.1 or ::1), making the socket accessible from the same computer only.

Server

Creating a server using C# and the .NET Framework doesn’t require a lot of effort. To create a socket and let it listen for a client that wants to connect, you only need a couple of lines:
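The original listing isn’t reproduced here; a minimal sketch of such a listener setup might look like this (the port number 11000 and the variable names are illustrative):

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Create a streaming internet socket that speaks TCP.
var listener = new Socket(AddressFamily.InterNetworkV6,
                          SocketType.Stream, ProtocolType.Tcp);

// Accept IPv4 clients too, even though we bind the IPv6 any address.
listener.SetSocketOption(SocketOptionLevel.IPv6,
                         SocketOptionName.IPv6Only, false);

// Bind to the any address (::) on the given port and start listening.
listener.Bind(new IPEndPoint(IPAddress.IPv6Any, 11000));
listener.Listen(10);

Console.WriteLine("Listening on port 11000...");
```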

The naming of the various parameters and methods used when instantiating the socket is pretty self-explanatory: a streaming internet socket is created, which sends and receives data using the TCP protocol, and the code Bind()s it to the any address (so not to a specific network interface) on the given port. Besides IPv6 the socket also accepts IPv4 connections, despite being bound to the IPv6 version of the any address, because the IPv6Only socket option is disabled.

When this has been done, the port can be opened with a call to the socket’s Listen() method. The Accept() method, which is called after that, blocks until a client connects to the server. When that happens, the method returns, after which StartReceiving(client) is called:
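The article’s listing isn’t shown here; the sketch below follows the same Accept()/StartReceiving() flow, where StartReceiving and MessageReceived are reconstructions of the methods the article refers to. To make the example terminate on its own, it also plays the client from the same process over the loopback address:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

var listener = new Socket(AddressFamily.InterNetwork,
                          SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0)); // any free port
listener.Listen(1);

// Simulate the remote client so this example is self-contained.
var clientSide = new Socket(AddressFamily.InterNetwork,
                            SocketType.Stream, ProtocolType.Tcp);
clientSide.Connect((IPEndPoint)listener.LocalEndPoint!);

Socket client = listener.Accept();        // blocks until a client connects
clientSide.Send(Encoding.UTF8.GetBytes("Hello, world"));
clientSide.Shutdown(SocketShutdown.Send); // graceful disconnect

StartReceiving(client);

static void StartReceiving(Socket client)
{
    var buffer = new byte[4096];
    while (true)
    {
        // Receive() blocks as well; it returns 0 on a graceful disconnect.
        int received = client.Receive(buffer);
        if (received == 0)
            break;

        MessageReceived(buffer, received);
    }
}

// Stand-in for the article's MessageReceived(): UTF-8 decode and print.
static void MessageReceived(byte[] buffer, int count) =>
    Console.WriteLine(Encoding.UTF8.GetString(buffer, 0, count));
```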

Just like Accept(), Receive() is a blocking call; the code that follows it is only executed when data is received, or when the client gracefully disconnects, in which case the return value will be 0.

The MessageReceived() method (not shown) converts the received byte array to a UTF-8 string, which is printed to the console.

Done?

Unfortunately, there exists the widespread misconception that one is now ready to exchange messages between server and client. This is caused by the belief that one Send() call always results in one Receive() at the other side. This is as far from the truth as it gets! Unlike WebSockets, which you can use to exchange messages, ‘regular’ sockets work with bytes without any further semantics. The amount of data that gets placed on the wire depends on many factors. Data sent in one call to Send() and smaller than, give or take, 1448 bytes is a real candidate to indeed be sent in one TCP segment that might result in one Receive() returning on the other side, but when an application sends lots of small data blocks, there is generally a great chance that multiple (possibly partial) messages are received as one byte array.

This is caused by Nagle’s algorithm, which holds on to the byte arrays from multiple Send() calls until a “full” enough packet can be put on the wire, in order to minimize TCP and IP overhead and thus maximize bandwidth. Nagle’s algorithm can be disabled, but even then there’s no guarantee that multiple messages won’t end up in one receive buffer. So you’ll have to understand that using timers (for example by sending one message per second, or by waiting five seconds after the last Receive() and then assuming the whole message has been received) isn’t a viable solution, not to mention the unnecessary lag this way of guessing message boundaries introduces.
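In .NET, disabling Nagle’s algorithm comes down to one property on the socket (it sets the TCP_NODELAY option under the hood):

```csharp
using System.Net.Sockets;

var socket = new Socket(AddressFamily.InterNetwork,
                        SocketType.Stream, ProtocolType.Tcp);

// Disable Nagle's algorithm: small writes are no longer coalesced.
// Note that this does NOT restore a one-Send-equals-one-Receive mapping;
// the stream still has no message boundaries.
socket.NoDelay = true;
```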

Protocols

To be able to tell a message apart from the larger byte array that it’s contained in, some rules have to be laid out that describe what a message actually looks like. Those rules are described in a protocol. When (and if) client and server adhere to these rules, to that protocol, they will be able to exchange messages over a streaming protocol.

HTTP

As said before, it’s vital to be able to distinguish the different messages which reside in one big block of data that has just been received. On the other hand, larger messages that are received in multiple parts will have to be glued together to form the original message again. The trickiest part (the order in which data is received) is taken care of by TCP: the data is presented by Receive() in exactly the order it was sent using Send(), but by default there’s no way to tell multiple calls or messages apart.

An HTTP request consists of three parts: it starts with the start line, followed by the request header(s), followed by an optional body. The start line and headers, as stated in the specification, are each terminated by one CrLf. Because the CrLf is mandatory and the last header has to be followed by another CrLf (with a subtle hint towards “buggy HTTP/1.0 clients”, which did that less consistently), a server can keep reading while receiving the first two parts of a request until it receives a CrLf, before it decides to start interpreting the received line. Until then the request is buffered, which is why browsing with a telnet client works (or sending an email if you connect it to an SMTP server): only when you hit the Enter key is the line you just entered processed by the server.

Because these protocols work this way, it doesn’t matter that for every keypress (unless, perhaps, you type blazingly fast) a Receive() will return at the server with a value of 1 byte. This also explains why backspace won’t work when manually typing an HTTP request towards a server: the key is sent to the server, adding a byte with the value 8 to the string that was already entered. Unfortunately every Receive() can cause various context switches, which in turn can cause relatively substantial delays.
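The “keep reading until a CrLf arrives” buffering described above can be sketched as follows; this is an illustration of the idea, not code from the article:

```csharp
using System;
using System.Text;

// Accumulates received bytes and yields a line each time a CrLf is seen,
// the way a textual protocol like HTTP or SMTP frames its header lines.
class LineBuffer
{
    private readonly StringBuilder _pending = new StringBuilder();

    // Feed freshly received bytes in; returns a completed line, or null
    // when no CrLf has been seen yet.
    public string? Append(byte[] data, int count)
    {
        _pending.Append(Encoding.ASCII.GetString(data, 0, count));

        string all = _pending.ToString();
        int eol = all.IndexOf("\r\n", StringComparison.Ordinal);
        if (eol < 0)
            return null; // keep buffering, just like the server does

        // Keep whatever followed the CrLf for the next line.
        _pending.Clear();
        _pending.Append(all.Substring(eol + 2));
        return all.Substring(0, eol);
    }
}
```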

To solve this (albeit for HTTP; “our” protocol won’t profit from it), a kernel module called accf_http(9) was introduced in FreeBSD 4.0. This module hooks into the kernel and into the sockets API used by a web server, and buffers each incoming request on the socket it is enabled on with the appropriate socket option. Only when the filter has received a full HTTP request (only GET and HEAD requests though, for obvious reasons) will it cause a Receive() to return in the API, with the whole request it just received in the receive buffer (if it is large enough). This causes fewer context switches, enabling the server to do its job even more efficiently. But I’m losing track here.

Reserved bits

The trickiest part of defining a protocol might be that you don’t know the required functionality of an eventual next version, or even that of the version currently under development. This means you’ll have to carefully understand the implications of every choice you make while doing so. HTTP shows quite some thought went into that (although its somewhat verbose structure takes a little of the shine off), which resulted in the ability to extend practically every part of the protocol. One result of that choice is the existence of WebDAV, a protocol comparable to FTP but much more advanced in the field of permissions and versioning, and capable of handling metadata. It is an extension of HTTP, provided with new verbs (like PROPFIND) and XML message bodies the server and client interpret and respond to. Another piece of HTTP’s extensibility shows in that you can ‘make up’ new headers, which can be used to exchange information between client and server that hasn’t been specified in the original RFC, like the X-XSS-Protection header, while other existing headers allow extensions that provide new functionality.

All this can be done easily, because the specification doesn’t put a restriction on header size, nor on their options: just a CrLf indicates the end of a header, using a key: value pattern, enabling the server to read them one by one, until a double CrLf is sent which announces the end of all headers, allowing an optional message body to be transmitted, the length of which is (generally) specified in the headers sent before. An alternative to this approach is fully specifying the message headers’ keys and possible option values, where each group of bits in the header has a predetermined function. TCP shows an example of this approach, defining the layout of the first 20 bytes of each TCP segment, where each group of bits or bytes has a specific function, depending on where in the header the bytes are positioned. While designing such a protocol you should allocate enough space for the data that might need to fit in it in the future (how many positions do you need to write down a year?). One can’t always predict future extensions; while the foundations of the internet were laid out, more often than not an experiment got out of hand; just take a look at IPv4. Because the future is still unpredictable, now and then a reserved bit is slapped in. This was picked up in the RFC that was published on April Fools’ Day 2000:

Reserved [a 32 bits wide space present in the header] must be 0 and will always be 0 in future uses. It is included because every other protocol specification includes a “future use” reserved field which never, ever changes and is therefore a waste of bandwidth and memory.

Yet another solution, also utilized by TCP, is sending headers in their own variable-length block of data, prepended by the block’s length, increasing header flexibility.

Our own protocol

Using the details above, one can define one’s own, basic protocol. A message sent using this protocol (just because) has the following layout:

The first four bytes represent an ASCII-encoded string indicating the message type. For now we only recognize the type “TMSG“, which can be used to send text messages between client and server. Following are four bytes that form an unsigned 32-bit integer (uint / UInt32) which defines the message length in bytes. This value, and hence the message size, can be any integer value from 0 to 4,294,967,295.
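Writing out that 8-byte header could look like the sketch below; WriteHeader is a hypothetical helper, not code from the original article:

```csharp
using System;
using System.Text;

byte[] header = WriteHeader("TMSG", 5);
Console.WriteLine(BitConverter.ToString(header));

static byte[] WriteHeader(string messageType, uint payloadLength)
{
    if (messageType.Length != 4)
        throw new ArgumentException("Message type must be four ASCII characters.");

    var header = new byte[8];

    // Bytes 0-3: the ASCII-encoded message type, e.g. "TMSG".
    Encoding.ASCII.GetBytes(messageType, 0, 4, header, 0);

    // Bytes 4-7: the payload length as an unsigned 32-bit integer.
    // Note: BitConverter uses the CPU's native byte order here; the
    // byte order that belongs on the wire is discussed in the next section.
    byte[] length = BitConverter.GetBytes(payloadLength);
    Buffer.BlockCopy(length, 0, header, 4, 4);

    return header;
}
```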

Byte order

Another fact that should be kept in mind is that data sent over the network can be received by a different type of device. The relevance of that lies in the order of successive bytes in multi-byte values, like the 32-bit integer we use: do you write the four bytes from most significant to least significant, or vice versa? Although many architectures use little-endianness, the usual order for transporting values over the network is big-endian. Because the only unit that can be sent using a socket is one byte, all values spanning multiple bytes will have to be sent in big-endian order. We do so because of the standards, because frankly this application is the only one that cares.

For the first four bytes this doesn’t matter, because they are defined as ASCII and will by definition each fit in one byte. For the unsigned int following them it does matter, however: on the average CPU this code will be executed on, data is stored little-endian, as the output of the following code shows:
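The original snippet isn’t reproduced here; a comparable demonstration:

```csharp
using System;

uint value = 1025; // 0x00000401

// BitConverter returns the bytes in the CPU's native order; on the
// common x86/x64 machines that is little-endian.
byte[] bytes = BitConverter.GetBytes(value);

Console.WriteLine(BitConverter.IsLittleEndian);
Console.WriteLine(BitConverter.ToString(bytes)); // "01-04-00-00" on a little-endian CPU
```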

For a single value of four bytes, reversing is no problem (and remember the method used for that for your next job interview; know your framework):

if (BitConverter.IsLittleEndian)
{
    Array.Reverse(bytes);
}

Unfortunately you don’t gain much benefit from reversing all bytes in a Unicode string, because that would also reverse the order of the text it contains, while multibyte characters would be corrupted. A better alternative is using a byte order mark (BOM), which indicates the byte order of a Unicode string and is obtainable through Encoding.GetPreamble(). A byte array, whether it starts with a BOM or not, can be read back into a string using a StreamReader. It is also stated, though, that:

For standards that provide an encoding type, a BOM is somewhat redundant.

When it’s decided that all multibyte values are transmitted in big-endian order, both the client and server side can simply use the BigEndianUnicode implementation of the Encoding class, which, exposing the methods GetBytes() and GetString(), provides for all possible needs. Worth noting is that wherever in .NET the term “Unicode” is used, a UTF-16 encoding is meant, where each Unicode code point takes either two or four bytes. A more efficient encoding is UTF-8, where the characters used in most Western languages usually require only one byte. With UTF-8 the byte order is always the same, rendering the BOM completely superfluous.
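A quick round trip through Encoding.BigEndianUnicode illustrates the agreement: because both ends fix the byte order beforehand, no BOM is needed.

```csharp
using System;
using System.Text;

string text = "naïve"; // contains a non-ASCII character

// Encode in big-endian UTF-16 for the wire...
byte[] wire = Encoding.BigEndianUnicode.GetBytes(text);

// ...and decode it the same way on the other side.
string roundTripped = Encoding.BigEndianUnicode.GetString(wire);

Console.WriteLine(roundTripped == text); // prints True
```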

Implementation

Keeping that knowledge in mind, we can start implementing the client and server side, enabling both programs to start exchanging messages. The implementation starts with the following interface:
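The article’s interface isn’t reproduced here; the sketch below is a hypothetical reconstruction based on how the messages are used later on (the names IMessage, TextMessage and GetPayload are assumptions):

```csharp
using System.Text;

// Hypothetical reconstruction of the message abstraction.
public interface IMessage
{
    // Four ASCII characters identifying the message type, e.g. "TMSG".
    string MessageType { get; }

    // The message body, already encoded for the wire.
    byte[] GetPayload();
}

// The single message type the protocol defines so far.
public class TextMessage : IMessage
{
    public string MessageType => "TMSG";
    public string Text { get; set; } = "";

    public byte[] GetPayload() => Encoding.BigEndianUnicode.GetBytes(Text);
}
```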

Parsing

Now that a structure exists which enables us to read and write messages to an in-memory representation of the data, we can proceed to put this data onto the wire and write some code that converts received data back into such a message. The latter part is called parsing. Sending a message is demonstrated by the following code:
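The original sending code isn’t shown here; a sketch of the same idea, where BuildFrame and SendTextMessage are hypothetical names and `socket` is assumed to be a connected socket:

```csharp
using System;
using System.Net.Sockets;
using System.Text;

byte[] frame = BuildFrame("Hello");
Console.WriteLine(frame.Length); // 8 header bytes + 10 payload bytes, prints 18

static byte[] BuildFrame(string text)
{
    byte[] payload = Encoding.BigEndianUnicode.GetBytes(text);
    var frame = new byte[8 + payload.Length];

    // Header: four ASCII type bytes plus the payload length, big-endian.
    Encoding.ASCII.GetBytes("TMSG", 0, 4, frame, 0);
    byte[] length = BitConverter.GetBytes((uint)payload.Length);
    if (BitConverter.IsLittleEndian)
        Array.Reverse(length);
    Buffer.BlockCopy(length, 0, frame, 4, 4);

    Buffer.BlockCopy(payload, 0, frame, 8, payload.Length);
    return frame;
}

static void SendTextMessage(Socket socket, string text) =>
    socket.Send(BuildFrame(text));
```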

The data from memory (the message object) is converted into a byte array that is sent towards the server.

What are you doing? Or: state

When a message is sent using the code mentioned above, one or more Receive() calls will return on the receiving end, so that application can start to parse the data. The data being received consists solely of byte arrays and an integer indicating how many bytes have been received. The application doesn’t know whether there have been earlier Receive() calls and what data was received at those times. To be able to recognize such situations (i.e. when a possibly partial message is being received), we need an object that keeps track of all data, at least until a whole message is formed: enter ParserState (not shown). A state object initially contains a new, empty message and a buffer to store the headers. The state object is to be persisted between calls by the calling application, and it needs to be provided after every Receive(), so newly received data can be appended to a previously received message when required.

The majority of this work is done by the following code, counting a mere eighty lines:
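The eighty-line listing isn’t reproduced here; the condensed sketch below implements the same idea, where the ParserState fields and the Parse method are reconstructions rather than the article’s actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class ParserState
{
    public byte[] Headers = new byte[8];      // buffer for the 8 header bytes
    public int HeaderBytes;                   // how many of those we have so far
    public string MessageType = "";
    public uint Length;                       // payload length, once known
    public List<byte> Payload = new List<byte>();
    public bool Complete;
}

static class Parser
{
    // Appends `count` freshly received bytes to `state`; returns the states
    // of all messages completed during this call. `state` ends up holding
    // the (possibly partial) message still in flight.
    public static List<ParserState> Parse(ref ParserState state, byte[] buffer, int count)
    {
        var completed = new List<ParserState>();
        int offset = 0;

        while (offset < count)
        {
            if (state.HeaderBytes < 8)
            {
                // Collect header bytes; a partial header simply stays in
                // the state until the next Receive() delivers the rest.
                int take = Math.Min(8 - state.HeaderBytes, count - offset);
                Buffer.BlockCopy(buffer, offset, state.Headers, state.HeaderBytes, take);
                state.HeaderBytes += take;
                offset += take;

                if (state.HeaderBytes < 8)
                    break; // out of data, wait for the next call

                state.MessageType = Encoding.ASCII.GetString(state.Headers, 0, 4);
                var len = new byte[4];
                Buffer.BlockCopy(state.Headers, 4, len, 0, 4);
                if (BitConverter.IsLittleEndian)
                    Array.Reverse(len); // wire order is big-endian
                state.Length = BitConverter.ToUInt32(len, 0);

                if (state.Length == 0)
                {
                    // No payload follows: the message is already complete.
                    state.Complete = true;
                    completed.Add(state);
                    state = new ParserState();
                }
                continue;
            }

            // Collect payload bytes until Length has been reached.
            int chunk = Math.Min((int)state.Length - state.Payload.Count, count - offset);
            for (int i = 0; i < chunk; i++)
                state.Payload.Add(buffer[offset + i]);
            offset += chunk;

            if (state.Payload.Count == state.Length)
            {
                state.Complete = true;
                completed.Add(state);
                state = new ParserState(); // ready for the next message
            }
        }
        return completed;
    }
}
```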

As you can read from the comments, a few possible situations exist. The initial situation is of course a clean sheet: no data has been received and the state has just been initialized. Now data is received: the start of a new message. The first eight bytes will, as dictated by the protocol, function as the header and are thus stored in the state.Headers buffer. Once eight bytes have been received, the headers are parsed, resulting in the Length header being set. Now the parser knows how much payload to expect after those headers.

It is however perfectly possible that not all of the eight bytes that form the header have been received and that this is the end of the currently available data. This is by no means a problem: the rest of the code will be skipped and the state containing the data received so far will be returned. The remaining data is still ‘in flight’, or probably still in the process of being sent by the other end. Upon the next call, the newly received data can be appended to the existing state, until all header data has been received.

Now the payload can be read, if the headers indicated that a payload should follow. If this isn’t the case, the current message is marked as complete and a new state is created, the buffer ‘pointer’ is incremented and the loop is run again using any remaining data from the receive buffer.

By looping through the buffer and returning a list of states containing completed messages, you can theoretically receive an unlimited number of messages, as long as enough memory remains to store the list and as long as all messages adhere to the protocol. If the latter isn’t the case, sooner or later arbitrary data will incorrectly be interpreted as header data. When this happens, most probably an incorrect message type is created (which is defined in the first four bytes of a message), causing the MessageFactory to throw an exception.

Extending the protocol

Currently only one message type can be exchanged, and neither client nor server has the ability to understand the messages; they don’t contain any logic, apart from printing received messages to the console. Besides that, the server can handle only one client at a time.

Fortunately, .NET’s sockets API also provides asynchronous methods, easily enabling you to create a non-blocking, multithreaded server. The class I wrote to implement this functionality has the responsibilities of sending and receiving messages, exposing these as methods a consumer can call and delegates that can be subscribed to. This is too much code to show here, but when requested I might be able to put it on GitHub or the like.

Anyway, a server and client being able to successfully exchange messages closes the protocol chapter.

Defining application specific behaviour

What follows next is refining the protocol by adding more types of messages, specific to the requirements of the application. Those new messages take exactly the same form as the TextMessage discussed before, only differing in the contents of the MessageType header. For example, when you’re building a chat server, this way you’re able to create a JOIN message which lets a client join a given chatroom, the address or channel name of which is to be sent in the message body (i.e. the payload). Defining how a server keeps track of certain clients and chatrooms, who may access which, and perhaps even who has the permissions to create and assign such things has to be determined by the one(s) designing the protocol, so client and server can implement this protocol and understand each other.

Now that the structure to send messages exists, all that remains is determining and realising the functionality, by implementing all requirements (“A client has to send a NICK message before it can JOIN a chatroom”) on both the client and server side. The infrastructure that has been laid out is extensible: for example, it enables limiting message length (although the transport part of the protocol describes that a maximum of 4 GB can be sent in one message, client and server can agree to put limits on certain or all messages), or the message payload can be more actively involved in the exchange (like the JOIN message). This way, using the existing message infrastructure, information can be exchanged.

Don’t reinvent the wheel

Please realise though that what I’m describing here begins to resemble a poor reinvention of the wheel called IRC. While it’s perfectly defensible to write such code for demonstration’s sake, usually an existing protocol will solve the problem you think is unique, because an application protocol can actually be used as a transport protocol, which is demonstrated by REST and SOAP. Both use HTTP as their transport mechanism, exchanging (for client and server) meaningful messages in XML or JSON format.

Multiplexing

The current implementation of our protocol shows some deficiencies. One of the biggest is that messages are sent and received synchronously: the parser will keep reading from the same buffer until the whole message has been read. When a lot of data is being exchanged, for example when (large) files have to be transferred, all other communication between those two nodes blocks until all file data has been received.

Solving this problem can roughly be done in two ways: letting multiple TCP connections or UDP datagrams flow between the two parties (which costs more resources and causes other problems), or enabling multiplexing on one stream, like SPDY does. When using multiplexing, client and server send a maximum amount of bytes at once (about 1400 would be nice, so each message part usually fits in one IP packet), adding a stream ID to the message header. This way the receiving end knows to which message, in which state, the incoming part is to be appended, enabling multiple data flows to occur at the same time. With this method you don’t have to wait for larger message exchanges, since every stream can send its data in between the smaller messages.
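Such a multiplexed frame header could be sketched as follows; this is purely illustrative (the field layout, the Frame class and MaxChunk are assumptions, not SPDY’s actual format):

```csharp
using System;

// Illustrative multiplexed frame header: stream ID plus chunk length.
// Each message is cut into chunks of at most MaxChunk bytes, so several
// streams can interleave on one TCP connection.
static class Frame
{
    public const int MaxChunk = 1400;

    public static byte[] BuildHeader(uint streamId, uint chunkLength)
    {
        var header = new byte[8];
        WriteBigEndian(header, 0, streamId);
        WriteBigEndian(header, 4, chunkLength);
        return header;
    }

    static void WriteBigEndian(byte[] buffer, int offset, uint value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes); // wire order is big-endian
        Buffer.BlockCopy(bytes, 0, buffer, offset, 4);
    }
}
```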

Conclusion

Programming with sockets is really interesting, although it quickly adds more responsibilities than a beginning (socket) programmer might expect. So, if you want to exchange messages and you aren’t actually implementing the newest state-of-the-art protocol or building a high-performance networked game or application that can benefit from designing a new protocol: pick an existing protocol, or an extension of one, and make it fit your needs, so you won’t walk into pitfalls that others have long since conquered.

There’s no use for Peek() here. The application has to append the data to the message buffer as soon as data is available; there’s no use letting it sit in the socket’s buffer and peeking at it until a complete message is there (it might not even fit). The loop processes data that has already been received, so it won’t wait for more bytes to come in (it’ll just return when 0 bytes are left): when more bytes are received, a new call to the same method is made with the new buffer.