Sockets in .Net Core – Part 2

Stream and Framing

Sockets can be wrapped in a NetworkStream, which is a Stream. A stream is always a sequence of bytes: we read and write bytes, and understanding that is essential to what we do here. Sockets know nothing about the data itself. Applications must know what to do with the byte stream and how to convert it into something meaningful.

The string “Hello my friends!” means nothing to a network communication. The string needs to be encoded and represented as a byte array. The application receiving that data needs to decode it properly from the stream in order to translate it into something understandable. In this context, “understandable” only makes sense to the two parties of the communication. The transmitted bytes can represent an image, text, encrypted data or any other format.

In general, simple high-level protocols deal only with ASCII strings or byte arrays; this is not a rule, just an observation.

For example, a valid representation of the phrase “Hello my friends!” can be encoded using the following C# code:

var myMessage = Encoding.ASCII.GetBytes("Hello my friends!");

Which yields a byte array containing the following bytes:

72, 101, 108, 108, 111, 32, 109, 121, 32, 102, 114, 105, 101, 110, 100, 115, 33

Of course, the message must be decoded with the same encoding on the destination to make sense, using the following C# code:

var decoded = Encoding.ASCII.GetString(myMessage);

Let’s remark again that the byte stream doesn’t mean anything to the socket or to the network transmission. Only the high-level applications writing to and reading from the socket can translate it into meaningful messages or entities such as images, documents and so forth.

A very common application of sockets is implementing high-level protocols. Nowadays, many small devices such as GPS trackers, IoT sensors and similar hardware implement socket communication to a server, configured using simple packets carrying commands, instructions, monitoring data, etc.

These kinds of high-level protocols are mostly byte or ASCII based. The majority of them run over TCP/IP, with some less common UDP/IP implementations. TCP is (mostly) reliable in terms of packet delivery, ordering, duplication and retries. Under certain circumstances, such as bad connectivity, UDP will lose a high percentage of packets, while TCP might fare better thanks to its retry mechanism. Our applications need to be aware of these protocol fundamentals to expect appropriate output.

High-level interpretation leads to another common issue, which is the focus of this post: framing.

Framing

In a nutshell, framing is the process of interpreting messages, which requires identifying message boundaries within a stream.

Boundaries are whatever the designer of a high-level protocol decides they are. In that sense there are no restrictions; I can pick any rule that is workable. For example, I could define my own chat protocol where messages always start with @@ and finish with $$. Messages would look like the following examples:

@@ey! this is a message in the protocol, pretty cool, isn't it?$$
@@sure, let's keep chatting this way$$

Boundaries were defined by my “protocol” and they only make sense in the context of my application. In theory, receiving those messages should be a trivial task: I just need to read from a socket and pass what I read to a method, class or function that handles it. In reality, there is no guarantee of receiving those messages with their boundaries preserved (in one read). So the first issue is dealing with reception. For example, the two messages could arrive split across 3 different reads:

Read 1: @@ey! this is a message in the pro
Read 2: tocol, pretty cool, isn't it?$$@@sure, le
Read 3: t's keep chatting this way$$

(The split points are arbitrary; any fragmentation is possible.)

Comment: Chat protocols might be implemented over UDP, which is less reliable but at the same time simpler to implement. UDP preserves boundaries: datagrams arrive in one piece, so we don’t have to deal with framing.

The data read from the stream is copied into a buffer, decoded as ASCII, and passed to a string manipulation function that looks for the start / end boundaries. I will show all the code a bit further on. For the sake of simplicity I didn’t define some of the variables, but it is worth explaining a few of them.

The buffer size determines how much data I try to read at once; reading is required to drain the communication queue and clean up that data downstream. In general, socket flags are not used; they add another level of complexity and in some cases amount to hacks that are not recommended unless you are dealing with unusual communications. When I say “I am trying to read” I mean exactly that: there is no guarantee I will read a specific number of bytes, I only indicate the maximum size I am willing to read.
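A minimal sketch of this reader follows. The frame-extraction helper name (ExtractFrames) is my own, and the socket read is simulated with an in-memory array of chunks whose split points are invented for illustration; in a real application each chunk would come from a Socket.Receive call.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ChatFraming
{
    // Pulls every complete @@...$$ frame out of the accumulated text,
    // leaving any trailing partial frame behind for the next read.
    public static List<string> ExtractFrames(StringBuilder pending)
    {
        var frames = new List<string>();
        while (true)
        {
            string text = pending.ToString();
            int start = text.IndexOf("@@", StringComparison.Ordinal);
            if (start < 0) break;
            int end = text.IndexOf("$$", start + 2, StringComparison.Ordinal);
            if (end < 0) break;                      // frame not complete yet
            frames.Add(text.Substring(start + 2, end - start - 2));
            pending.Remove(0, end + 2);              // drop the consumed bytes
        }
        return frames;
    }

    public static void Main()
    {
        // Each element simulates the bytes returned by one Socket.Receive call,
        // already decoded with Encoding.ASCII.GetString.
        string[] reads =
        {
            "@@ey! this is a message in the pro",
            "tocol, pretty cool, isn't it?$$@@sure, le",
            "t's keep chatting this way$$"
        };

        var pending = new StringBuilder();
        foreach (var chunk in reads)
        {
            pending.Append(chunk);
            foreach (var msg in ExtractFrames(pending))
                Console.WriteLine(msg);
        }
    }
}
```

Note that ExtractFrames keeps any incomplete tail in the StringBuilder, so a message split across reads is reassembled as soon as its closing $$ arrives.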

What is the main problem with this algorithm? Even though the string manipulation is not complex, a high workload can lead to heavy delays. Delays in the reading process trigger something called flow control: the underlying layers will force a transfer interruption. If the interruption happens very often it causes many other issues, such as high CPU usage or data loss through timeouts.

Read about flow control in TCP/IP connections to get a deeper understanding.

The problem gets much worse when protocols are binary. By binary protocols (the term is not entirely accurate) I mean protocols that are not encoded in a human-readable format. They are very common in small devices where memory and buffer size are critical. In a typical ASCII-encoded format I could send the following flags in a readable ASCII message:

1,0,1,0,1,1

Which is acceptable, but when memory and buffers are critical, those flags take a big chunk of them. Of course the representation is exaggerated, but it is not unusual: this message takes 11 bytes to send 6 flags. In a compact binary protocol I could represent it in one byte, where 6 bits carry the flags and there is still room for 2 more. The byte transmitted could be:

43 = 0x2B (binary 101011)

One byte replaces all eleven previous bytes if I change the protocol.
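The packing can be sketched in a few lines. The FlagPacking class and its bit layout (first flag in the most significant of the six used bits) are my own choices for illustration; a real protocol would fix its own bit order.

```csharp
using System;

static class FlagPacking
{
    // Packs flags[0] into the most significant of the used bits.
    public static byte Pack(bool[] flags)
    {
        byte packed = 0;
        for (int i = 0; i < flags.Length; i++)
            if (flags[i]) packed |= (byte)(1 << (flags.Length - 1 - i));
        return packed;
    }

    // Reads flag `index` back out of a byte holding `length` flags.
    public static bool Unpack(byte packed, int index, int length) =>
        (packed & (1 << (length - 1 - index))) != 0;

    public static void Main()
    {
        bool[] flags = { true, false, true, false, true, true }; // 1,0,1,0,1,1
        Console.WriteLine($"0x{Pack(flags):X2}");                // prints 0x2B
    }
}
```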

I did mention binary makes things worse. Why? It is simple: once we deal with arrays of bytes instead of string operations, we need to create arrays and use array-copy methods to move those chains of bytes around. String operations can be made efficient through different mechanisms, but byte array operations are heavy: “extending” an array requires creating a new one and copying the two previous arrays into it, which is a very inefficient operation (yes, string concatenation does the same, but there we have alternatives such as StringBuilder).

Under heavy workload, binary protocols cause a lot of trouble, mainly high CPU consumption when many processes or threads deal with devices at the same time. The problem does not arise in message reception; it lies entirely in message interpretation.

Let’s imagine a binary protocol that marks the message start with 1 or 2 start bytes and the end with 1 or 2 end bytes (I say 1 or 2 because it is unlikely to find binary protocols that mark boundaries with fewer than 2 bytes, for obvious reasons). Suppose, for illustration, that the start marker is 02 02 and the end marker is 03 03; as packets arrive, the concatenated binary data might look like the following (in hexadecimal for reading clarity):

Read 1: 02 02 41 4B 10
Read 2: 33 03 03 02 02
Read 3: 42 4B 03 03

There is no guarantee we receive everything at once. What is the big issue? The number of array operations we will need for a very simple case. Let’s outline it as a series of steps before digging into the code:

The first packet arrives. It is copied into a buffer and passed to a message analysis function. No complete frame is present yet, so the function just keeps the data and does nothing.

The second packet arrives. The message analysis function needs to create a new buffer that holds packets 1 and 2 and copy both into it. Because message boundaries are now present, it takes the first complete message, sends it for processing, and trims the remaining bytes into yet another buffer. Since there is no other complete message, it returns.

The third packet arrives: same as step 2, a lot of array creations and copies.

From that point onward, the CPU consumption of the process sky-rockets. Binary processing is the worst case. A very basic processing function will look like the following code:
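The following is a sketch of such a naive processing function, assuming hypothetical 02 02 / 03 03 start and end markers; the class name and structure are my own. Note how every call allocates fresh arrays and copies bytes around, exactly the cost described in the steps above.

```csharp
using System;
using System.Collections.Generic;

class BinaryFramer
{
    // Hypothetical markers: 02 02 opens a frame, 03 03 closes it.
    static readonly byte[] StartMark = { 0x02, 0x02 };
    static readonly byte[] EndMark = { 0x03, 0x03 };

    byte[] _buffer = Array.Empty<byte>();

    public List<byte[]> Process(byte[] chunk, int count)
    {
        // Step 1: "extend" the buffer - a brand new array plus two copies.
        var merged = new byte[_buffer.Length + count];
        Array.Copy(_buffer, 0, merged, 0, _buffer.Length);
        Array.Copy(chunk, 0, merged, _buffer.Length, count);
        _buffer = merged;

        // Step 2: extract every complete frame, trimming the buffer each time.
        var messages = new List<byte[]>();
        int start, end;
        while ((start = IndexOf(_buffer, StartMark, 0)) >= 0 &&
               (end = IndexOf(_buffer, EndMark, start + StartMark.Length)) >= 0)
        {
            var payload = new byte[end - start - StartMark.Length];
            Array.Copy(_buffer, start + StartMark.Length, payload, 0, payload.Length);
            messages.Add(payload);

            // Trimming means yet another allocation and copy.
            var rest = new byte[_buffer.Length - (end + EndMark.Length)];
            Array.Copy(_buffer, end + EndMark.Length, rest, 0, rest.Length);
            _buffer = rest;
        }
        return messages;
    }

    // Linear scan for a byte pattern; yet more per-byte work on every call.
    static int IndexOf(byte[] haystack, byte[] needle, int from)
    {
        for (int i = from; i <= haystack.Length - needle.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < needle.Length; j++)
                if (haystack[i + j] != needle[j]) { match = false; break; }
            if (match) return i;
        }
        return -1;
    }

    static void Main()
    {
        var framer = new BinaryFramer();
        framer.Process(new byte[] { 0x02, 0x02, 0x41 }, 3);   // no complete frame yet
        var msgs = framer.Process(new byte[] { 0x42, 0x03, 0x03 }, 3);
        Console.WriteLine(BitConverter.ToString(msgs[0]));    // prints 41-42
    }
}
```

Every incoming chunk triggers at least one allocation and two copies, and each extracted message adds two more; under load, this is where the CPU time goes.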

There is a common method to speed up binary copies, Buffer.BlockCopy, but it is something to be careful with. In theory we can improve our code using block copy operations, but it is not that simple: it requires a deeper understanding of the underlying implementation. In some critical cases, block copy operations are mandatory to achieve the expected output. Other improvements to the code shown here may also work, such as keeping fixed-size buffers (avoiding constant allocation), but the implementation becomes more complex.
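As a quick illustration, here is a minimal sketch of concatenating two byte buffers with Buffer.BlockCopy (the Concat helper is my own naming). One caveat the signature hides: its offsets and count are expressed in bytes, not array elements, which only coincide for byte[].

```csharp
using System;

static class BlockCopyDemo
{
    // Concatenates the first aLen bytes of a with the first bLen bytes of b.
    public static byte[] Concat(byte[] a, int aLen, byte[] b, int bLen)
    {
        var merged = new byte[aLen + bLen];
        Buffer.BlockCopy(a, 0, merged, 0, aLen);       // offsets/counts are in bytes
        Buffer.BlockCopy(b, 0, merged, aLen, bLen);
        return merged;
    }

    public static void Main()
    {
        var merged = Concat(new byte[] { 0x02, 0x02 }, 2, new byte[] { 0x41, 0x03 }, 2);
        Console.WriteLine(BitConverter.ToString(merged));   // prints 02-02-41-03
    }
}
```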

In the next part we will explore System.IO.Pipelines, an API introduced with .Net Core 2.1 that solves many of these framing issues in a more efficient way, and that was not available before that version. Life is easier when implementations of common techniques are standardized!

Published by maxriosflores
