Sockets Tutorial with Python 3 part 2 - buffering and streaming data

Welcome to part 2 of the sockets tutorial with Python. In the previous tutorial, we learned how we could send and receive data using sockets, but then we illustrated the problem that can arise when our communication exceeds our buffer size. In this tutorial, we'll talk about overcoming this!

As mentioned before, there are a few logical ways that you could handle this, but one common way is to start every message with a header that contains the length of the message to come. The next challenge is normalizing this header in some way. You might consider using some series of characters, or some format, as a delimiter, but then you run the risk of people accidentally, or purposefully, mimicking this formatting. Instead, you can go with a fixed-length header, where the first n bytes of data are the header, which includes the length of the message to come. Once we've received that length of data, we know any following information will be a new message, where we need to grab the header and repeat the process.

So what we need to do now is choose some truly maximal message size. Say, 1,000,000,000 bytes. There's almost no circumstance where someone would attempt anything even close to this via our chat app, so this will be fine. That number is 10 digits long, so it fits in 10 characters (10 bytes when encoded as UTF-8). In Python, how might we represent any number as 10 characters? We can use string formatting! Yay basics! Since this is a lesser-used functionality, see more here: format examples, where you will see examples like:
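A minimal sketch of that kind of alignment formatting, using the standard <, > and ^ alignment options with a width of 30 (the sample strings are only illustrative):

```python
# Pad each value to a fixed width of 30 characters with different alignments.
left = '{:<30}'.format('left aligned')     # text first, spaces on the right
right = '{:>30}'.format('right aligned')   # spaces on the left, text last
centered = '{:^30}'.format('centered')     # spaces split across both sides
starred = '{:*^30}'.format('centered')     # same, but filled with '*'

for text in (left, right, centered, starred):
    print(repr(text), len(text))           # each result is 30 characters long
```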

In this case, you can see various examples where there are 30 characters used every time, but you can do various alignments. While this is mainly used to make text-based GUIs pretty, we can also use this for our purposes, like:

f'{len("your message here!"):<10}'

In the above case, this will produce the length of our message using 10 characters.

>>> f'{len("your message here!"):<10}'
'18        '
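To see the round trip, here is a small sketch that builds the header with a configurable width and then reads the length back out. The helper names add_header and split_header are hypothetical, chosen just for this example:

```python
HEADERSIZE = 10  # fixed header width, in characters

def add_header(msg: str) -> str:
    """Prefix msg with its length, left-aligned and padded to HEADERSIZE."""
    return f"{len(msg):<{HEADERSIZE}}" + msg

def split_header(data: str):
    """Return (declared length, remaining text) for a string built by add_header."""
    msglen = int(data[:HEADERSIZE])  # int() ignores the trailing padding spaces
    return msglen, data[HEADERSIZE:]

framed = add_header("your message here!")
print(repr(framed))
print(split_header(framed))
```

Nesting {HEADERSIZE} inside the format spec keeps the width in one place, so the sender and the receiver cannot drift apart.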

All we do now is prepend this header to all of our messages. Then we can convert the first 10 chars of each new message to an int to know how much more data is part of that one message. To do this, we'll start in our server script:
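The server script can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not necessarily the original listing: the address 127.0.0.1, port 1241, the welcome text, and the helper names frame and serve are all chosen for illustration. The length is counted in encoded bytes so the header stays correct for non-ASCII text.

```python
import socket

HEADERSIZE = 10  # fixed header width, in bytes

def frame(msg: str) -> bytes:
    """Encode msg and prefix it with its byte length, padded to HEADERSIZE."""
    body = msg.encode("utf-8")
    header = f"{len(body):<{HEADERSIZE}}".encode("utf-8")
    return header + body

def serve(host: str = "127.0.0.1", port: int = 1241) -> None:
    """Accept connections forever and send each client one framed message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, port))
        s.listen(5)
        while True:
            clientsocket, address = s.accept()
            print(f"Connection from {address} has been established.")
            with clientsocket:  # close the connection once the message is sent
                clientsocket.sendall(frame("Welcome to the server!"))

# serve()  # uncomment to run; loops until interrupted
```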

So now our messages will have a header of 10 characters/bytes that contains the length of the message, which our client can use to tell when the end of the message has been received. Let's work on the client.py next:
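A client along these lines could look like the sketch below. It assumes the same 10-byte header and a server on 127.0.0.1 port 1241; read_messages and run_client are hypothetical names. It keeps any leftover bytes in the buffer, so it still works if a single recv call returns the tail of one message together with the start of the next.

```python
import socket

HEADERSIZE = 10  # must match the header width used by the server

def read_messages(sock, bufsize=16):
    """Yield each complete message body received on sock as a str."""
    full_msg = b""   # bytes received so far and not yet consumed
    new_msg = True   # True while we are still waiting for a header
    msglen = 0
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:            # empty bytes means the peer closed the socket
            return
        full_msg += chunk
        while True:
            if new_msg:
                if len(full_msg) < HEADERSIZE:
                    break        # header not complete yet, wait for more data
                msglen = int(full_msg[:HEADERSIZE].decode("utf-8"))
                new_msg = False
            if len(full_msg) < HEADERSIZE + msglen:
                break            # body not complete yet, wait for more data
            yield full_msg[HEADERSIZE:HEADERSIZE + msglen].decode("utf-8")
            full_msg = full_msg[HEADERSIZE + msglen:]  # keep any extra bytes
            new_msg = True

def run_client(host="127.0.0.1", port=1241):
    """Connect to the server and print every message until it disconnects."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        for message in read_messages(s):
            print(message)

# run_client()  # uncomment to run against a listening server
```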

This one is a bit more involved, but nothing too crazy here. I increased our buffer to 16 bytes. 8 wouldn't even be enough to read the header, so that would have been a problem, and you would probably never have a buffer as small as these anyway. We're just doing it for the example. So, we start off in a state where the next bit of data we get is a new_msg.

If the message is a new_msg, then the first thing we do is parse the header, which we already know has a fixed length of 10 characters. From this, we get the message length. Then, we continue to build the full_msg until that variable is the size of msglen + our HEADERSIZE. Once this happens, we print out the full message.

Going from this to some sort of streaming API is quite simple. Let's do an example where the server just streams out something simple, like the current time.
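Here is a minimal sketch of such a streaming server, assuming the same 10-byte header, 127.0.0.1 port 1241, and a three-second interval between messages. The names frame, stream_time and serve are hypothetical, and the count parameter exists only so the loop can be stopped in a test.

```python
import socket
import time

HEADERSIZE = 10  # fixed header width, in bytes

def frame(msg: str) -> bytes:
    """Encode msg and prefix it with its byte length, padded to HEADERSIZE."""
    body = msg.encode("utf-8")
    return f"{len(body):<{HEADERSIZE}}".encode("utf-8") + body

def stream_time(conn, interval=3.0, count=None):
    """Send a framed timestamp every interval seconds; count=None means forever."""
    sent = 0
    while count is None or sent < count:
        time.sleep(interval)
        framed = frame(f"The time is {time.time()}")
        print(framed.decode("utf-8"))  # show header and body as sent
        conn.sendall(framed)
        sent += 1

def serve(host="127.0.0.1", port=1241):
    """Accept clients one at a time and stream the time to each."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, port))
        s.listen(5)
        while True:
            conn, address = s.accept()
            print(f"Connection from {address} has been established.")
            with conn:
                try:
                    conn.sendall(frame("Welcome to the server!"))
                    stream_time(conn)
                except (BrokenPipeError, ConnectionResetError):
                    print(f"{address} disconnected.")

# serve()  # uncomment to run; loops until interrupted
```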

Now, run the server and the client, and you should see the server outputting something like:

28        The time is 1552068299.01783
30        The time is 1552068302.0181189
30        The time is 1552068305.0206459
29        The time is 1552068308.021842
29        The time is 1552068311.024837
29        The time is 1552068314.025016
28        The time is 1552068317.02619
29        The time is 1552068320.026504
29        The time is 1552068323.031633
30        The time is 1552068326.0359411
29        The time is 1552068329.039903
29        The time is 1552068332.040124
30        The time is 1552068335.0402749
27        The time is 1552068338.0437
29        The time is 1552068341.043971