Writing Simple WebSocket Server in Python: PyWSocket

My journey to WebSockets was pretty long. Back in college I started with an idea for an app that could play music in sync across devices. No wonder I couldn't get through it. Later this year I stumbled upon this new thing called WebSockets, and they were intriguing. I thought I could finish that app with WebSockets (and I did, with partial success). I spun off another app out of it, and WebSockets were on a roll. It was time I dug further in, and I ended up writing a WebSocket server. (GitHub link at the bottom)

So what is a websocket server?

A WebSocket server is a TCP application listening on any port of a server that follows a specific protocol, simple as that.

How does it work? It uses HTTP for the handshake, and once the handshake is complete it works directly over TCP, exchanging data in its agreed-upon format called frames. Connections are bidirectional and either party can send a message at any time. Unlike plain HTTP, where the client has to make a new request (and often a new TCP connection) every time it wants to communicate, a WebSocket keeps a single connection open that either side can use at any time, which cuts message delivery time.

Credits: fullstackpython.com

The WebSocket Server:

We will be writing our server in 4 parts:

Writing a TCP/HTTP server to identify a websocket request.

Performing a handshake

Decoding/Receiving data/frames

Sending data/frames

I will be discussing the protocol implementation as we go through the steps. You can also take a pause and have a look at this awesome piece written by Mozilla on WebSocket servers. It is a must-read, now or later.

1. Writing a TCP/HTTP Server to Identify WebSocket Request

We will be using Python's SocketServer library (renamed socketserver in Python 3), which provides a simple TCP server. The client will send an HTTP request which looks something like this:

    GET /chat HTTP/1.1
    Host: example.com:8000
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
    Sec-WebSocket-Version: 13

So what we need to look out for is whether the request is of type GET and has these three headers: Upgrade: websocket, Connection: Upgrade and Sec-WebSocket-Key: <some random characters>.

If you find all this, we can proceed towards the next step which is completing the handshake. In our implementation we will check if all the three headers are present and we will proceed with the handshake. The request handler function will look something like this:

This is the rough flow: if we find a valid WebSocket request, we proceed with the handshake and then, in a while loop, we just echo, i.e. send back whatever we received. If it's not a valid request, we send HTTP 400 in response.
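The flow described above can be sketched roughly as follows. This is a minimal sketch, not the original handler: the names find_websocket_key and WebSocketHandler are assumptions made for illustration, it uses the Python 3 module name socketserver, and it assumes the whole request arrives in a single recv call. The three helper methods are left as placeholders because they are the subject of sections 2 to 4.

```python
import socketserver


def find_websocket_key(raw_request):
    """Return the Sec-WebSocket-Key of a WebSocket upgrade request, else None."""
    head = raw_request.decode("latin-1").split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # The request line must be a GET.
    if not lines[0].startswith("GET "):
        return None
    # Collect headers; header names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    if headers.get("upgrade", "").lower() != "websocket":
        return None
    if "upgrade" not in headers.get("connection", "").lower():
        return None
    return headers.get("sec-websocket-key") or None


class WebSocketHandler(socketserver.BaseRequestHandler):
    """Hypothetical handler: handshake first, then echo frames back."""

    def handle(self):
        # Assumption: the full HTTP request fits in one recv call.
        key = find_websocket_key(self.request.recv(1024))
        if key is None:
            self.request.sendall(b"HTTP/1.1 400 Bad Request\r\n\r\n")
            return
        self.shake_hand(key)
        while True:
            data = self.request.recv(1024)
            if not data:
                break  # client closed the TCP connection
            payload = self.decode_frame(bytearray(data))
            self.send_frame(payload)

    # Placeholders: these are covered in sections 2, 3 and 4.
    def shake_hand(self, key):
        raise NotImplementedError

    def decode_frame(self, frame):
        raise NotImplementedError

    def send_frame(self, payload):
        raise NotImplementedError
```

Keeping the header check in a plain function makes it easy to try on the sample request from this section without opening a socket.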

Pretty simple till now, innit?

2. Performing a Handshake

This is where the protocol details kick in. You will need to send a specific HTTP response back to the client in order to establish the bidirectional connection. The response will look something like this:

    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

You can see a new header there called Sec-WebSocket-Accept, with some random-looking characters. There's a method to calculate this. As per the protocol, you concatenate the key you received in the request header ('dGhlIHNhb…') and the magic string ("258EAFA5-E914-47DA-95CA-C5AB0DC85B11"), calculate the SHA1 hash of the result and send back the base64 encoding of that hash (which is 's3pPLMB…'). This is done so that the client can confirm that the server understands the protocol. So the handshake is basically an HTTP 101 response carrying the Upgrade and Connection headers again, plus a header containing the base64-encoded SHA1 of the client's key and the magic string.

Here’s how it’s done in python:

Python

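    import base64
    import hashlib

    WS_MAGIC_STRING = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

    def shake_hand(self, key):
        # Calculate the response key as per the protocol RFC:
        # base64(SHA1(client key + magic string)).
        key = key + WS_MAGIC_STRING
        digest = hashlib.sha1(key.encode("ascii")).digest()
        resp_key = base64.standard_b64encode(digest).decode("ascii")
        resp = ("HTTP/1.1 101 Switching Protocols\r\n"
                "Upgrade: websocket\r\n"
                "Connection: Upgrade\r\n"
                "Sec-WebSocket-Accept: %s\r\n\r\n" % resp_key)
        # Send the handshake response back to the client.
        self.request.sendall(resp.encode("ascii"))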

Here we send the key we received in request header as an argument and we use hashlib to calculate SHA1 and base64 to encode it.
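As a quick sanity check, the calculation can be tried on the sample key from the request shown earlier. This is a small standalone sketch; the function name accept_key is an assumption made for illustration and is not part of the server.

```python
import base64
import hashlib

WS_MAGIC_STRING = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    # SHA1 works on bytes, so encode the concatenated string first.
    digest = hashlib.sha1((client_key + WS_MAGIC_STRING).encode("ascii")).digest()
    # base64-encode the raw 20-byte digest (not its hex form).
    return base64.b64encode(digest).decode("ascii")


print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The printed value matches the Sec-WebSocket-Accept header in the sample response above.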

3. Decoding an Incoming Frame

Now that the connection is established, the client (or the other side in general) can send us data. The data won't be plain text, though. It uses a special frame format defined in the protocol. A frame looks something like this:

    Frame format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-------+-+-------------+-------------------------------+
    |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
    |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
    |N|V|V|V|       |S|             |   (if payload len==126/127)   |
    | |1|2|3|       |K|             |                               |
    +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
    |     Extended payload length continued, if payload len == 127  |
    + - - - - - - - - - - - - - - - +-------------------------------+
    |                               |Masking-key, if MASK set to 1  |
    +-------------------------------+-------------------------------+
    | Masking-key (continued)       |          Payload Data         |
    +-------------------------------- - - - - - - - - - - - - - - - +
    :                     Payload Data continued ...                :
    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
    |                     Payload Data continued ...                |
    +---------------------------------------------------------------+

I will discuss the fields which we will be using here. Please read that Mozilla article I mentioned before to get more idea around this.

The FIN bit indicates that this is the final frame of a message. We will assume/set it to 1, as we will be sending small amounts of data only. The next 3 bits are reserved. The opcode indicates what kind of frame this is: 0x0 for continuation, 0x1 for text, 0x2 for binary, etc. We will be using 0x1. The MASK bit we will discuss shortly.
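To make the bit layout concrete, here is a minimal sketch that pulls these fields out of the first two bytes of a frame. The function name parse_header_bits is an assumption made for illustration.

```python
def parse_header_bits(byte0, byte1):
    """Split the first two bytes of a frame into their fields."""
    return {
        "fin": (byte0 >> 7) & 1,     # top bit of byte 0
        "rsv": (byte0 >> 4) & 0b111, # three reserved bits
        "opcode": byte0 & 0x0F,      # low four bits of byte 0
        "masked": (byte1 >> 7) & 1,  # top bit of byte 1
        "length": byte1 & 0x7F,      # low seven bits of byte 1
    }


# 0x81 = 1000 0001: FIN set, opcode 0x1 (text).
# 0x85 = 1000 0101: MASK set, payload length 5.
print(parse_header_bits(0x81, 0x85))
```

A first byte of 129 (0x81) is therefore exactly "FIN = 1, opcode = text", which is the value the sending code uses later on.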

Decoding Payload Length

To read the payload data, you must know when to stop reading. That’s why the payload length is important to know. Unfortunately, this is somewhat complicated. To read it, follow these steps:

1. Read bits 9-15 (inclusive) and interpret that as an unsigned integer. If it's 125 or less, then that's the length; you're done. If it's 126, go to step 2. If it's 127, go to step 3.

2. Read the next 16 bits and interpret those as an unsigned integer. You're done.

3. Read the next 64 bits and interpret those as an unsigned integer (the most significant bit MUST be 0). You're done.
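The three steps above can be sketched as follows. This is a minimal sketch, not part of the server in this post (which only handles the first case); the function name read_payload_length is an assumption, and the extended lengths are read in network byte order (big-endian).

```python
import struct


def read_payload_length(frame):
    """Return (payload_length, offset of the first byte after the length field)."""
    # Step 1: the low 7 bits of byte 1 (the top bit is the MASK bit).
    length = frame[1] & 0x7F
    if length <= 125:
        return length, 2
    if length == 126:
        # Step 2: the next 16 bits, big-endian unsigned.
        (length,) = struct.unpack(">H", bytes(frame[2:4]))
        return length, 4
    # Step 3: the next 64 bits, big-endian unsigned.
    (length,) = struct.unpack(">Q", bytes(frame[2:10]))
    return length, 10


print(read_payload_length(bytearray([0x81, 0x85])))              # (5, 2)
print(read_payload_length(bytearray([0x82, 0x7E, 0x01, 0x00])))  # (256, 4)
```

The returned offset is where the masking key (if the MASK bit is set) or the payload begins.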

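We are assuming it to be 125 or less, so the 7-bit field holds the whole length. That leaves bytes 2 to 5 (the slice frame[2:6] in Python) as the four masking-key bytes, with the payload starting at byte 6. If you are using the web browser console as a client (which we will), it will set the MASK bit to 1, since clients are required to mask every frame they send. Hence the payload will be masked. You can XOR it with the mask to get the original data back. The code to help you understand it better: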
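A minimal sketch of such a decoder, under the same assumptions (a single masked frame with a payload of at most 125 bytes, passed in as a bytearray), could look like this. It is written as a plain function rather than a method so it can be tried on its own; the original code may differ in its details.

```python
def decode_frame(frame):
    """Unmask the payload of a short, masked frame given as a bytearray."""
    # Low 7 bits of byte 1; equal to frame[1] - 128 when the MASK bit is set.
    length = frame[1] & 0x7F
    if length > 125:
        raise ValueError("extended payload lengths are not handled here")
    if not frame[1] & 0x80:
        raise ValueError("expected a masked frame from the client")
    mask = frame[2:6]                # four masking-key bytes
    masked = frame[6:6 + length]     # masked payload
    # XOR each payload byte with the mask byte at position i mod 4.
    return bytearray(b ^ mask[i % 4] for i, b in enumerate(masked))


# A masked text frame carrying "Hello" with masking key 37 fa 21 3d.
sample = bytearray([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D,
                    0x7F, 0x9F, 0x4D, 0x51, 0x58])
print(decode_frame(sample))  # bytearray(b'Hello')
```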

We passed the frame as a bytearray, as you noticed in the first function (handle). The operations are quite self-explanatory. To get the payload length, we subtract 128 (the mask bit) from byte 1 (look at the frame structure and you'll have a clear picture). The masked payload, XORed byte by byte with the mask (cycling through its four bytes), gives us the original payload. Note that masking is not encryption: the key travels in the same frame.

4. Sending Frames

While sending frames, we will do nothing fancy. We will not set MASK bit and we will send data unmasked i.e. in plain text. So that will leave us with filling the FIN bit, the OPCODE, the LEN and finally the payload. Have a look:

Python

    def send_frame(self, payload):
        # payload is expected to be bytes (or a bytearray) of at most 125 bytes
        # setting FIN to 1 and opcode to 0x1: 0b10000001 == 129
        frame = [129]
        # adding len. no masking hence not doing +128
        frame += [len(payload)]
        # adding payload
        frame_to_send = bytearray(frame) + payload
        self.request.sendall(frame_to_send)
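The same framing can be written as a standalone function, which makes the resulting bytes easy to inspect. This is a hedged sketch that goes slightly beyond the server above: build_text_frame is a hypothetical name, and the branches for payloads longer than 125 bytes follow the length rules from section 3 rather than anything in the original code.

```python
import struct


def build_text_frame(payload):
    """Build an unmasked text frame (FIN = 1, opcode = 0x1) around payload bytes."""
    header = bytearray([0x81])  # 129: FIN bit plus text opcode
    n = len(payload)
    if n <= 125:
        header.append(n)                 # length fits in the 7-bit field
    elif n <= 0xFFFF:
        header.append(126)               # 16-bit extended length follows
        header += struct.pack(">H", n)
    else:
        header.append(127)               # 64-bit extended length follows
        header += struct.pack(">Q", n)
    return bytes(header) + payload


print(build_text_frame(b"Hello").hex())  # 810548656c6c6f
```

For "Hello" the frame is just the two header bytes 0x81 0x05 followed by the five ASCII bytes of the text.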

Yep, that easy. So that wraps up our server. Now let's have a look at how we can get it rolling. Fire up a web browser console and try these out:

We asked our browser side websocket to print whatever it receives in console. And our server is sending back whatever the client sends. So there you are. The mighty WebSockets with <80 lines of python code 😀 Check it out on GitHub.