WebSocket Extensions in Tyrus

There is always room for another experimental feature :-) This one is maybe a little less experimental than broadcast support, but please use the presented APIs with one important fact in mind: they can change at any time.

What is WebSocket Extension?

You can think of a WebSocket Extension as a filter which processes all incoming and outgoing frames. A frame is the smallest unit in the WebSocket protocol that can be transferred on the wire. It contains some metadata (frame type, opcode, payload length, etc.) and of course the payload itself.

If you are interested in more details of the WebSocket protocol specification, please see the linked RFC document (RFC 6455).
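To make the metadata concrete, here is a minimal sketch of decoding the first two bytes of a frame header as laid out in RFC 6455. The names FrameHeader and parse are made up for illustration and are not part of Tyrus; extended payload lengths and the masking key are left out.

```java
// Hypothetical helper, not a Tyrus class: decodes the first two header bytes of a frame.
final class FrameHeader {
    final boolean fin, rsv1, rsv2, rsv3, mask;
    final int opcode;
    final int payloadLength7; // 0-125 is the literal length; 126 or 127 mean an extended length follows

    private FrameHeader(int b0, int b1) {
        fin = (b0 & 0x80) != 0;      // highest bit of the first byte
        rsv1 = (b0 & 0x40) != 0;
        rsv2 = (b0 & 0x20) != 0;
        rsv3 = (b0 & 0x10) != 0;
        opcode = b0 & 0x0F;          // low four bits of the first byte
        mask = (b1 & 0x80) != 0;     // highest bit of the second byte
        payloadLength7 = b1 & 0x7F;  // remaining seven bits
    }

    static FrameHeader parse(byte first, byte second) {
        return new FrameHeader(first & 0xFF, second & 0xFF);
    }
}
```

For example, the bytes 0x82 0x85 describe a final, masked binary frame with a five-byte payload.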

What can be achieved by WebSocket Extension?

Almost everything. You can change every single bit of an incoming or outgoing frame, including control frames (close, ping and pong). There are some RFC drafts trying to standardise extensions such as per-message compression (which used to be per-frame compression) and multiplexing (now expired).

Tyrus already supports the per-message compression extension and exposes interfaces which allow users to write custom extensions with completely different functionality.

When should I consider implementing WebSocket Extensions?

This is maybe the most important question. WebSocket Extensions can do almost everything, but you should not use them for use cases achievable by other means. Why? The majority of WebSocket use cases involve communication with browsers, and a JavaScript client cannot really influence which extension is going to be used. The browser must support your particular extension (either by default or through some custom module).

You can easily use a custom extension with the Tyrus Java client, so if there is no browser interaction in your application, it should be easier to distribute your extensions to the involved parties, and you might lower the bar when deciding whether something will be done by an extension or by application logic.

The specification (JSR 356) limits the extension definition to handshake purposes only. To sum up, users can only declare an Extension with static parameters (there is no way to set parameters based on the parameters of the requested extension) and that’s it. These extensions don’t have any processing part, so the work must be done somewhere else. As you might already suspect, this is not an ideal state. The usability of extensions specified like this is very limited; such an extension is basically just a marker class which has some influence on handshake headers. You can get the list of negotiated extensions at runtime (Session.getNegotiatedExtensions()), but there is no way to access frame fields other than the payload itself.

Proposed Extension API

I have to repeat the warning from the start of this blog post: anything mentioned below might be changed without notice. There are some TODO items which will most likely require some modification of the presented API, not to mention that the RFC drafts of WebSocket Extensions are not final yet. Even bigger modifications might be needed; for example, the multiplexing draft specifies a different frame representation, the use of RSV bits is not standardised, etc. So please take the following as a usable proof of concept and feel free to use it in agile projects.

Firstly, we need to create a frame representation.

public class Frame {

    public boolean isFin() { .. }
    public boolean isRsv1() { .. }
    public boolean isRsv2() { .. }
    public boolean isRsv3() { .. }
    public boolean isMask() { .. }
    public byte getOpcode() { .. }
    public long getPayloadLength() { .. }
    public int getMaskingKey() { .. }
    public byte[] getPayloadData() { .. }
    public boolean isControlFrame() { .. }

    public static Builder builder() { .. }
    public static Builder builder(Frame frame) { .. }

    public final static class Builder {

        public Builder() { .. }
        public Builder(Frame frame) { .. }
        public Frame build() { .. }

        public Builder fin(boolean fin) { .. }
        public Builder rsv1(boolean rsv1) { .. }
        public Builder rsv2(boolean rsv2) { .. }
        public Builder rsv3(boolean rsv3) { .. }
        public Builder mask(boolean mask) { .. }
        public Builder opcode(byte opcode) { .. }
        public Builder payloadLength(long payloadLength) { .. }
        public Builder maskingKey(int maskingKey) { .. }
        public Builder payloadData(byte[] payloadData) { .. }
    }
}

This is a pretty straightforward copy of the frame definition mentioned earlier. Frame is designed to be immutable, so you cannot change it in any way. One method might look like a mutator: getPayloadData() returns a modifiable byte array, but it is always a new copy, so the original frame instance remains intact. There is also a Frame.Builder for constructing new Frame instances. Notice that it can copy an existing frame, so creating a new frame with, let’s say, the RSV1 bit set to "1" is as easy as:

Frame newFrame = Frame.builder(originalFrame).rsv1(true).build();
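How such an immutable frame with a copying builder could be put together is sketched below. MiniFrame is an illustrative stand-in that models only two of the fields; whether the real Frame copies the payload in exactly these places is an assumption.

```java
// Illustrative stand-in for the Frame class above; only two fields are modelled.
final class MiniFrame {
    private final boolean rsv1;
    private final byte[] payloadData;

    private MiniFrame(Builder b) {
        this.rsv1 = b.rsv1;
        this.payloadData = b.payloadData.clone(); // defensive copy on the way in
    }

    boolean isRsv1() { return rsv1; }

    // Always hands out a fresh copy, so callers cannot modify the frame.
    byte[] getPayloadData() { return payloadData.clone(); }

    static Builder builder() { return new Builder(); }

    static Builder builder(MiniFrame frame) { return new Builder(frame); }

    static final class Builder {
        private boolean rsv1;
        private byte[] payloadData = new byte[0];

        Builder() { }

        // Starts from the values of an existing frame.
        Builder(MiniFrame frame) {
            this.rsv1 = frame.rsv1;
            this.payloadData = frame.payloadData;
        }

        Builder rsv1(boolean rsv1) { this.rsv1 = rsv1; return this; }

        Builder payloadData(byte[] payloadData) { this.payloadData = payloadData; return this; }

        MiniFrame build() { return new MiniFrame(this); }
    }
}
```

Changing the array returned by getPayloadData() leaves the frame untouched, and MiniFrame.builder(original).rsv1(true).build() yields a second, independent instance.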

Note that there is only one convenience method: isControlFrame. Other information about the frame type etc. needs to be evaluated directly from the opcode, simply because there might not be enough information to get the correct outcome, or the information itself would not be very useful. For example, opcode 0x00 means a continuation frame, but you have no chance of finding out the actual type (text or binary) without intercepting data from previous frames. Consider the Frame class to be as raw a representation as possible. isControlFrame can also be derived from the opcode, but it is at least always deterministic and it will be used by most extension implementations. It is not usual to modify control frames, as that might end with half-closed connections or unanswered ping messages.
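The point about continuation frames can be sketched as follows: the control-frame check needs nothing but the opcode, while the message type has to be remembered from the first frame of a fragmented message. OpcodeTracker is a hypothetical helper, not part of Tyrus; the opcode values are those defined in RFC 6455.

```java
// Hypothetical helper: derives the frame kind from the opcode and remembers the
// message type across continuation frames.
final class OpcodeTracker {
    static final byte CONTINUATION = 0x00, TEXT = 0x01, BINARY = 0x02;

    private byte currentMessageType = -1; // -1: no fragmented message in progress

    static boolean isControl(byte opcode) {
        return (opcode & 0x08) != 0; // close (0x8), ping (0x9), pong (0xA)
    }

    /** Returns TEXT or BINARY for data frames, or -1 when the type is not known. */
    byte messageType(byte opcode, boolean fin) {
        if (isControl(opcode)) {
            return -1; // control frames may be interleaved and do not change the state
        }
        byte type = (opcode == CONTINUATION) ? currentMessageType : opcode;
        currentMessageType = fin ? -1 : type; // forget the type once the message ends
        return type;
    }
}
```

An extension that cares about the text/binary distinction would have to keep this kind of state per connection.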

The new Extension representation needs to be able to handle extension parameter negotiation and the actual processing of incoming and outgoing frames. It should also be compatible with the existing javax.websocket.Extension class, since we want to re-use the existing registration API and be able to return the new extension instance in the list returned from the Session.getNegotiatedExtensions() call (a List<Extension>). The result is the ExtendedExtension interface.
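As a rough sketch, pieced together from the method names used in this post, the interface has approximately the following shape. Extension and Frame are minimal stand-ins so that the snippet compiles on its own (real code would use javax.websocket.Extension and the Tyrus Frame class), and the exact parameter and return types are an assumption.

```java
import java.util.List;
import java.util.Map;

// Stand-in for javax.websocket.Extension, reduced to what the sketch needs.
interface Extension {
    String getName();
    List<Parameter> getParameters();

    interface Parameter {
        String getName();
        String getValue();
    }
}

// Stand-in for the Frame class shown earlier.
final class Frame { }

// Approximate shape of the extended extension contract described in the text.
interface ExtendedExtension extends Extension {

    // Frame processing, one method per direction.
    Frame processIncoming(ExtensionContext context, Frame frame);
    Frame processOutgoing(ExtensionContext context, Frame frame);

    // Server side: receives the requested parameters, returns the ones for the response.
    List<Extension.Parameter> onExtensionNegotiation(ExtensionContext context,
                                                     List<Extension.Parameter> requestedParameters);

    // Client side: receives the parameters found in the handshake response.
    void onHandshakeResponse(ExtensionContext context,
                             List<Extension.Parameter> responseParameters);

    // Called when the connection goes away.
    void destroy(ExtensionContext context);

    // Per-connection state holder.
    interface ExtensionContext {
        Map<String, Object> getProperties();
    }
}
```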

ExtendedExtension is capable of processing frames and influencing parameter values during the handshake. An extension is used on both the client and the server side, and since negotiation is the only place where this distinction matters, we needed to differentiate the two sides somehow. On the server side, only the onExtensionNegotiation(..) method is invoked, and the client side has onHandshakeResponse(..). The server-side method is a must; the client side could be handled by implementing ClientEndpointConfig.Configurator#afterResponse(..) or by calling Session.getNegotiatedExtensions(), but it would not be as easy to get this information back to the extension instance, and even if it were, it would not be very elegant. Also, you might suggest replacing the processIncoming and processOutgoing methods with just one process(Frame) method. That is also possible, but then you would have to infer the current direction from the frame instance or somehow from ExtensionContext, which is generally not a bad idea, but it resulted in slightly less readable code.

Last but not least is ExtensionContext itself and the related lifecycle method. The original Extension from javax.websocket is a singleton and ExtendedExtension must obey this fact. But that does not meet some of the requirements we stated previously, like per-connection parameter negotiation, and of course the processing itself will most likely have some connection state. The lifecycle of ExtensionContext is defined as follows: an ExtensionContext instance is created right before onExtensionNegotiation (server side) or onHandshakeResponse (client side) and destroyed after the destroy method invocation. Obviously, processIncoming or processOutgoing cannot be called before the ExtensionContext is created or after it is destroyed. You can think of the handshake-related methods as @OnOpen and destroy as @OnClose.

For those more familiar with the WebSocket protocol: process*(ExtensionContext, Frame) is always invoked with an unmasked frame, so you don’t need to care about masking. On the other hand, the payload is as it was received from the wire, before any validation (the UTF-8 check for text messages). This fact is particularly important when you are modifying text message content: you need to make sure it is properly encoded in relation to other messages, because the encoding/decoding process is stateful; the remainder after UTF-8 coding is used as input to the coding process for the next message. If you just want to test this feature and save yourself some headaches, don’t modify text message content, or try binary messages instead.
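The consequence of the singleton rule is that per-connection state belongs in the context, not in fields of the extension. A minimal sketch of that pattern, with Context as a stand-in for ExtensionContext and CountingExtension and the "frameCount" key as made-up names:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the per-connection context; the real one is supplied by the runtime.
final class Context {
    private final Map<String, Object> properties = new HashMap<>();

    Map<String, Object> getProperties() { return properties; }
}

// One shared instance serves all connections, so it keeps no mutable fields of its own.
final class CountingExtension {
    private static final String COUNT = "frameCount"; // hypothetical property key

    // Plays the role of the handshake methods (think @OnOpen).
    void onHandshake(Context context) {
        context.getProperties().put(COUNT, 0);
    }

    // Called once per frame; returns how many frames this connection has seen.
    int process(Context context) {
        int next = (Integer) context.getProperties().get(COUNT) + 1;
        context.getProperties().put(COUNT, next);
        return next;
    }

    // Plays the role of destroy (think @OnClose).
    void destroy(Context context) {
        context.getProperties().clear();
    }
}
```

Two connections sharing the same CountingExtension instance still count independently, because each one brings its own context.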
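Why the decoding is stateful can be shown with plain JDK classes: a multi-byte character may be split between two chunks of payload, so the undecoded remainder of one chunk has to be fed into the decoding of the next. The sketch below is not how Tyrus does it; StreamingUtf8 is a hypothetical helper and error handling is left out.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes UTF-8 chunk by chunk, carrying incomplete sequences over.
final class StreamingUtf8 {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private byte[] remainder = new byte[0]; // bytes of an unfinished character

    String decode(byte[] chunk, boolean last) {
        // Prepend whatever was left over from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(remainder.length + chunk.length);
        in.put(remainder).put(chunk);
        in.flip();

        CharBuffer out = CharBuffer.allocate(in.remaining());
        decoder.decode(in, out, last); // the CoderResult is ignored for brevity
        if (last) {
            decoder.flush(out);
            decoder.reset();
        }

        // Keep the bytes the decoder could not consume yet.
        remainder = new byte[last ? 0 : in.remaining()];
        in.get(remainder, 0, remainder.length);

        out.flip();
        return out.toString();
    }
}
```

With the two bytes of "é" (0xC3 0xA9) split across two chunks, the first call holds back 0xC3 and the second call completes the character. An extension that rewrites text payloads has to leave such split sequences in a state the later validation can still handle.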

Code sample

Let’s say we want to create an extension which will encrypt and decrypt the first byte of every binary message. Assume we have a key (one byte) and our symmetric cipher will be XOR. (For simplicity: (a XOR key XOR key) = a, so the encrypt() and decrypt() functions are the same.)
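The symmetry can be checked in isolation before any extension code is involved. FirstByteXor is a hypothetical helper used only for this illustration:

```java
// Hypothetical helper showing that one XOR routine serves as both encrypt() and decrypt().
final class FirstByteXor {
    static byte[] apply(byte[] payload, byte key) {
        byte[] result = payload.clone(); // leave the caller's array untouched
        if (result.length > 0) {
            result[0] ^= key;            // only the first byte is changed
        }
        return result;
    }
}
```

Applying it twice with the same key gives back the original payload.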

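// Package names as found in the Tyrus core module; the API is experimental,
// so verify them against the version you use.
import java.util.List;

import javax.websocket.Extension;

import org.glassfish.tyrus.core.extension.ExtendedExtension;
import org.glassfish.tyrus.core.frame.Frame;

public class CryptoExtension implements ExtendedExtension {

    @Override
    public Frame processIncoming(ExtensionContext context, Frame frame) {
        return lameCrypt(context, frame);
    }

    @Override
    public Frame processOutgoing(ExtensionContext context, Frame frame) {
        return lameCrypt(context, frame);
    }

    // XOR is its own inverse, so one method handles both directions.
    private Frame lameCrypt(ExtensionContext context, Frame frame) {
        // 0x02 is the opcode of a binary frame; everything else passes through untouched.
        if (!frame.isControlFrame() && (frame.getOpcode() == 0x02)) {
            final byte[] payloadData = frame.getPayloadData();
            if (payloadData.length == 0) {
                return frame; // nothing to encrypt
            }
            payloadData[0] ^= (Byte) (context.getProperties().get("key"));
            return Frame.builder(frame).payloadData(payloadData).build();
        } else {
            return frame;
        }
    }

    @Override
    public List<Extension.Parameter> onExtensionNegotiation(ExtensionContext context,
                                                            List<Extension.Parameter> requestedParameters) {
        init(context);
        // no params.
        return null;
    }

    @Override
    public void onHandshakeResponse(ExtensionContext context,
                                    List<Extension.Parameter> responseParameters) {
        init(context);
    }

    // The key lives in the per-connection context, not in the singleton extension.
    private void init(ExtensionContext context) {
        context.getProperties().put("key", (byte) 0x55);
    }

    @Override
    public void destroy(ExtensionContext context) {
        context.getProperties().clear();
    }

    @Override
    public String getName() {
        return "lame-crypto-extension";
    }

    @Override
    public List<Extension.Parameter> getParameters() {
        // no params.
        return null;
    }
}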

You can see that ExtendedExtension is slightly more complicated than the original Extension, so the implementation is also not as straightforward. On the other hand, it does something. The sample code above shows the possible simplification mentioned earlier (one process method would be enough), but please take it as just a sample implementation. A real-world case is usually more complicated.

Now that we have our CryptoExtension implemented, we want to use it. There is nothing new compared to the standard WebSocket Java API, so feel free to skip this part if you are already familiar with it. Only the programmatic version will be demonstrated. It is possible to do the same for the annotated version as well, but it is a little bit more complicated on the server side and I want to keep the code as compact as possible.

CryptoExtensionApplicationConfig will be found by the servlet scanning mechanism and automatically used for application configuration; there is no need to add anything to web.xml (or even to have one).

Per Message Deflate Extension

The original goal of the whole extension support was to implement the per-message compression extension as defined in draft-ietf-hybi-permessage-compression-15, and we were able to achieve that goal. Well, not completely: the current implementation ignores parameters. But it seems that this does not matter much; it was tested with Chrome and it works fine. It also passes the newest version of the Autobahn test suite, which includes tests for this extension.
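The payload transformation behind per-message deflate can be sketched with the JDK's own zlib bindings. This is not the Tyrus implementation: PerMessageDeflateSketch is a made-up name, and the sketch assumes raw DEFLATE with a sync flush whose trailing 0x00 0x00 0xFF 0xFF octets are removed by the sender and restored by the receiver, as the permessage-deflate draft describes. A fresh Deflater and Inflater are used for every message, so no compression context is shared between messages.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: compresses and decompresses one message payload at a time.
final class PerMessageDeflateSketch {
    private static final byte[] TAIL = {0x00, 0x00, (byte) 0xFF, (byte) 0xFF};

    static byte[] compress(byte[] message) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw DEFLATE
        try {
            deflater.setInput(message);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            do {
                // SYNC_FLUSH pushes out all pending data and ends on 00 00 FF FF.
                n = deflater.deflate(buffer, 0, buffer.length, Deflater.SYNC_FLUSH);
                out.write(buffer, 0, n);
            } while (n == buffer.length);
            byte[] flushed = out.toByteArray();
            return Arrays.copyOf(flushed, flushed.length - TAIL.length); // drop the tail
        } finally {
            deflater.end();
        }
    }

    static byte[] decompress(byte[] payload) {
        Inflater inflater = new Inflater(true);
        try {
            // Restore the tail that the sender removed.
            byte[] input = Arrays.copyOf(payload, payload.length + TAIL.length);
            System.arraycopy(TAIL, 0, input, payload.length, TAIL.length);
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = inflater.inflate(buffer)) > 0) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("payload is not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }
}
```

In an extension, compress would run in the outgoing direction and decompress in the incoming one, with the negotiated parameters deciding details that this sketch ignores.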

TODO

There are some things which need to be improved or specified to make this reliable and suitable for real-world use. It might not seem like a big feature, but it enables lots of use cases not defined in the original specification, and some of them clash a little bit, so for now I kept everything as it was when you are not using extended extensions.

Everything mentioned in this article is already available in Tyrus 1.4-SNAPSHOT and will be part of the 1.4 release.

Conclusion

There are still lots of decisions to be made and things to do, but it seems that we can implement usable extensions which are supported by newer versions of browsers and containers. The PerMessageDeflate extension is a nice example of a handy feature which can save significant resources.

Comments ( 3 )

guest Thursday, January 9, 2014

Great to see WebSocket compression starting to get implemented more widely! It's really useful, as it can compress e.g. real-world JSON payload 10-15x.

Also nice to see you are using Autobahn|Testsuite;) Two comments on that: a) link is broken, b) yes, Autobahn|Testsuite (fuzzing client) includes tests for WS compression, but we still need to add tests for parameter negotiation (it currently only contains a dozen or so tests with default parameters).

Cheers,

Tobias

Pavel Thursday, January 9, 2014

Hi Tobias,

we have been using the Autobahn test suite from the very beginning; it has proved to be an excellent tool for keeping implementations compatible and interoperable.

Sorry for the broken link - it is already fixed.

The Tyrus impl does not support any parameters for the PerMessageDeflate extension, but it hopefully will someday :) The RFC itself is not final yet, so I'm a little bit reluctant to spend lots of time on that if it may be completely changed. Adding tests to the Autobahn test suite will definitely help and maybe put more pressure on us to implement it (I like green boxes much more than red ones).

Regards,

Pavel

guest Thursday, January 9, 2014

Hi Pavel,

thanks for link fix and the nice words;)

Rgd permessage-parameters, fwiw: Autobahn|Python already implements all parameters and full negotiation (and it works with Chrome). It's just missing coverage in the testsuite. We will add it, so you get a couple of red boxes, which you can then make green again;)