In one of my projects, I used ZeroMQ for inter-process communication. It is extremely fast, supports asynchronous I/O and several messaging patterns, and is available on multiple platforms and languages. I used the following three messaging patterns:

Publish/Subscribe: Clients subscribe to specific types of messages. When the server reads these messages from the hardware, it publishes them to the subscribed clients.
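To make the subscription filtering concrete, here is a minimal in-process sketch in Python. It does not use ZeroMQ or clrzmq. It only mimics how a ZeroMQ SUB socket filters: a subscriber receives a message when the message starts with one of its subscription prefixes, and an empty prefix matches everything. `ToyPublisher` and the `SERIAL`/`AUDIO` topic prefixes are hypothetical names chosen for illustration.

```python
class ToyPublisher:
    """In-process stand-in for a PUB socket; not ZeroMQ itself.

    As with a ZeroMQ SUB socket, a subscriber receives a message when the
    message starts with one of its subscription prefixes, and an empty
    prefix matches every message.
    """

    def __init__(self):
        self._subscribers = []  # (prefixes, inbox) pairs

    def subscribe(self, *prefixes):
        """Register a subscriber and return its inbox (a plain list)."""
        inbox = []
        self._subscribers.append((prefixes, inbox))
        return inbox

    def publish(self, message):
        """Deliver message once to every subscriber with a matching prefix."""
        for prefixes, inbox in self._subscribers:
            if any(message.startswith(p) for p in prefixes):
                inbox.append(message)


# Hypothetical topic prefixes for messages read from the hardware.
pub = ToyPublisher()
serial_client = pub.subscribe(b"SERIAL")
audio_client = pub.subscribe(b"AUDIO")
monitor = pub.subscribe(b"")  # empty prefix: receives everything

pub.publish(b"SERIAL COM1 data")
pub.publish(b"AUDIO finished")
```

Prefix matching keeps the filter cheap. Each client only sees the message types it asked for, and the server does not need to know what each client cares about beyond its subscriptions.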

Request/Response: A client sends a request to the server, which executes it, interacts with the hardware, and sends the response back. For example, a client can ask the server to open a serial port or play an audio file.
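The request/response flow can be sketched in Python as a toy model rather than real ZeroMQ code. A ZeroMQ REQ socket enforces strict alternation: each send() must be followed by a recv() before the next send(). `ToyReqSocket`, `HardwareServer` and the command names `open_serial_port` and `play_audio` are hypothetical. The "hardware" handlers just return strings.

```python
class ToyReqSocket:
    """Stand-in for a REQ socket: send() and recv() must alternate."""

    def __init__(self, server):
        self._server = server
        self._reply = None
        self._awaiting_reply = False

    def send(self, request):
        if self._awaiting_reply:
            raise RuntimeError("must recv() the reply before the next send()")
        self._reply = self._server.handle(request)  # server runs the request
        self._awaiting_reply = True

    def recv(self):
        if not self._awaiting_reply:
            raise RuntimeError("must send() a request before recv()")
        self._awaiting_reply = False
        reply, self._reply = self._reply, None
        return reply


class HardwareServer:
    """Hypothetical server that dispatches commands to hardware handlers."""

    def __init__(self):
        self._handlers = {
            "open_serial_port": lambda arg: f"opened {arg}",
            "play_audio": lambda arg: f"playing {arg}",
        }

    def handle(self, request):
        command, _, arg = request.partition(" ")
        handler = self._handlers.get(command)
        if handler is None:
            return f"error: unknown command {command}"
        return handler(arg)


req = ToyReqSocket(HardwareServer())
req.send("open_serial_port COM1")
reply1 = req.recv()
req.send("play_audio beep.wav")
reply2 = req.recv()

# A second send() before recv() is rejected, as a REQ socket would reject it.
req.send("play_audio again.wav")
try:
    req.send("play_audio again.wav")
    double_send_rejected = False
except RuntimeError:
    double_send_rejected = True
```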

Push/Pull: All clients push their log messages to a central logging server, which pulls the messages and writes them to a file.
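The logging fan-in can be modeled with Python threads and a `queue.Queue` standing in for the PUSH/PULL connection. This is an illustration of the pattern, not ZeroMQ code. The function names, the `STOP` sentinel and the use of `io.StringIO` in place of the log file are all assumptions made for the sketch.

```python
import io
import queue
import threading

STOP = None  # sentinel telling the logging server to finish


def client(name, log_queue, count):
    """PUSH side: send log lines and move on without waiting for a reply."""
    for i in range(count):
        log_queue.put(f"{name}: log line {i}")


def logging_server(log_queue, sink):
    """PULL side: take lines from all clients and append them to one sink."""
    while True:
        line = log_queue.get()
        if line is STOP:
            break
        sink.write(line + "\n")


log_queue = queue.Queue()
sink = io.StringIO()  # stands in for the log file
server = threading.Thread(target=logging_server, args=(log_queue, sink))
clients = [
    threading.Thread(target=client, args=(f"client{n}", log_queue, 3))
    for n in range(4)
]

server.start()
for t in clients:
    t.start()
for t in clients:
    t.join()
log_queue.put(STOP)  # all clients are done, so no more lines will arrive
server.join()

lines = sink.getvalue().splitlines()
```

In this sketch, lines from different clients may interleave in any order, but each client's own lines stay in the order it pushed them. Because the clients never wait on the logging server, logging does not block their work.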

Because development was done in C# on Windows Embedded, I used clrzmq, a C# binding for ZeroMQ. My initial performance tests showed that clrzmq was using far more CPU than I expected.

I used RedGate's ANTS Performance Profiler for .NET, which shows in detail how many CPU cycles are spent in each function and how many times each function is called.

I found that the ZmqSocket.Receive() method spent its time as follows:

- SpinWait: 17.1%
- Stopwatch.GetElapsedDateTimeTicks: 4.1%
- Stopwatch.StartNew: 2.4%
- Receive: 73.3%

Within that inner Receive() call, 64.4% of the time was spent in SocketProxy.Receive().
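The profile already shows where the overhead comes from. Adding the SpinWait entry to the two Stopwatch entries gives their combined share of ZmqSocket.Receive() time:

$$17.1\% + 4.1\% + 2.4\% = 23.6\%$$

So nearly a quarter of the method's time went to waiting and timing rather than to the inner Receive() call.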

As part of the optimization, I used a pre-allocated raw buffer to send and receive data instead of ZmqMsg objects. I also moved the Stopwatch and SpinWait code into a limited scope, so that it only runs when a timeout is defined and that timeout is longer than a certain value.
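Here is a minimal Python sketch of the idea. It is not the actual clrzmq change and does not call ZeroMQ. `FakeSocket`, `receive`, `SPIN_THRESHOLD_S` and its value are hypothetical. The post also does not say what happens when the timeout is at or below the threshold, so the single non-blocking attempt in that branch is an assumption. The sketch shows two things: the caller allocates one buffer and reuses it for every receive, and the clock and polling loop are only used when a timeout longer than the threshold is requested.

```python
import time
from collections import deque

SPIN_THRESHOLD_S = 0.001  # hypothetical cut-off below which no timer is started


class FakeSocket:
    """Stand-in for a socket that queues whole messages (not ZeroMQ)."""

    def __init__(self, messages=()):
        self._pending = deque(messages)

    def try_recv_into(self, buffer):
        """Non-blocking: copy the next message into buffer, or return None."""
        if not self._pending:
            return None
        msg = self._pending.popleft()
        n = min(len(msg), len(buffer))  # oversized messages are truncated
        buffer[:n] = msg[:n]
        return n

    def recv_into_blocking(self, buffer):
        """Blocking receive; this fake assumes a message is already queued."""
        n = self.try_recv_into(buffer)
        if n is None:
            raise RuntimeError("FakeSocket would block forever")
        return n


def receive(sock, buffer, timeout=None, clock=time.monotonic):
    """Receive into a caller-owned, pre-allocated buffer.

    The clock and the polling loop are only used when a timeout is given
    and it is longer than SPIN_THRESHOLD_S.
    """
    if timeout is None:
        return sock.recv_into_blocking(buffer)  # no timing overhead at all
    if timeout <= SPIN_THRESHOLD_S:
        return sock.try_recv_into(buffer)  # assumed: one attempt, no timer
    deadline = clock() + timeout
    while True:
        n = sock.try_recv_into(buffer)
        if n is not None:
            return n
        if clock() >= deadline:
            return None
        time.sleep(0)  # yield between polls, loosely like a SpinWait step


buf = bytearray(256)  # allocated once and reused for every receive
clock_calls = []


def counting_clock():
    clock_calls.append(1)
    return time.monotonic()


sock = FakeSocket([b"hello", b"world"])
n1 = receive(sock, buf, clock=counting_clock)  # no timeout: clock untouched
first = bytes(buf[:n1])
clock_calls_after_first = len(clock_calls)
n2 = receive(sock, buf, timeout=0.05, clock=counting_clock)
second = bytes(buf[:n2])
n3 = receive(sock, buf, timeout=0.01, clock=counting_clock)  # nothing queued
```

The common no-timeout path never reads the clock and never allocates. Only the timed path pays for the stopwatch and polling loop, which matches the 23.6% overhead profiled above.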

After these optimizations, SocketProxy.Receive() used only 2696.54 CPU ticks, almost one fifth of the original CPU usage. See the attached picture below.

I wish I'd seen this article before I made a similar discovery manually. My short-term solution was to remove the timeout from my call to Receive(), since I noticed that the WithTimeout extension was where all the SpinWait and Stopwatch activity was happening. Sounds like I need the newer version :)