Introduction

CXQueue is a class that creates and manages an interprocess queue. It is intended to offload operations from a client app to a server app - for example, performing logging actions; or the server might function as a network gateway, so that the details are transparent to the client.

CXQueue supports FIFO (first-in first-out) communication between multiple writers (clients) and one reader (server) that are running on the same machine. Communication is one-way only - from client to server. CXQueue offers no delivery guarantee.

CXQueue may be used in either MFC or SDK programs. It has been tested on Windows 98, Windows 2000, and Windows XP.

Background

Although Windows NT-based platforms provide mechanisms such as mailslots and named pipes that can be used to emulate a queue, these are not available on Win9x systems. There have been some superficial implementations of queues using shared memory and other techniques, but these have been limited in one of two ways:

They are limited to only intraprocess, not interprocess communication;

or

They are synchronous, meaning that the queue is limited to one entry, and the server must remove and process that entry before a client can add another.

An Alternative

If you are reading this, you are probably also familiar with Microsoft's queue solution: MSMQ. This is an extremely powerful tool that is mandatory in some situations. However, it is also very complex, non-trivial to install and set up, and has some issues on Win9x systems.

Concept & Facilities

CXQueue is based on shared memory using a memory-mapped file. Access to the shared memory is controlled by means of two kernel objects: an event (used by clients to ensure quiescence, to allow the server to free some space); and a mutex, to control actual writing to the queue.

The shared memory (hereafter called the MMF, for memory-mapped file) is composed of fixed-size blocks. The fixed size ensures that there will be no fragmentation, and eliminates the need for garbage collection. The first block of the MMF is used for CXQueue variables. The second block and all further blocks are used for data.

The first block (the variable block) holds data that is used to manage and monitor the queue. The data blocks are managed by the use of two doubly-linked lists: the free list and the used list. Initially, all blocks are assigned to the free list. When a client writes to the queue, one or more blocks are allocated from the free list and transferred to the used list. This involves simply relinking block numbers; no physical block copying is done, so it is very fast.
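The relinking described above can be sketched in portable C++. This is a minimal model, not the code in XQueue.cpp: the names BlockPool, BlockLinks, BlockList and NO_BLOCK are hypothetical, and the links live in an ordinary vector rather than in block headers inside the MMF. It only illustrates that moving a block between lists changes block numbers in the links and never copies block contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the free/used lists; names are illustrative only.
constexpr int NO_BLOCK = -1;

struct BlockLinks {
    int back = NO_BLOCK;     // previous block number in the same list
    int forward = NO_BLOCK;  // next block number in the same list
};

struct BlockList {
    int head = NO_BLOCK;
    int tail = NO_BLOCK;
    std::size_t count = 0;
};

class BlockPool {
public:
    explicit BlockPool(int blockCount) : links_(blockCount) {
        // Initially every block belongs to the free list.
        for (int i = 0; i < blockCount; ++i) append(free_, i);
    }

    // Client side: move one block from the front of the free list to the
    // end of the used list. Returns NO_BLOCK if nothing is free.
    int allocate() {
        int block = free_.head;
        if (block == NO_BLOCK) return NO_BLOCK;
        unlink(free_, block);
        append(used_, block);
        return block;
    }

    // Server side: return the oldest used block to the free list.
    int releaseOldest() {
        int block = used_.head;
        if (block == NO_BLOCK) return NO_BLOCK;
        unlink(used_, block);
        append(free_, block);
        return block;
    }

    std::size_t freeCount() const { return free_.count; }
    std::size_t usedCount() const { return used_.count; }

private:
    void append(BlockList& list, int block) {
        links_[block].back = list.tail;
        links_[block].forward = NO_BLOCK;
        if (list.tail != NO_BLOCK) links_[list.tail].forward = block;
        else list.head = block;
        list.tail = block;
        ++list.count;
    }

    void unlink(BlockList& list, int block) {
        assert(list.count > 0);
        BlockLinks& l = links_[block];
        if (l.back != NO_BLOCK) links_[l.back].forward = l.forward;
        else list.head = l.forward;
        if (l.forward != NO_BLOCK) links_[l.forward].back = l.back;
        else list.tail = l.back;
        l.back = l.forward = NO_BLOCK;
        --list.count;
    }

    std::vector<BlockLinks> links_;
    BlockList free_;
    BlockList used_;
};
```

Appending to the tail of the used list and releasing from its head is what keeps the model first-in first-out.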

The client may write any number of bytes to the queue (up to the total size of the data blocks). The default queue block size is 1024 bytes, but this may be easily changed by modifying one line of code. Typically a client→server message is much less than 1024 bytes - usually, it is between 10 and 200 bytes. To assist in determining the optimal block size, CXQueue monitors the maximum size of a client write and stores it in the queue variable block. You can then use this information to adjust the size of the queue blocks in your own application.

Regardless of the block size chosen, it should be expected that there will be client writes that exceed the block size. When this happens, CXQueue determines how many blocks will be needed, and writes the data to the blocks, splitting the data as necessary. (Note: each data block has a header with back and forward links and other information, so there are fewer than 1024 bytes available in each block for data.) If multiple blocks are necessary, a continuation flag is set in the block header to indicate that there is another block (which can be found by means of the forward link).
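As a rough illustration of the splitting step, the sketch below computes how many blocks a write needs and where each chunk would fall. The 32-byte HEADER_SIZE is an assumption made up for this example (the real header layout is defined in XQueue.h), and Chunk, blocksNeeded and splitMessage are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK_SIZE = 1024;  // default block size from the article
constexpr std::size_t HEADER_SIZE = 32;   // ASSUMED header size, for illustration
constexpr std::size_t PAYLOAD_SIZE = BLOCK_SIZE - HEADER_SIZE;

struct Chunk {
    std::size_t offset;  // offset of this chunk within the client's message
    std::size_t length;  // number of message bytes stored in this block
    bool continued;      // continuation flag: true if another block follows
};

// Number of blocks needed to hold `bytes` of client data (rounds up).
std::size_t blocksNeeded(std::size_t bytes) {
    assert(bytes > 0);
    return (bytes + PAYLOAD_SIZE - 1) / PAYLOAD_SIZE;
}

// Split a message of `bytes` bytes into per-block chunks.
std::vector<Chunk> splitMessage(std::size_t bytes) {
    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    while (offset < bytes) {
        std::size_t length = bytes - offset;
        if (length > PAYLOAD_SIZE) length = PAYLOAD_SIZE;
        bool continued = (offset + length) < bytes;
        chunks.push_back({offset, length, continued});
        offset += length;
    }
    return chunks;
}
```

With these assumed sizes, a 2000-byte write occupies three blocks, and only the last one has the continuation flag cleared.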

The links that have been mentioned are block numbers. Data blocks are numbered from 0 to N-1, although the first data block is actually the second block in the MMF. To calculate the byte address of any block, the formula is:

block_address = (block_no * BLOCK_SIZE) + base_address

where

    block_no is the zero-based number of the data block,

    BLOCK_SIZE is the fixed number of bytes in a block, and

    base_address is the byte address of the first data block, which is simply the address returned by MapViewOfFile(), plus BLOCK_SIZE (to account for the variable block).
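The address formula translates directly into a small helper. In this sketch, `view` stands in for the pointer returned by MapViewOfFile(); the function names and the bounds check against `blockCount` are additions for illustration, not part of CXQueue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t BLOCK_SIZE = 1024;  // default block size from the article

// base_address: the first data block starts one block past the start of
// the mapped view, because the first block holds the queue variables.
inline std::uint8_t* dataBaseAddress(std::uint8_t* view) {
    return view + BLOCK_SIZE;
}

// block_address = (block_no * BLOCK_SIZE) + base_address
inline std::uint8_t* blockAddress(std::uint8_t* view,
                                  std::size_t blockNo,
                                  std::size_t blockCount) {
    assert(blockNo < blockCount);  // data blocks are numbered 0..N-1
    return dataBaseAddress(view) + blockNo * BLOCK_SIZE;
}
```

For a view with one variable block and three data blocks, data block 0 sits at byte offset 1024 and data block 2 at byte offset 3072.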

Note that there is no guarantee that the blocks of a multi-block client write will be contiguous in the MMF. But they will always be delivered to the server in the correct order.

When a client has added an entry to the queue, a notification event informs the server that the queue needs to be serviced. The server then performs a read on the queue (usually two reads, with the first read returning only the size of the queue entry). Then the server reads the data and returns the blocks to the free list. Because only the server manipulates blocks already in the used list, it is not necessary to lock the queue (using the mutex) until the server actually frees the blocks. This optimization helps to reduce the time that clients are prevented from writing.

As mentioned in the previous paragraph, a mutex is used to control write access to the queue. An event object (mentioned earlier) is also used to synchronize queue access. This event object is used only by clients. Here is why: when a client wants to write, the first thing it must do is determine whether there are enough free blocks to accommodate the entire write. If there are, it can then write. But if there are not, it would do no good to hold the mutex at this point, because holding the mutex would lock out the server as well as all clients, so the server would not be able to free any blocks. The client would then be waiting for the server to free some blocks, and the server would be waiting for the client to release the mutex.

What the event object does is prevent clients from starting any new writes. Since no one will be adding anything to the queue, the server will have a chance to free some blocks. When enough blocks become free, the waiting client can complete its write, and then set the event to allow other clients to write.
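The interplay between the event and the free-block count can be modeled without any kernel objects. The sketch below is a single-threaded state model, not the CXQueue implementation: WriteGate and its members are hypothetical names, the boolean `writesAllowed_` plays the role of the event, and the places where the real code would hold the mutex are marked in comments.

```cpp
#include <cassert>
#include <cstddef>

enum class WriteResult { Written, MustWaitForEvent, WaitingForSpace };

// Hypothetical single-threaded model of the client write admission logic.
class WriteGate {
public:
    explicit WriteGate(std::size_t freeBlocks) : freeBlocks_(freeBlocks) {}

    // Client: try to start a write that needs `blocks` blocks.
    WriteResult beginWrite(std::size_t blocks) {
        assert(blocks > 0);
        if (!writesAllowed_) return WriteResult::MustWaitForEvent;
        if (blocks <= freeBlocks_) {
            freeBlocks_ -= blocks;  // real code: done while holding the mutex
            return WriteResult::Written;
        }
        // Not enough space: reset the "event" so no new writes start,
        // but do NOT hold the mutex, so the server can still free blocks.
        writesAllowed_ = false;
        pendingBlocks_ = blocks;
        return WriteResult::WaitingForSpace;
    }

    // Server: return blocks to the free list after reading an entry.
    void freeBlocks(std::size_t blocks) { freeBlocks_ += blocks; }

    // Waiting client: retry; on success, set the "event" again so that
    // other clients may write.
    bool completePendingWrite() {
        if (writesAllowed_ || pendingBlocks_ > freeBlocks_) return false;
        freeBlocks_ -= pendingBlocks_;
        pendingBlocks_ = 0;
        writesAllowed_ = true;
        return true;
    }

    std::size_t freeCount() const { return freeBlocks_; }

private:
    std::size_t freeBlocks_;
    std::size_t pendingBlocks_ = 0;
    bool writesAllowed_ = true;  // stands in for the event object
};
```

The point of the model is the middle branch of beginWrite(): the client that cannot fit blocks further writers without taking the lock the server needs, which is what avoids the deadlock described above.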

To ensure that messages are processed in a first-in first-out manner, newly-written blocks are always appended to the end of the used list. When the server processes queue entries, it always removes entries beginning at the front of the used list. Server apps may verify proper ordering of message entries by inspecting the sequential message id that is stored in each queue entry; multi-block entries will have the same message id stored in each block.
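A server-side ordering check along these lines might look like the following. This is a sketch under assumptions: BlockHeader is a hypothetical two-field subset of the real block header, and message ids are assumed to increase by exactly one from one entry to the next.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of a data block header.
struct BlockHeader {
    std::uint32_t messageId;  // same id in every block of one entry
    bool continued;           // continuation flag: another block follows
};

// Returns true if the blocks, in the order they were removed from the
// used list, form complete entries with sequential message ids.
bool inFifoOrder(const std::vector<BlockHeader>& blocks) {
    if (blocks.empty()) return true;
    std::uint32_t expected = blocks.front().messageId;
    bool insideEntry = false;
    for (const BlockHeader& b : blocks) {
        if (b.messageId != expected) return false;
        insideEntry = b.continued;
        if (!b.continued) ++expected;  // the next block starts a new entry
    }
    return !insideEntry;  // the last entry must not be cut short
}
```

A run such as ids 7, 7, 8, 9, 9, 9 with the continuation flag set on all but the last block of each entry passes; a skipped id or a truncated multi-block entry fails.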

In the current implementation, CXQueue has been tested with multiple clients, but only one server.

How To Use

To integrate the CXQueue class into your app, you first need to add the following files to your project:

XQueue.cpp

XQueue.h

XMemmap.cpp

XMemmap.h

For details on how to use the CXQueue object, refer to the code in XQueueClientTestDlg.cpp, XQueueServerTestDlg.cpp, and ServerThread.cpp.

Known Limitations

XQueue offers no delivery guarantee.

XQueue does not guarantee that there will not be duplicates.

XQueue clients and server must be on same machine.

Demo App

XQueueClientTest.exe and XQueueServerTest.exe demonstrate the use of CXQueue in a client/server scenario. When you start the client, it will first try to open the XQueue. If this fails, the client will try to start the server, as you see in the messages that are logged:

After you select the number of messages and the message size, the client queues the messages and calculates the throughput:

You can start multiple clients, and the server will update its stats to display the number of clients connected:

The server will dynamically display the MMF loading as messages are queued:

Frequently Asked Questions

Why use XQueue at all? Why not just use Microsoft's MSMQ?

There are two issues that you have to deal with if you are considering MSMQ. First, MSMQ is normally not installed on Win98 systems. To install MSMQ 1.0 on Win98 systems, you will need the NT4 Option Pack. Also, you must be running either MSMQ 1.0 or MSMQ 2.0 on an NT server to use an MSMQ 1.0 independent client on Win98 (MSMQ 2.0 is not available for Win98). Note that MSMQ is not available on XP Home edition, only on XP Pro.

Second, MSMQ is complex: 30+ APIs, 100+ properties, 10+ major structs. Plus, installation of MSMQ is non-trivial - you can't just dump some DLLs in the SYSTEM directory. Obviously, MSMQ offers some significant features, and if you need those, then MSMQ is for you. Unlike MSMQ, XQueue (client and server) runs on Win98 and XP Home.

Why use XQueue at all? Why not just use named pipes?

First, unlike named pipes, XQueue (client and server) runs on Win98. Second, XQueue is much easier to integrate into your app than named pipes, and requires no installation procedure.

I don't want to run my app under debugger all the time, just to get TRACE output. How can I see all this error reporting?

You can use the excellent free utility DebugView from Sysinternals. This allows you to see all TRACE output from your debug builds. One very nice feature of DebugView that I cannot live without is the ability to filter the output and colorize any line that contains a particular string.

Can I use XQueue in non-MFC apps?

Yes. It has been implemented to compile with any Win32 program.

When I try to include XQueue.cpp in my MFC project, I get compiler error,

When using XQueue in a project that uses precompiled headers, you must change the C/C++ Precompiled Headers setting to Not using precompiled headers for XQueue.cpp and XMemmap.cpp. Be sure to do this for All Configurations.

The default installation options of Visual C++ V6.0 do not install the Unicode libraries of MFC, so you might get an error that mfc42u.lib or mfc42ud.lib cannot be found. You can fix this either by installing the Unicode libs from the VC++ install CD, or by going to Build | Set Active Configuration and selecting one of the non-Unicode configurations.

Can we use XQueue in our (shareware/commercial) app?

Yes, you can use XQueue without charge or license fee. It would be nice to acknowledge my Copyright in your About box or splash screen, but this is up to you.

Revision History

Version 1.2 - 2005 January 17

Initial public release

Usage

This software is released into the public domain. You are free to use it in any way you like, except that you may not sell this source code. If you modify it or extend it, please consider posting the new code here for everyone to share. This software is provided "as is" with no expressed or implied warranty. I accept no liability for any damage or loss of business that this software may cause.

About the Author

I attended St. Michael's College of the University of Toronto, with the intention of becoming a priest. A friend in the University's Computer Science Department got me interested in programming, and I have been hooked ever since.

Recently, I have moved to Los Angeles where I am doing consulting and development work.

For consulting and custom software development, please see www.hdsoft.org.

Comments and Discussions

Firstly, very good article and great code. Secondly, in my program, in addition to the messages from many clients to the server, I sometimes want to send messages from the server to a client. Can you tell me a possible solution? Thank you.

I'm writing a (non-MFC) plugin DLL for a game that won't run when a debugger is in memory. This DLL is an xQueue client.

Along with this DLL, I'm writing a MFC program (SDI based on CFormView) that is an XQueue server.

The game loads the DLL, and the DLL executes (via CreateProcess) the server program. The server program creates all the XQueue stuff in the app class as a globally accessible object. Here's the code I use for that:

This function returns true, and the DLL can successfully write to the queue. To verify correct operation in the DLL, I put a MessageBeep(0) in the Write method of the queue. It beeps like mad.

My problem is that the server app doesn't appear to be getting any messages at all. I put a MessageBeep(0) in the view's event handler, but no beeping is occurring.

Can you see anything in there that I might have forgotten?

"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997-----"...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001

I thought about it a bit after posting my original message, and I think I'm gonna try using the client that comes with your code and see if I can hook it up to my "server". I think that way, I can run my server under the debugger. If I can't find the problem that way, I'll send you what I have.

I have a program which outputs numerical data first in bursts (backlog when starting XQueue), then on variable but reasonable intervals (>30seconds). I rewrote part of the client and interfaced it to send both types of data to the XQueue server. Real-time data gives me no trouble when disabling transmission of the backlog data, but sending bursts of data crashes the server almost immediately after only about 10 messages or so.

Mutexes are very expensive when used thousands of times a second. Critical sections are much faster, but what's faster than both of these is using none of them at all!

What I propose you do is what I did for my interprocess code. Use singly linked lists instead of doubly linked lists; this allows you to do away with all your sync code. These are the two multithread-safe linked list access functions you can use:

InterlockedPopEntrySList
InterlockedPushEntrySList

These are multi thread safe functions that do not require thread protection. If you keep your current mapped memory design but instead use two single linked lists and two events then your code will be much much faster. This is a basic design you could use:

Note that this code is completely multi thread safe even without any mutex's or critical sections. Also note that due to the design both the server and any number of clients can be processing and sending data at the same time.

There is one side effect which could be an issue: the order in which the server will receive the blocks. Because blocks are added to the front of the singly linked list, they will be returned in LIFO (last in, first out) order, which is the same thing as FILO (first in, last out). Basically this means that if you wish to receive the blocks in the order that they were transmitted, you would have to sort them as they are received. One possibility is to give every block a number that increments by one each time. This number could be stored in the block and therefore would not require sync code.
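The reordering idea can be sketched as a small resequencing buffer on the reader side. This is an illustration, not part of either class discussed here: Resequencer is a hypothetical name, payloads are plain strings, and sequence numbers are assumed to start at 0 and increase by one per block with no duplicates.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader-side buffer that restores transmission order.
class Resequencer {
public:
    // Accept a block that may arrive out of order. Returns every payload
    // that can now be delivered in sequence (possibly none).
    std::vector<std::string> accept(std::uint64_t sequence, std::string payload) {
        assert(sequence >= next_);  // old or duplicate numbers are not expected
        pending_.emplace(sequence, std::move(payload));
        std::vector<std::string> ready;
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        return ready;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::uint64_t next_ = 0;  // next sequence number to deliver
    std::map<std::uint64_t, std::string> pending_;
};
```

If blocks 2, 1, 0 are popped in that order, nothing is delivered until block 0 arrives, at which point all three come out in transmission order.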

If you compare the performance of this class and yours, you'll notice a very big difference in overhead and bandwidth. On my laptop this class could transmit 300,000 packets a second at over 500,000,000 bytes per second. To put that in perspective, that's over 4000 Mbit/s; a standard network card will do 100 Mbit/s flat out.
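The unit conversion behind that comparison is simple enough to check in code. The helpers below are hypothetical and assume decimal megabits (one Mbit = 1,000,000 bits).

```cpp
#include <cassert>
#include <cstdint>

// Convert a byte rate to megabits per second (decimal megabits assumed).
inline std::uint64_t toMbitPerSec(std::uint64_t bytesPerSec) {
    return bytesPerSec * 8 / 1000000;
}

// Average packet size implied by a byte rate and a packet rate.
inline std::uint64_t averagePacketBytes(std::uint64_t bytesPerSec,
                                        std::uint64_t packetsPerSec) {
    assert(packetsPerSec > 0);
    return bytesPerSec / packetsPerSec;
}
```

For the figures quoted, 500,000,000 bytes per second converts to 4000 Mbit/s, and dividing by 300,000 packets per second implies an average packet of roughly 1666 bytes.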

Hopefully this will help someone else who wants to transmit data very very quickly between processes.

// Shared memory buffer that contains everything required to transmit
// data between the client and server
struct MemBuff
{
    SLIST_HEADER m_Available;               // List of available blocks
    SLIST_HEADER m_Filled;                  // List of blocks that have been filled with data
    Block        m_Blocks[IPC_BLOCK_COUNT]; // Array of buffers that are used in the communication
};

// Enter a continuous loop
for (;;)
{
    // Try to grab some data from the filled list
    Block *pBlock = (Block*)InterlockedPopEntrySList(&m_pBuf->m_Filled);
    if (pBlock)
    {
        // Get the amount we can read out of this block
        DWORD written = buffSize;
        if (written > pBlock->Amount) written = pBlock->Amount;

// Try to grab a block from the available list
Block *pBlock = (Block*)InterlockedPopEntrySList(&m_pBuf->m_Available);
if (pBlock)
{
    // Set the amount we can write
    DWORD written = amount;
    if (written > IPC_BLOCK_SIZE) written = IPC_BLOCK_SIZE;
    pBlock->Amount = written;

But I think there is a potential bug (or just a drawback) in this code. In the constructor, after pushing all the list entries onto the available list, it would be better to call SetEvent(m_hAvail). Then, when testing, the client could first call waitAvailabe() and then call write(). Otherwise, write() will fail when all the blocks are filled.

I'm just curious how you tested this class?

BTW: I want to give some advice to others who want to use this code in their programs too. You need to write your own classes for osString and osCVar, or replace them somehow.

Following is my code for testing this osIPC class. I hope it is helpful.

// osIPCTest.cpp : Defines the entry point for the console application.
//

I've used this in my IPC class and achieved over 3,000,000 msgs/sec or over 1,100,000,000 Bytes/Sec (27GBit) depending on the hardware/packet size. The reason it achieves such high performance is that it doesn't use a single lock: no mutex, no critical sections, no spin locks. This minimises thread context switching and allows the full potential of multi-core machines to be exploited. Basically, all clients and all servers can read and write at the same time on different threads.

When I wrote this class, it was written purely to show how to perform very fast IPC communication, not to provide the capabilities your implementation does, but maybe it would be a good idea to merge the two, providing the best of both worlds. Not that you haven't given enough to the community with your article, but it's just a thought.

Regards

John

P.S. The original article has a bug in it, but I can't edit it anymore for some reason. Check the article forum for the bug fix.

In your known limitations section, you mentioned
# XQueue offers no delivery guarantee.
# XQueue does not guarantee that there will not be duplicates.
# XQueue clients and server must be on same machine.

The 2nd and 3rd limitations are OK in my opinion, but the first limitation bothers me a bit. In what situations will delivery generally fail?

Hi
This is a great tool. I am going to use it as a template to develop a serial communication server.

I need to add a function that lets the server give commands to a client to write data (from the MMF) to COM devices. But I notice that "Communication is one-way only - from client to server". How can I add communication in the other direction, from server to client?

What is a singleton class in C++, and how do you create one?
Interviewers ask this question in so many interviews.
How can we use it in our application, that is, in what situations can we use a singleton class?

I tried to use your class to communicate between a service and normal application on Windows XP, but it wasn't possible due to access denied errors. The problem was that you pass NULL as SECURITY_ATTRIBUTES parameter to functions CreateMutex, CreateEvent and MapMemory. So I basically just changed NULL to GetSecurityAttributes() function which is defined as follows:

After this modification the application will have access to queue created by service, but you can still receive FILE_NOT_FOUND error when opening file mapping if you're using Terminal Services on Windows 2000 or XP. The solution is to prefix mapping name with "Global\" - just modify CXMemMapFile::CreateMappingName as follows:

Hello
First of all I would like to thank the author and contributors for an extremely useful article.
There is another small addition regarding a Windows service on XP or 2003. You should prefix the names of all kernel objects (mutex, event) in XQueue with the "Global\" string. It is possible to use the same function, CXMemMapFile::CreateMappingName (just copy it to the XQueue cpp file or something like this). This enables cross-session calls. I used this to communicate between a LocalSystem service and user mode apps, and it works fine even with Fast User Switching (Terminal Services).
all the best , David
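The prefixing step suggested in these comments can be sketched as a plain string helper. The function name makeGlobalName is hypothetical and this is not the body of CXMemMapFile::CreateMappingName; it only shows the idea of applying the "Global\" prefix once to each kernel object name before it is passed to the creation or open call.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: put a kernel object name in the global namespace
// so that processes in different sessions can open the same object.
inline std::string makeGlobalName(const std::string& name) {
    assert(!name.empty());
    const std::string prefix = "Global\\";
    // Leave the name alone if it already carries the prefix.
    if (name.compare(0, prefix.size(), prefix) == 0) return name;
    return prefix + name;
}
```

The same helper would be applied to the mutex, event, and file mapping names so that client and server agree on all three.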

Very good! I may be out of step with a lot of people but I do like articles written about code in decent English. Letting the code speak for itself is all very well in certain circumstances but generally it is just laziness.

It is unclear to me from the article which parts undergo serialization via the mutex. Is this multiple writers (simultaneously writing from multiple threads and/or processes) and a single reader, or is it more like a single writer at a time, where no one writes while the server is reading, and likewise no one reads while someone is writing?

I am looking for something that allows an unspecified number of clients to independently write (per process would be fine, not necessarily per thread) to a queue and there only need be a single reader. I want to avoid serializing access to the queue for the writers, because I don't want them serialized against each other.

We had a logger that worked similarly to what you are doing here, and it was switched to use TCP/IP. I did not make the changes, so I am not sure if that was to allow the server and client to be on separate machines, or possibly to de-serialize the writers.