Introduction

When using a message queue together with sockets or any other file
descriptor based Unix facility, the most inconvenient thing is that message
queues do not support the select() system call. So Unix programmers usually
solve the I/O multiplexing issue in a simple but ugly way, like this:

while(1)
{
    select on socket with timeout;
    ...
    wait on a message queue with IPC_NOWAIT
}

Certainly, the above implementation is ugly, and I don't like it. Another
solution might be to adopt multi-threading. But in this article I want
to show you a fun approach: implementing a new system call named
msgqToFd(). I'm not trying to provide you with a full-fledged,
bug-free kernel implementation. I just want to present my experiment.
This article might be interesting to readers who like to play
with the GNU/Linux kernel source.

msgqToFd() - A new non-standard system call

Here is its signature.

int msgqToFd(int msgq_id)

It returns a file descriptor corresponding to the message queue, which
can be used with select().

If an error occurs, it returns -1.

An application can use the call like this:

...

q_fd = msgqToFd(msgq_id);

while(1)
{
    FD_ZERO(&rset);
    FD_SET(0, &rset);
    FD_SET(q_fd, &rset);

    select(q_fd + 1, &rset, NULL, NULL, NULL);

    if(FD_ISSET(0, &rset))
    {
        ...
    }

    if(FD_ISSET(q_fd, &rset))
    {
        r = msgrcv(msgq_id, &msg, sizeof(msg.buffer), 0, 0);
        ...
    }
}

How select() works

A file descriptor is associated with a file structure. The file
structure holds the set of operations supported by this file type, called
file_operations. The file_operations structure has an entry named poll.
The generic select() call invokes this poll() function to get the status
of a file (or socket or whatever), as the name suggests.

In general, select() works like this:

while(1)
{
    for each file descriptor in the set
    {
        call file's poll() to get mask.

        if(mask & can_read or mask & can_write or mask & exception)
        {
            set bit for this fd that this file is readable/writable
            or there is an exception.

            retval++;
        }
    }

    if(retval != 0)
        break;

    schedule_timeout(__timeout);
}

For the detailed implementation of select(), please take a look
at sys_select() and do_select() in fs/select.c
of the standard kernel source code.
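The dispatch through file_operations can be modelled in user space. The sketch below is a toy, not kernel code: every toy_ name and the TOY_CAN_ flags are invented for illustration, and it performs only one scan of the set, where the real loop sleeps and rescans while nothing is ready.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; all names are hypothetical. */
#define TOY_CAN_READ   0x1u
#define TOY_CAN_WRITE  0x2u

struct toy_file;

struct toy_file_operations {
    unsigned int (*poll)(struct toy_file *file);   /* returns a readiness mask */
};

struct toy_file {
    const struct toy_file_operations *f_op;
    void *private_data;
};

/* One pass of the scan: ask every file for its mask and count the ready
 * ones. ready[i] receives the bits of files[i] that the caller asked for. */
int toy_select_scan(struct toy_file *files[], size_t nfiles,
                    unsigned int wanted, unsigned int ready[])
{
    int retval = 0;

    for (size_t i = 0; i < nfiles; i++) {
        unsigned int mask = files[i]->f_op->poll(files[i]);

        ready[i] = mask & wanted;
        if (ready[i] != 0)
            retval++;
    }
    return retval;
}

/* Example poll method: a counter that is readable once it is non-zero. */
unsigned int toy_counter_poll(struct toy_file *file)
{
    const int *count = file->private_data;

    return (*count > 0) ? TOY_CAN_READ : TOY_CAN_WRITE;
}
```

The scan never needs to know what kind of object sits behind a file. That is why giving a message queue its own poll method is enough to make select() work on it.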

Another thing you need to understand is poll_wait(). It puts the
current process into a wait queue provided by a kernel facility such as a
file, pipe, socket or, in our case, a message queue.

Please note that the current process may wait on several wait queues
by calling select().

long sys_msgqToFd(long msqid)

The system call should return a file descriptor corresponding to a message
queue. The file descriptor should point to a file structure which
contains the file_operations for a message queue.

To do that, sys_msgqToFd() does the following:

with msqid, locate the corresponding struct msg_queue

allocate a new inode by calling get_msgq_inode()

allocate a new file descriptor with get_unused_fd()

allocate a new file structure with get_empty_filp()

initialize inode, file structure

set the file's file_operations to msgq_file_ops

set the file's private_data to msq->q_perm.key

install fd and file structure with fd_install()

return the new fd
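The steps above can be sketched as a user-space toy. This is not the code from msg.c: the tables, the toy_ names and the fixed sizes are all assumptions, and the inode and dentry setup is left out entirely. It only shows the order of the lookup, the fd allocation and the file structure initialization.

```c
#include <stddef.h>

#define TOY_MAX_FDS    4
#define TOY_MAX_QUEUES 4

struct toy_msg_queue { int in_use; long key; };
struct toy_file_ops  { const char *name; };     /* placeholder for poll etc. */
struct toy_file {
    int in_use;
    const struct toy_file_ops *f_op;
    long private_data;                          /* holds the queue key */
};

static struct toy_msg_queue toy_queues[TOY_MAX_QUEUES];
static struct toy_file      toy_fd_table[TOY_MAX_FDS];
static const struct toy_file_ops toy_msgq_file_ops = { "msgq" };

/* Mirrors the step list: look up the queue, grab a free fd slot,
 * fill in the file structure, and return the fd (or -1 on error). */
long toy_msgqToFd(long msqid)
{
    if (msqid < 0 || msqid >= TOY_MAX_QUEUES || !toy_queues[msqid].in_use)
        return -1;                              /* no such queue */

    for (long fd = 0; fd < TOY_MAX_FDS; fd++) {
        struct toy_file *file = &toy_fd_table[fd];

        if (file->in_use)
            continue;
        file->in_use = 1;                       /* "install" the file */
        file->f_op = &toy_msgq_file_ops;
        file->private_data = toy_queues[msqid].key;
        return fd;
    }
    return -1;                                  /* fd table is full */
}
```

The file remembers the queue by its key rather than by a pointer, as the step list describes.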

Please take a look at msg.c and the accompanying msg.h
provided with this article. See also sys_i386.c.

msgq_poll()

The msgq_poll() implementation is pretty simple.

What it does is:

With file->private_data, which is a key for a message
queue, locate the corresponding message queue

put current process into the message queue's wait queue by calling
poll_wait()

if the message queue is empty (msq->q_qnum == 0),
set the mask to writable (this is debatable, but let's set that aside
for now). If not, set the mask to readable

return the mask
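The mask rule can be tried out from user space on an unpatched kernel. The sketch below is only an approximation: msgq_poll_mask() is a hypothetical helper, it reads the message count with msgctl(IPC_STAT) instead of touching msq->q_qnum directly, and it has no equivalent of the poll_wait() step.

```c
#include <poll.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* User-space approximation of the rule described above:
 * empty queue -> POLLOUT, otherwise -> POLLIN.
 * Returns -1 if the queue cannot be inspected (for example, because
 * it has been removed). */
int msgq_poll_mask(int msqid)
{
    struct msqid_ds ds;

    if (msgctl(msqid, IPC_STAT, &ds) < 0)
        return -1;
    return (ds.msg_qnum == 0) ? POLLOUT : POLLIN;
}
```

With this rule the mask is never both readable and writable at once.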

Modification of existing message queue source code

To support poll() on a message queue, we need to modify
the existing message queue source code.

The modifications include:

Adding a wait queue head to struct msg_queue, on which
a process is put when it calls select(). The wait queue head
should also be initialized when a message queue is created. Please
take a look at struct msg_queue and newque()
in msg.c.

Whenever a new message is inserted into a message queue, any process waiting
on the message queue (by calling select()) should be awakened.
Take a look at sys_msgsnd() in msg.c.

When a message queue is removed or its properties are changed, all
the processes waiting on the message queue (by calling select())
should be awakened. Take a look at sys_msgctl() and freeque()
in msg.c.

To allocate a new inode and file structure, we need to set up some
file system related stuff for VFS to operate properly. For this purpose,
we need additional initialization code to register a new file system and
set a few things up. Take a look at msg_init() in msg.c.

All the changes are "ifdef"ed with MSGQ_POLL_SUPPORT.
So it should be easy to identify the changes.

File System Related Stuff

To allocate a file structure, we need to set up the file's f_vfsmnt
and f_dentry properly. Otherwise you'll see some OOPS
messages printed out on your console. For VFS to work correctly with
this new file structure, we need the additional setup that was
briefly mentioned above.

Since we support only poll() for the file_operations,
we don't have to care about every detail of the file system setup code.
All we need is a properly set up f_dentry and
f_vfsmnt. Most of the related code is copied from pipe.c.

Adding a new system call

To add a new system call, two things need to be done.

The first step is to add the new system call at the kernel level, which we
already did (sys_msgqToFd()). In the GNU/Linux kernel, all System V IPC
related calls are dispatched through sys_ipc() in arch/i386/kernel/sys_i386.c.
sys_ipc() uses a call number to identify the specific system call
requested. To dispatch the new system call properly, we have to define
a new call number (which is 25) for sys_msgqToFd() and modify
sys_ipc() to call sys_msgqToFd(). For reference, please take a look at
arch/i386/kernel/entry.S in the standard kernel source and at sys_ipc()
in the sys_i386.c provided with this article.

The second step is to add a stub function for user level applications.
Normally all the system call stub functions are provided by GLIBC, so to add
a new system call you would have to modify GLIBC, build your own copy and
install it. Oh hell, NO THANKS!!! I don't want to do that and I
don't want you to do that either. To solve the problem, I did some copy and
paste from GLIBC. If you look at user/syscall_stuff.c
provided with this article, there is a function named msgqToFd(),
which is the stub for the msgqToFd() system call.

What it does is simply

return INLINE_SYSCALL(ipc, 5, 25, key, 0, 0, NULL);

Here is a brief description for the macro.

ipc : system call number for sys_ipc(). ipc is expanded
as __NR_ipc, which is 117.
5 : number of arguments passed to the system call (25, key, 0, 0, NULL).
25 : call number for sys_msgqToFd()
key : the argument to sys_msgqToFd()

INLINE_SYSCALL sets up the arguments properly and invokes
interrupt 0x80 to switch to kernel mode and invoke the system call.
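A similar stub can probably be written without copying anything from GLIBC, by using the generic syscall() wrapper. This is not what user/syscall_stuff.c does; it is an alternative sketch. The MSGQTOFD_CALL macro name is made up here, and the #else branch is an assumption to cover architectures that have no multiplexed ipc entry (x86-64, for example).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for syscall() */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MSGQTOFD_CALL 25         /* call number chosen in the article */

/* Stub built on syscall(2). On a kernel without the patch, or on an
 * architecture without the multiplexed ipc entry, it returns -1. */
int msgqToFd(int key)
{
#ifdef SYS_ipc
    /* Same argument order as INLINE_SYSCALL(ipc, 5, 25, key, 0, 0, NULL). */
    return (int)syscall(SYS_ipc, MSGQTOFD_CALL, key, 0, 0, NULL);
#else
    (void)key;
    errno = ENOSYS;
    return -1;
#endif
}
```

On an unpatched kernel the call simply fails, so an application can fall back to the polling loop when msgqToFd() returns -1.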

Conclusion

I'm not so sure about the practical usability of this modification.
I just wanted to see whether this kind of modification was possible or not.

Besides that, I want to talk about a few issues that need to be addressed.

If two or more threads or processes are accessing a message
queue, and one is waiting on the message queue with msgrcv()
while another is waiting with select(), then the former
will always receive the new message. Take a look at pipelined_send()
in msg.c.

For the writability test, msgq_poll() sets the mask as
writable only if the message queue is empty. We could instead set the mask
as writable whenever the message queue is not full, and there would be no
big difference. But I chose this implementation for simplicity.

Let's think about this scenario.

A queue is created

A file descriptor for the queue is created

The queue is removed

In this kind of case, what should we do? A correct solution would
be to close the fd when the queue is removed. But this is impossible since
a message queue can be removed by any process which has the right to do so.
This means the process removing the message queue may not have a file
descriptor associated with the message queue, even if the message queue is
mapped to a file descriptor by some other process.

Additionally, if the same queue (with the same key) is created again,
the mapping will still be maintained.
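Since the kernel cannot close the descriptor for us, an application could at least clean up on its own side once it notices the queue has gone. A minimal sketch, assuming a hypothetical helper receive_or_close() that is called whenever select() reports q_fd as ready:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a message was received, 0 if there was nothing to read,
 * and -1 after closing q_fd because msgrcv() reported EIDRM or EINVAL,
 * which is what it returns for a queue that no longer exists. */
int receive_or_close(int msgq_id, int q_fd, void *msg, size_t size)
{
    if (msgrcv(msgq_id, msg, size, 0, IPC_NOWAIT) >= 0)
        return 1;

    if (errno == EIDRM || errno == EINVAL) {
        close(q_fd);             /* the queue is gone: drop our mapping */
        return -1;
    }
    return 0;                    /* ENOMSG, EINTR, ...: try again later */
}
```

This relies on the wake-up that freeque() performs in the modified msg.c; without it, a process blocked in select() would never get the chance to notice.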

Efficiency problem. All the processes waiting on the wait queue by
calling select() will be awakened when there is a new message.
Eventually only one process will receive the message and all the other
processes will go back to sleep.
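The cost is easy to count in a toy model. The sketch below is purely illustrative: struct toy_waiter and toy_wake_all() are invented names, and the toy hands the message to the first waiter, whereas in the kernel the winner depends on scheduling.

```c
#include <stddef.h>

struct toy_waiter {
    int wakeups;         /* how many times this waiter was woken */
    int received;        /* how many messages it actually got */
};

/* Deliver one message: wake every waiter and let the first one take it.
 * Returns the number of waiters that were woken for nothing. */
size_t toy_wake_all(struct toy_waiter waiters[], size_t n)
{
    int message_available = 1;
    size_t wasted = 0;

    for (size_t i = 0; i < n; i++) {
        waiters[i].wakeups++;
        if (message_available) {
            waiters[i].received++;
            message_available = 0;
        } else {
            wasted++;    /* woke up, found the queue empty, sleeps again */
        }
    }
    return wasted;
}
```

With n waiters, each message causes n - 1 useless wake-ups.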

No support for message type. Regardless of message type, if there
is any message in the queue, select() will return.
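An application that only wants one message type can work around this after select() returns, by asking for that type with IPC_NOWAIT so that it never blocks on a queue holding only other types. A minimal sketch, where struct typed_msg and try_receive_type() are hypothetical names:

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

struct typed_msg {               /* hypothetical message layout */
    long mtype;
    char buffer[64];
};

/* Returns 1 if a message of the wanted type was received, 0 if none is
 * queued right now, and -1 on any other error. Never blocks. */
int try_receive_type(int msgq_id, long wanted_type, struct typed_msg *msg)
{
    if (msgrcv(msgq_id, msg, sizeof(msg->buffer),
               wanted_type, IPC_NOWAIT) >= 0)
        return 1;
    return (errno == ENOMSG) ? 0 : -1;
}
```

As long as a message of another type stays in the queue, select() returns immediately every time, so this turns into a busy loop.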