The second patch removed the need for f_op->poll calls: an optional flag was added to struct file, which do_select() and do_poll() could query instead. This sped up polling by another 3x to 4x, and required no changes to userspace code.
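To make the difference concrete, here is a toy Python model. It assumes (hypothetically) that the per-file flag is a readiness bitmask the driver keeps current, so the poll loop can read it directly instead of making an indirect f_op->poll call for every descriptor. `File`, `mask`, `f_op_poll`, `do_poll_slow` and `do_poll_fast` are illustrative names, not kernel code, and the real patch may have worked differently.

```python
from dataclasses import dataclass
from typing import List

POLLIN = 0x001   # same bit value as the Linux poll(2) constant

@dataclass
class File:
    """Hypothetical stand-in for struct file."""
    mask: int = 0         # readiness bits, assumed to be kept current by the driver
    poll_calls: int = 0   # counts simulated f_op->poll invocations

    def f_op_poll(self) -> int:
        # Models the indirect call into the driver's poll method.
        self.poll_calls += 1
        return self.mask

def do_poll_slow(files: List[File], events: List[int]) -> List[int]:
    # One simulated f_op->poll call per descriptor on every pass.
    return [f.f_op_poll() & ev for f, ev in zip(files, events)]

def do_poll_fast(files: List[File], events: List[int]) -> List[int]:
    # Reads the cached flag directly; no per-file method call.
    return [f.mask & ev for f, ev in zip(files, events)]

files = [File() for _ in range(5)]
files[2].mask = POLLIN              # pretend data arrived on the third file
events = [POLLIN] * len(files)
print(do_poll_fast(files, events))       # [0, 0, 1, 0, 0]
print(sum(f.poll_calls for f in files))  # 0
```

Both functions report the same readiness; the fast path simply avoids the per-file call. The cost in this model is that every driver opting in must update the flag whenever its readiness changes, while files without the flag would presumably still fall back to f_op->poll.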

Offhand, this sounds like a very good idea. Did you consider what happens if there are two processes calling select on a shared file descriptor?

Finally, the third patch created the poll2(2) syscall. This provided a more efficient interface to the kernel, and removed the need for an application to search all fds to see where there was activity. Since the kernel already has to search all fds for activity, it is more efficient to pass back to userspace a short list of fds which have activity, saving the application the time of searching the big list of fds. This new syscall works well for both single-threaded and multi-threaded servers.
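The difference in what the application has to scan can be sketched in Python. This is a model, not the actual poll2(2) interface: `kernel_ready`, `poll_style` and `poll2_style` are hypothetical names, and the readiness state is just a dictionary standing in for what the kernel finds when it checks each fd.

```python
from typing import Dict, List, Tuple

POLLIN = 0x001

def kernel_ready(fd: int, state: Dict[int, int]) -> int:
    # Stand-in for the kernel checking one descriptor for activity.
    return state.get(fd, 0)

def poll_style(fds: List[Tuple[int, int]], state: Dict[int, int]) -> List[int]:
    # poll(2)-like: one revents slot per fd, so the full array comes back
    # and the application has to search it for activity.
    return [kernel_ready(fd, state) & ev for fd, ev in fds]

def poll2_style(fds: List[Tuple[int, int]], state: Dict[int, int]) -> List[Tuple[int, int]]:
    # poll2-like (hypothetical): the kernel visits every fd anyway, so it
    # hands back only the (fd, revents) pairs that actually have activity.
    out = []
    for fd, ev in fds:
        revents = kernel_ready(fd, state) & ev
        if revents:
            out.append((fd, revents))
    return out

interest = [(fd, POLLIN) for fd in range(1000)]
state = {17: POLLIN, 840: POLLIN}

# poll-style: the application walks all 1000 entries to find two active fds.
revents = poll_style(interest, state)
active = [interest[i][0] for i, r in enumerate(revents) if r]

# poll2-style: the application only walks the short list.
short = poll2_style(interest, state)
print(active, short)  # [17, 840] [(17, 1), (840, 1)]
```

The kernel-side work is the same in both models; what changes is that the application's loop is proportional to the number of active fds rather than the number of watched fds.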

At some level, that's what I/O completion ports are all about, although they add an additional twist: not only do they notify you that data is available, they actually transfer the data to the memory buffer and tell you how many bytes were transferred. They also don't require an additional system call (since you can use something like fcntl to register the fd with the I/O completion port).
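A completion-style interface can be modelled the same way. In this hypothetical sketch the application registers an fd together with a buffer (standing in for the fcntl-style registration described above); when data "arrives", the simulated kernel copies it into that buffer and queues a completion record carrying the byte count. `CompletionPort`, `register`, `deliver` and `get` are invented names, not a real Linux API, and rejecting a second registration on the same fd is an arbitrary choice for the sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

@dataclass
class Completion:
    fd: int
    nbytes: int       # how many bytes were transferred into the buffer
    buf: bytearray

class CompletionPort:
    """Hypothetical I/O completion port model (not a real Linux API)."""

    def __init__(self) -> None:
        self._pending: Dict[int, bytearray] = {}   # fd -> buffer awaiting data
        self._done: Deque[Completion] = deque()    # finished I/O, in arrival order

    def register(self, fd: int, buf: bytearray) -> None:
        # Arbitrary policy for this sketch: one outstanding request per fd.
        if fd in self._pending:
            raise ValueError(f"fd {fd} already has an outstanding request")
        self._pending[fd] = buf

    def deliver(self, fd: int, data: bytes) -> bool:
        # Simulated kernel side: copy into the user's buffer, queue a completion.
        buf = self._pending.pop(fd, None)
        if buf is None:
            return False   # no request registered; this model just reports failure
        n = min(len(data), len(buf))
        buf[:n] = data[:n]
        self._done.append(Completion(fd, n, buf))
        return True

    def get(self) -> Optional[Completion]:
        # Application side: no readiness scan and no separate read() call.
        return self._done.popleft() if self._done else None

port = CompletionPort()
buf = bytearray(4)
port.register(3, buf)
port.deliver(3, b"hello")
c = port.get()
print(c.fd, c.nbytes, bytes(c.buf))  # 3 4 b'hell'
```

The application gets the data and the byte count in one step, which is the "additional twist" over a readiness list; the price is that the interface must define what happens to multiple or overlapping requests.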

The question, then, is: if we're going to be modifying the user API, which API is ultimately best? A poll2 interface, or an I/O completion-style interface?

This is not to say that completion ports are without their problems. There are also questions of what happens if you try to register more than one asynchronous I/O: does it return an error, overwrite the previous I/O request, etc.? Do you allow both asynchronous reads and writes? Since I'm on the road, I still haven't had a chance to look at Robey's proposal, but there are some design/API questions that we need to consider.

Both the second and third patches would massively improve the scalability of polling in Linux. Unfortunately, I didn't manage to get either into Linus' kernel, so after perfecting my patches, I stopped working on them. If I can get some encouragement from people whose opinion carries some weight with Linus, I could resurrect these patches.

I believe the second patch is definitely worth revisiting and considering for inclusion, modulo some design questions that I mentioned above. The third patch, IMO, needs to wait on the higher-level architectural question of how we want to provide this kind of functionality in general....

- Ted
