Subject: Syslets, "Threadlets", generic AIO support, v3
this is the v3 release of the syslet/threadlet subsystem:
http://redhat.com/~mingo/syslet-patches/
This release came a few days later than i originally wanted, because
i've implemented many fundamental changes to the code. The biggest
highlights of v3 are:
- "Threadlets": the introduction of the 'threadlet' execution concept.
- syslets: support for multiple rings with no kernel-side footprint, the
elimination of mlock() pinning, no more need for
async_register/unregister() calls, and more.
"Threadlets" are basically the user-space equivalent of syslets: small
units of execution that the kernel attempts to execute without
scheduling. If the threadlet blocks, the kernel creates a real thread
from it, and execution continues in that thread. The 'head' context (the
context that never blocks) returns to the original function that called
the threadlet. Threadlets are very easy to use:
long my_threadlet_fn(void *data)
{
	char *name = data;
	int fd;

	fd = open(name, O_RDONLY);
	if (fd < 0)
		goto out;

	/* 'stat', 'buf' and 'count' are assumed to be declared elsewhere */
	fstat(fd, &stat);
	read(fd, buf, count);
	...

out:
	return threadlet_complete();
}

main()
{
	/* nonzero: ran to completion without blocking; zero: went async */
	done = threadlet_exec(my_threadlet_fn, new_stack, &user_head);
	if (!done)
		reqs_queued++;
}
There is no limitation whatsoever on what a threadlet function can
look like: it can use arbitrary system-calls and all execution will be
procedural. There is no 'registration' needed when running threadlets
either: the kernel will take care of all the details, user-space just
runs a threadlet without any preparation and that's it.
Completion of async threadlets can be done from user-space via any of
the existing APIs: in threadlet-test.c (see the async-test-v3.tar.gz
user-space examples at the URL above) i've for example used a futex
between the head and the async threads to do threadlet notification. But
select(), poll() or signals can be used too - whichever is most
convenient to the application writer.
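To make the futex variant concrete, here is a minimal sketch of counter-plus-futex completion notification. It is not the code from threadlet-test.c: the names (completed, notify_completion(), wait_for_completions(), run_demo()) are made up for illustration, and plain pthreads stand in for the threads the kernel would create when a threadlet blocks. Only the futex handshake itself is the point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Completion counter shared by the head and the async contexts. */
static _Atomic uint32_t completed;

/* Thin wrapper: glibc does not export a futex() function. */
static long futex(void *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Called by an async context once its work is finished. */
static void notify_completion(void)
{
	atomic_fetch_add(&completed, 1);
	futex(&completed, FUTEX_WAKE, INT_MAX);
}

/* Called by the head: sleep until at least 'want' completions arrived. */
static void wait_for_completions(uint32_t want)
{
	uint32_t seen;

	/* FUTEX_WAIT returns at once if the counter no longer equals 'seen' */
	while ((seen = atomic_load(&completed)) < want)
		futex(&completed, FUTEX_WAIT, seen);
}

/* Stand-in for a threadlet that blocked and got its own thread. */
static void *async_context(void *arg)
{
	(void)arg;
	usleep(1000);		/* pretend to block on IO */
	notify_completion();
	return NULL;
}

/* Start 'nr' stand-in contexts, wait for all, return the final count. */
static uint32_t run_demo(int nr)
{
	pthread_t tid[16];
	int i;

	assert(nr > 0 && nr <= 16);
	atomic_store(&completed, 0);
	for (i = 0; i < nr; i++)
		pthread_create(&tid[i], NULL, async_context, NULL);
	wait_for_completions(nr);
	for (i = 0; i < nr; i++)
		pthread_join(tid[i], NULL);
	return atomic_load(&completed);
}
```

The head re-reads the counter before every wait, so a wakeup that arrives between the check and the FUTEX_WAIT is not lost: the kernel refuses to sleep if the value has changed in the meantime.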
Threadlets can also be thought of as 'optional threads': they execute in
the original context as long as they do not block, but once they block,
they are moved off into their separate thread context - and the original
context can continue execution.
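The 'optional thread' behaviour can be pictured with a small user-space emulation. This is only a sketch under loud assumptions: user-space cannot know in advance whether a function will block (that is exactly why the real mechanism lives in the kernel), so the hypothetical would_block flag below fakes that knowledge, and optional_thread_exec() is an invented name, not part of the threadlet API. The return value mirrors the 'done' convention of the example above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; 'would_block' fakes what only the kernel knows. */
struct request {
	bool would_block;
	int result;
};

static void do_work(struct request *req)
{
	req->result = 42;	/* placeholder for the real work */
}

static void *async_entry(void *arg)
{
	do_work(arg);
	return NULL;
}

/*
 * Returns 1 if the request completed in the caller's context,
 * 0 if it was moved off into its own thread (handle stored in *tid).
 */
static int optional_thread_exec(struct request *req, pthread_t *tid)
{
	assert(req != NULL && tid != NULL);

	if (!req->would_block) {
		do_work(req);
		return 1;
	}
	if (pthread_create(tid, NULL, async_entry, req) != 0) {
		/* no thread available: fall back to running inline */
		do_work(req);
		return 1;
	}
	return 0;
}
```

A caller would count the zero returns as queued requests and collect them later, for example via pthread_join() in this emulation.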
Threadlets can also be thought of as 'on-demand parallelism': user-space
does not have to worry about setting up, sizing and feeding a thread
pool - the kernel will execute the workload in a single-threaded manner
as long as it makes sense, but once the context blocks, a parallel
context is created. So parallelism inside applications is utilized in a
natural way. (The best place to do this is in the kernel - user-space
has no idea about what level of parallelism is best for any given
moment.)
I believe this threadlet concept is what user-space will want to use for
programmable parallelism.
[ Note that right now there's a pair of system-calls: sys_threadlet_on()
and sys_threadlet_off() that demarcate the beginning and the end of a
threadlet function, which enter the kernel even in the 'cached' case -
but my plan is to do these two system calls via a vsyscall, without
having to enter the kernel at all. That will reduce cached threadlet
execution NULL-overhead to around 10 nsecs - making it essentially
zero. ]
Threadlets share much of the scheduling infrastructure with syslets.
Syslets (small, kernel-side, scripted "syscall plugins") are still
supported - they are (much...) harder to program than threadlets but
they allow the highest performance. Core infrastructure libraries like
glibc/libaio are expected to use syslets. Jens Axboe's FIO tool already
includes support for v2 syslets, and the following patch updates FIO to
the v3 API:
http://redhat.com/~mingo/syslet-patches/fio-syslet-v3.patch
Furthermore, the syslet code and API have been significantly enhanced as
well:
- support for multiple completion rings has been added
- there is no more mlock()ing of the completion ring(s)
- sys_async_register()/unregister() have been removed as they are not
needed anymore. sys_async_exec() can be called straight away.
- there is no kernel-side resource used up by async completion rings at
all (all the state is in user-space), so an arbitrary number of
completion rings are supported.
plus lots of bugs were fixed and a good number of cleanups were done as
well. The v3 code is ABI-incompatible with v2, due to these fundamental
changes.
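To illustrate why rings whose state lives entirely in user-space can exist in arbitrary numbers, here is a minimal single-threaded sketch of such a ring. The layout, the names (struct completion_ring, ring_post(), ring_reap()) and the slot count are hypothetical and are not the v3 ABI; in the real design the producer side would be the kernel, and memory ordering between producer and consumer is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical size */

/* Hypothetical layout: all ring state is ordinary user memory. */
struct completion_ring {
	void *slot[RING_SLOTS];	/* NULL means "empty" */
	unsigned long head;	/* next slot the consumer reads */
	unsigned long tail;	/* next slot the producer fills */
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int ring_post(struct completion_ring *r, void *done)
{
	unsigned long idx = r->tail % RING_SLOTS;

	assert(done != NULL);
	if (r->slot[idx] != NULL)
		return -1;
	r->slot[idx] = done;
	r->tail++;
	return 0;
}

/* Consumer side: returns a completed request, or NULL if none. */
static void *ring_reap(struct completion_ring *r)
{
	unsigned long idx = r->head % RING_SLOTS;
	void *done = r->slot[idx];

	if (done == NULL)
		return NULL;
	r->slot[idx] = NULL;	/* hand the slot back to the producer */
	r->head++;
	return done;
}
```

Creating another ring in this sketch is just declaring another zero-initialized struct completion_ring; nothing has to be registered or pinned anywhere.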
As always, comments, suggestions, reports are welcome.
Ingo