Sendfile (a system call for web developers to know about!)

The other day I learned about a new (to me) exciting Linux system call! (For newcomers: a system call is an operation you can ask the operating system to do.) This one seems really important to know about if you’re configuring a webserver! So let’s learn about it.

Before this, I knew about basic system calls like open and read for files, and sendto and recvfrom for networking. And a few fancier things like futex and select for mutexes and waiting.

why sendfile was invented

Suppose I want to send you a big file over a network connection. Normally I’d just read the file a chunk at a time and write each chunk to the socket. So, at a minimum, for every chunk we need to:

use read (requires a context switch into kernel code)

(implicitly, copy the data from kernel memory into user memory)

use sendto or write (another context switch)

(implicitly, copy the data from user memory back into kernel memory)

This means the data gets copied into user memory and back out again (bad), and we make two system calls instead of one (also bad).
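Here’s a rough Python sketch of that read-then-write pattern, assuming a Linux machine. The function name `send_file_naive` and the 64 KiB chunk size are just illustrative choices. Each chunk costs one `read` system call (the kernel copies file data into our buffer) plus at least one send (the kernel copies it back out into the socket buffer).

```python
import os
import socket


def send_file_naive(path: str, sock: socket.socket, chunk_size: int = 64 * 1024) -> int:
    """Send a file the 'normal' way: read into user memory, then write to the socket."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            # read(2): the kernel copies file data into `chunk`, a buffer in our process
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            # send(2), possibly several times: the kernel copies `chunk`
            # back into its own socket buffer
            sock.sendall(chunk)
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```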

So here’s the idea: this pattern of reading a file and writing it to a socket is really common! So they made a system call that does exactly that! The kernel can do all the work of reading and writing, which saves you CPU time, and the data never has to be copied into user memory and back. AMAZING.
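In Python, `os.sendfile(out_fd, in_fd, offset, count)` is a thin wrapper around this system call. Here’s a minimal sketch of sending a whole file to a socket with it, assuming Linux; `send_file_sendfile` is a hypothetical helper name. sendfile can send fewer bytes than you asked for, so the sketch loops until the whole file is out.

```python
import os
import socket


def send_file_sendfile(path: str, sock: socket.socket) -> int:
    """Send a whole file over `sock` using sendfile(2) via os.sendfile."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            # The kernel moves bytes straight from the file to the socket;
            # they never land in a buffer in our process.
            sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if sent == 0:  # file got shorter while we were sending; stop
                break
            offset += sent
        return offset
    finally:
        os.close(fd)
```

Python’s `socket.socket.sendfile()` runs essentially this loop for you, using `os.sendfile` when it can and falling back to plain `send()` when it can’t.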

the disasters

One post I read describes how on OS X, sendfile wouldn’t send any data until the socket was closed, causing delays of up to 5 seconds. That’s TERRIBLE. So sendfile isn’t a universal fix, and that’s why webservers let you turn it on and off.

“Life of a HTTP request, as seen by my toy web server” is an interesting read: it describes how the author uses sendfile for large files but not for small ones. You don’t need to write your own webserver to take advantage of this, though: you can configure Apache and nginx to use sendfile!
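A rough sketch of the “sendfile for big files, plain read/write for small ones” idea, with an on/off switch like the one webservers expose. The 64 KiB `SENDFILE_THRESHOLD` and the `serve_file` name are made-up assumptions for illustration, not values taken from that post or from any real server.

```python
import os
import socket

# Hypothetical cut-off for "large" files; an illustrative guess, not a value
# from the post or from any real web server.
SENDFILE_THRESHOLD = 64 * 1024


def serve_file(path: str, sock: socket.socket, use_sendfile: bool = True) -> str:
    """Send the file at `path` over `sock`; return which strategy was used."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if use_sendfile and size >= SENDFILE_THRESHOLD:
            # Big file: let the kernel move the bytes (socket.sendfile uses
            # os.sendfile when it can).
            sock.sendfile(f)
            return "sendfile"
        # Small file (or sendfile switched off): read it into memory and send it.
        sock.sendall(f.read())
        return "read+write"
```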

The sendfile man page is actually quite readable, and it tells you something very important! If you need to send some header data in front of the file contents (like HTTP response headers), it recommends using the TCP_CORK TCP option to minimize the number of packets and improve performance. We saw why understanding TCP matters in “Why you should understand (a little) about TCP”, and that’s pretty important here as well. In this case you need to decide whether to use TCP_CORK and TCP_NODELAY. One thing I read recommended using both.
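Here’s a minimal sketch of the corking pattern the man page describes, assuming Linux (`socket.TCP_CORK` only exists there) and an already-connected TCP socket; `send_response_corked` is a hypothetical helper name. It corks the socket, writes the header, sends the file body, then uncorks so any leftover partial packet gets flushed. It doesn’t touch TCP_NODELAY.

```python
import socket


def send_response_corked(sock: socket.socket, header: bytes, path: str) -> None:
    """Send `header` followed by the file at `path`, with TCP_CORK set so the
    kernel holds back partial packets instead of sending the header alone."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)  # put the cork in
    try:
        sock.sendall(header)
        with open(path, "rb") as f:
            sock.sendfile(f)  # uses os.sendfile for a regular file
    finally:
        # Pulling the cork out flushes whatever partial packet is still queued.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```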

You can also use sendfile to copy files quickly! (like, think about how cp is implemented!) “So you want to write to a file real fast…” walks through some optimizations to file copying and gets a 25% improvement by using sendfile and other tricks. I straced cp on my machine just now, and it seems like it does not use sendfile. It’s super interesting to me how much abstractions break down when you’re really trying to optimize performance.
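A small sketch of copying one file to another with `os.sendfile`, assuming Linux 2.6.33 or later, where the output descriptor may be any file instead of having to be a socket. `copy_with_sendfile` is a hypothetical name, and this is just the basic idea, not the tuned approach from that post.

```python
import os


def copy_with_sendfile(src: str, dst: str) -> int:
    """Copy src to dst with sendfile(2), using a regular file as the output."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(in_fd).st_size
            offset = 0
            while offset < size:
                # Passing an explicit offset reads from that position in src;
                # out_fd's own file position advances with each write.
                sent = os.sendfile(out_fd, in_fd, offset, size - offset)
                if sent == 0:  # src shrank underneath us; stop
                    break
                offset += sent
            return offset
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```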

next step: splice & tee

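These days sendfile is a wrapper around the splice system call. splice itself seems to do the same thing (move data from one file/pipe/socket to another without it passing through user memory), but with some extra options and one catch: one of the two file descriptors has to be a pipe.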

“Anyway, when would you actually use a kernel buffer? Normally you’d use it
if you want to copy things from one source into another, and you don’t
actually want to see the data you are copying, so using a kernel buffer allows
you to possibly do it more efficiently, and you can avoid allocating user VM
space for it.”

That post also makes it clear that sendfile used to be a separate system call and is now just a wrapper around splice.

There’s also vmsplice, which I think is related and important. But right now my brain is full. Maybe we’ll learn about vmsplice later.

why this is amazing

It makes me really happy when learning a new system call helps me understand how to do something really practical. Now I know that if I’m building something that serves large files and I care about the performance, I should make sure I understand if it’s using sendfile!