OLS: Kernel documentation, and submitting kernel patches

The second of four days at the 10th annual Ottawa Linux Symposium got off to an unusual start as a small bird "assisted" Rob Landley in giving the first talk I attended, called "Where Linux kernel documentation hides." The tweeting bird was polite, only flying over the audience a couple of times and mostly paying attention.

Landley did a six-month fellowship with the Linux Foundation last year to try to improve the Linux kernel's documentation. He explained that it was meant to be a year, but after six months he had come to some conclusions about how documentation should be done, which he said the Linux Foundation both agreed with and did not plan to pursue, and so he went back to maintaining his other projects.

Where, asked Landley, is kernel documentation? It's in the kernel tarball, on the Web, in magazines, in recordings from conferences like OLS, in man pages, on list archives, on developers' blogs, and "that's just the tip of the iceberg." The major problem is not a lack of documentation, he said, but that what is out there is not indexed.

The challenge in providing useful documentation for the Linux kernel, Landley said, is therefore to index what is already out there. When a source of documentation for a given topic gains enough traction, it becomes the de facto reference for that part of the kernel, and from then on it gets found and maintained. But there is a big integration problem, because such sources are scattered around.

It is hard enough for Linux kernel developers to keep up with the Linux Kernel Mailing List, Landley noted, let alone to read all the other lists out there and keep track of the ever-growing supply of documentation. Putting together all the kernel documentation found around the Internet is itself a full-time job. Jonathan Corbet of LWN, he noted, is already good at this, but several people are doing it, each in their own way and in their own space.

The Linux kernel developers' blog aggregator, kernelplanet.org, and other aggregators offer a huge amount of information as well, Landley noted. But he said we need to aggregate the aggregators. Google is inadequate for the challenge, he said, as it can take half an hour to find some pieces of information, if you can find them at all, and it only indexes Web pages, not, for example, the Documentation directory in the kernel source tarball.

So what are the solutions? Landley explained how he set up a new page on kernel.org called kernel.org/doc, where all the aggregated documentation is stored in a Mercurial archive and is automatically turned into an indexed Web page. Adding information to this database is a task that requires a lot of editing, Landley said, quoting Alan Cox: "A maintainer's job is to say no." As the maintainer of the kernel documentation on kernel.org, Landley sees himself as mainly responsible for rejecting submissions, as one would with kernel patches. As a tree in its own right, the documentation has to be kept up to date and managed.

Asked why he does not use Wikipedia rather than the kernel.org/doc system, Landley explained that on Wikipedia, you cannot say no, so there's no real quality control on the information available, and it lacks a rational indexing system, which is still the core problem.

Landley said his six-month term with the Linux Foundation ended 10 months ago. While he is still responsible for the section, he no longer has the time to maintain it himself and stated that what is really needed is a group of a dozen or so dedicated volunteers under a maintainer to handle kernel documentation as its own project.

On submitting kernel features

Andi Kleen, a self-described "recovering maintainer," asked, "Why submit patches?" and answered that people submit kernel patches for a variety of reasons. The code review involved in submitting patches usually improves code quality. Code included in the kernel gets tested by users for free. Having the code in the kernel rather than maintained separately avoids user interface conflicts. And you get free porting to other architectures if your feature becomes widely used. Getting code into the kernel, he said, is the best way to distribute a change. Once it is in the mainline kernel, everyone uses it.

So how does one go about doing it?

Kleen outlined a few easy steps for submitting features to the kernel, and included two case studies to explain the points. The basic process as he explained it is:

You, the developer, write code and test it, and submit it for review. You fix it as needed based on the feedback from the review. It gets merged into the kernel development tree by the maintainer responsible for the section of the kernel that you are submitting a patch for. It gets tested there. And then it gets integrated into a kernel that is then released.

There are a few basic things to remember when submitting code. The style should follow the CodingStyle document in the Documentation directory of the kernel source tarball. The submitted patch should work and be documented. You should be prepared to do additional work on the code, revising and updating it as needed. And expect criticism.

Kleen compared submitting kernel code to submitting a scientific paper to a journal for publication. Getting attention for your kernel patch means selling it well. There is generally a shortage of code reviewers, and the maintainers are often busy. In some cases, you could be submitting a patch to a section of the kernel that has no clear maintainer. So selling your patch well will get you the reviewers needed to get started on the process.

You have to sell the feature, Kleen said, and split out any problematic parts where possible. Don't wait too long to redesign parts that need it, and don't try to submit all the features right off the bat. As his case study, he discussed a system he wrote called dprobes. After it had gone nowhere for a while, Kleen resubmitted the patch as kprobes with a much simpler design and fewer features, and the code was quickly adopted.

There are several types of code fixes one can submit, Kleen said. The clear bugfix is the easiest to write and to sell. He advised against overdoing code cleanups, because bugfixes are more important. And for optimisations, he suggested asking yourself a number of questions: How much does it help? How does it affect the kernel workload? And how intrusive is it?

In essence, Kleen said, a patch submission is a publication. The description of the patch is important. Include an introduction. If you have problems writing English, get help writing the introduction and description, he advised.

With time and more patches, the process becomes easier. When a kernel maintainer accepts a patch from you, it means he trusts you. The trust builds up over time. Kleen recommended making use of kernel mailing lists to do development on your patches, and suggested working on unrelated cleanups and bugfixes to help build trust.