Monday, February 18, 2013

Documenting python code using sphinx and github

Documentation is good, right? Doesn't everybody like to have a good source of documents when working with a piece of software? I know I sure do. But creating documentation can be a drag, and creating pretty documentation can be even more of a drag, sometimes more time-consuming than writing the code in the first place. Ain't nobody got time for that!

Thankfully there are utilities out there that will help with creating documentation. My language of choice is Python, and for generating documentation I like to use Sphinx. Sphinx appeals to me because much of the documentation can be auto-generated from existing code. It works with Python docstrings in a way that can be PEP 257 compatible, and it takes very little setup to get from nothing to decent-looking documentation. For an example of what Sphinx output can look like, see my pyendeavor project.

The first step is to make frequent use of docstrings in the code. Not only will this help generate useful documentation later, it is also really handy for anybody working with the code to understand what it is doing. A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. It's more than just a comment, because it becomes an attribute of the object itself (its __doc__ attribute). When I look at a function or a class's __init__, I ask 3 main questions:

What does this code do?

What are the inputs to it?

What will I get in return?

These questions map onto a docstring very easily:

def func(foo):
    """This function translates foo into bar

    The input is a foo string
    the output is a bar string
    """

Those three lines answer the questions pretty well, and if we were only going to look at the code and not try to generate HTML/PDF/whatever documentation, we could be done. Instead, let's give it a little more structure that Sphinx will appreciate, such as reStructuredText field lists like :param: and :returns:.
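A sketch of that structure, using Sphinx's info field lists (:param:, :type:, :returns:, :rtype:), which autodoc renders as formatted Parameters and Returns sections. The field wording and the replace() body are illustrative assumptions, not taken from pyendeavor:

```python
def func(foo):
    """Translate foo into bar.

    :param foo: the string to translate
    :type foo: str
    :returns: the translated bar string
    :rtype: str
    """
    # Hypothetical placeholder body: the post never defines the translation.
    return foo.replace("foo", "bar")
```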

A human reading such a docstring is still going to know what's going on pretty well, and Sphinx is going to read it even better.

The Python source files themselves serve as input to the documentation tool. Just go about the business of writing code, keep the docstrings flowing and updated as the code changes, and very useful documentation can be produced.
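Because a docstring is an attribute of the object, anything that can import the code can read it back. This is also why Sphinx's autodoc extension imports your modules. Here is a small illustration with the standard library's inspect module. The func body is a hypothetical placeholder:

```python
import inspect


def func(foo):
    """This function translates foo into bar

    The input is a foo string
    the output is a bar string
    """
    # Hypothetical body: the post never says how the translation works.
    return foo.replace("foo", "bar")


# The docstring is stored on the function object itself ...
raw = func.__doc__
# ... and inspect.getdoc() returns it with the indentation cleaned up,
# which is the form documentation tools typically work from.
clean = inspect.getdoc(func)
print(clean)
```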

How does one generate the documentation though? How does one use sphinx? I'm glad you asked! Sphinx has a utility that can help get started -- sphinx-apidoc. First make a docs/ subdir of the project (or whatever you want to call it). This is where some sphinx control files will go, although the rendered output doesn't necessarily have to go there. For my example software pyendeavor I have a single python package, pyendeavor, located in the src/pyendeavor directory. To get started with sphinx, I would issue the command:

$ sphinx-apidoc -A "Jesse Keating" -F -o docs src/

The -F flag causes a full project setup to happen (conf.py, index.rst, a Makefile and so on). The -A sets the author name, and -o docs tells sphinx-apidoc to direct its output to the docs directory. Finally, src/ tells it to look in src/ for my modules.
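If you would rather drive this from Python (for example, from a project helper script), the same invocation can be assembled as an argument list. The function names apidoc_command and run_apidoc are hypothetical. The flags are the ones used in the command above:

```python
import subprocess


def apidoc_command(author, output_dir="docs", source_dir="src/"):
    """Return the sphinx-apidoc argument list used in this post."""
    return [
        "sphinx-apidoc",
        "-A", author,       # author name for the generated conf.py
        "-F",               # full project: conf.py, index.rst, Makefile, ...
        "-o", output_dir,   # directory that receives the generated files
        source_dir,         # directory searched for Python packages/modules
    ]


def run_apidoc(author):
    """Run sphinx-apidoc from the project root (needs Sphinx on PATH)."""
    subprocess.run(apidoc_command(author), check=True)
```

Calling run_apidoc("Jesse Keating") from the top of the project would be equivalent to typing the command by hand.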

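Among the generated files is an .rst file per package, containing automodule directives with the :members:, :undoc-members: and :show-inheritance: options. These tell Sphinx to read the module files, generate content for documented and undocumented members, and show the base classes each class inherits from. They are all directives that Sphinx understands, but you don't really have to understand them to get started.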
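To make those options concrete, this hypothetical helper renders a stanza shaped like the ones sphinx-apidoc writes for a module such as pyendeavor. sphinx-apidoc generates these files itself, and its exact layout differs between Sphinx versions:

```python
def automodule_stanza(module_name,
                      options=("members", "undoc-members", "show-inheritance")):
    """Render an automodule directive like the ones sphinx-apidoc emits.

    Hypothetical illustration only; it is not part of Sphinx.
    """
    lines = [f".. automodule:: {module_name}"]
    # Each option is an indented directive flag with no value.
    lines += [f"   :{opt}:" for opt in options]
    return "\n".join(lines)


print(automodule_stanza("pyendeavor"))
```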

There is also a conf.py file that will need some attention. Sphinx imports the code in order to document it, so it needs a sys.path entry pointing at where the code can be found. There is a helpful commented-out sys.path.insert line near the top. Just clear the hash and update the path, which for this layout is ../src relative to the docs/ directory.
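For the layout used here (docs/ and src/ side by side at the top of the project), the uncommented line in docs/conf.py would look something like this. sphinx-build executes conf.py with docs/ as the current directory, so the relative path resolves to the src/ directory:

```python
# docs/conf.py (excerpt)
import os
import sys

# Make the pyendeavor package under ../src importable for autodoc.
sys.path.insert(0, os.path.abspath('../src'))
```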

Then, from the docs/ directory, run make html. The rendered pages land in docs/_build/html/, and a browser can be used to view index.html and all the linked docs. How awesome! Useful documentation without having to do much more than use docstrings in the code (which should be done anyway).
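The Makefile is a thin wrapper. Its html target in Makefiles of this vintage runs roughly sphinx-build -b html -d _build/doctrees . _build/html (newer Makefiles use sphinx-build -M html instead). This is a sketch of the same build plus opening the result, with hypothetical helper names and the default _build layout assumed:

```python
import subprocess
import webbrowser
from pathlib import Path


def sphinx_html_command(source_dir=".", builddir="_build"):
    """Mirror the sphinx-build call behind `make html`."""
    return [
        "sphinx-build",
        "-b", "html",                    # use the HTML builder
        "-d", f"{builddir}/doctrees",    # cache of parsed documents
        source_dir,                      # directory containing conf.py
        f"{builddir}/html",              # where the rendered pages go
    ]


def build_and_view(docs_dir="docs"):
    """Build the HTML docs, then open index.html in the default browser."""
    subprocess.run(sphinx_html_command(), cwd=docs_dir, check=True)
    index = Path(docs_dir, "_build", "html", "index.html").resolve()
    webbrowser.open(index.as_uri())
```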

Now that docs can be generated, they should be put somewhere useful. That's where the GitHub part of this post comes in. A lot of projects are hosted on GitHub, and I've started using it for more of mine too. One nice feature is a way to create web space for a project by pushing content to a 'gh-pages' branch of its repository. The following steps set up a repo with a place to publish the HTML output of Sphinx. They are based on the directions I found here. However, instead of using a directory outside our project space, we're going to make use of a git workdir so that we never have to leave the project directory to get things done.

First, from the top level of the source tree, let's create a gh-pages directory to hold the new branch and set up a git workdir inside it at gh-pages/html.

(More information about git-new-workdir can be found here. Essentially, it is a way to create a subdirectory that can be checked out to a different branch while all the git content stays linked. A git fetch in the top directory of the clone also updates the git content in the workdir path, so there is no need for multiple pulls.)

Now we have to prepare the workdir for sphinx content. To do this we need to create an empty gh-pages branch within the html directory:

$ cd gh-pages/html
$ git checkout --orphan gh-pages

Initially there will be a copy of the source tree in the html directory that can be blown away with:

$ git rm -rf .
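The workdir preparation can also be scripted. This sketch assumes the contrib git-new-workdir script is installed on PATH and is called as git-new-workdir <repository> <new_workdir>. The helper names gh_pages_setup_steps and setup_gh_pages are hypothetical:

```python
import subprocess
from pathlib import Path


def gh_pages_setup_steps(workdir="gh-pages/html"):
    """Return (cwd, argv) pairs that prepare an empty gh-pages workdir."""
    return [
        # Assumed invocation of the contrib script; it creates the workdir.
        (".", ["git-new-workdir", ".", workdir]),
        # Start a branch with no history ...
        (workdir, ["git", "checkout", "--orphan", "gh-pages"]),
        # ... and clear out the copied source tree.
        (workdir, ["git", "rm", "-rf", "."]),
    ]


def setup_gh_pages(workdir="gh-pages/html"):
    # Only the parent directory (gh-pages/) is created here;
    # git-new-workdir creates the workdir itself.
    Path(workdir).parent.mkdir(parents=True, exist_ok=True)
    for cwd, argv in gh_pages_setup_steps(workdir):
        subprocess.run(argv, cwd=cwd, check=True)
```

Running setup_gh_pages() once from the top of the clone should leave an empty gh-pages branch checked out in gh-pages/html.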

Back in the docs/ directory a change needs to be made to the Makefile to tell it to output content to where we want it, the gh-pages/html/ directory. Look for:

BUILDDIR = _build

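and change it to

BUILDDIR = ../gh-pages

The html target in the generated Makefile writes its output to $(BUILDDIR)/html and its doctree cache to $(BUILDDIR)/doctrees. That puts the rendered pages in gh-pages/html/ while keeping the cache outside the gh-pages checkout.

Now, from the docs/ directory, run make html again. This time you'll notice that the output goes to ../gh-pages/html/.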

Switching back to that directory, the files can be added and committed, then pushed to the gh-pages branch on GitHub.
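This is a hypothetical publish helper that commits and pushes from the workdir. It assumes the GitHub remote is named origin. It also drops an empty .nojekyll file into the output. GitHub Pages runs Jekyll by default, and Jekyll skips directories whose names start with an underscore, such as Sphinx's _static/, which holds the stylesheets:

```python
import subprocess
from pathlib import Path


def publish_commands(message="Update documentation"):
    """git commands to commit and push the rendered pages (remote assumed: origin)."""
    return [
        ["git", "add", "--all", "."],
        ["git", "commit", "-m", message],
        ["git", "push", "origin", "gh-pages"],
    ]


def ensure_nojekyll(html_dir):
    """Create an empty .nojekyll file so GitHub Pages serves _static/ as-is."""
    marker = Path(html_dir) / ".nojekyll"
    marker.touch()
    return marker


def publish(html_dir="gh-pages/html", message="Update documentation"):
    ensure_nojekyll(html_dir)
    for argv in publish_commands(message):
        subprocess.run(argv, cwd=html_dir, check=True)
```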

Up to 10 minutes later, the Pages site for the repo can be visited, like mine for pyendeavor. GitHub also renders README files in the base of your repo, supporting Markdown and reStructuredText. A README.rst file can also be included in your Pages output with a simple tweak to the index.rst file in docs/. The ".. include::" directive (for example ".. include:: ../README.rst") tells Sphinx to include the content of README.rst when generating HTML output.
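If you regenerate index.rst often, the tweak can be applied by a small script. The helper below is hypothetical. It appends the directive at the end of the file (move it if you want the README content earlier), and the path ../README.rst assumes the README sits one level above docs/:

```python
from pathlib import Path

INCLUDE_LINE = ".. include:: ../README.rst"


def add_readme_include(index_text, include_line=INCLUDE_LINE):
    """Append a reST include directive to index.rst text if it is missing."""
    if include_line in index_text.splitlines():
        return index_text  # already present; leave the text unchanged
    return index_text.rstrip("\n") + "\n\n" + include_line + "\n"


def patch_index(index_path="docs/index.rst"):
    path = Path(index_path)
    path.write_text(add_readme_include(path.read_text()))
```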

The more I see posts like this, where GitHub offers yet another complex-to-use feature with various strings attached (e.g., 10 minutes to see results? Really?), the more I'm convinced Fossil is the right way to go for most projects these days.

I have one question though. On my local machine, the html renders nicely with Sphinx's default style. But on my github.io page, it renders with apparently no style sheet at all. I know the _static/default.css is in the repo, so it should be there. But it doesn't seem to find it correctly. Do you know if there is something special I need to do to get it to find the style sheet?