Sex, software, politics, and firearms. Life's simple pleasures…


GPSD and Code Excellence

There’s a wonderfully tongue-in-cheek project called The Alliance for Code Excellence (“Building a better tomorrow — one line of code at a time.”) that sells Bad Code Offset certificates. They fund open source projects to produce good code that will, in theory, offset all the bad code out there and mitigate the environmental harm it does. They’ve asked software authors to write essays on how their projects drive out bad code, offering $500 prizes.

I sat down to write an essay about GPSD in the same vein of high drollery as the Alliance’s site, then realized that GPSD actually has a serious case to make. We really do drive out bad code, in both direct and indirect ways, and we supply examples of good practice for emulation.

GPSD is a service daemon and device multiplexer that is the open-source world’s basic piece of infrastructure for communicating with GPS receivers, and it’s everywhere Linux is – running on PCs, on embedded systems, and on both OpenMoko and the entire line of Maemo cellphones. We’re directly relied on by dozens of applications, including pyGPS, Kismet, GPSdrive, gpeGPS, position, roadmap, roadnav, navit, viking, and gaia. If you’re doing anything with GPSes on an open-source operating system, GPSD is your indispensable tool.

GPSD’s quality is up to the standard required when you’re that ubiquitous. In March 2007 a Coverity scan turned up only two errors in over 22,000 LLOC. In more detail: it flagged only four potential problems, and two of those were false positives. This is three orders of magnitude cleaner than typical commercial software, and about half the defect density of the Linux kernel itself at the time.

We get, on average, about one defect report every 90 days, and there are just five on our tracker as I write. Given what we know about the size of our userbase, our low rate of incoming bug reports tells us we’ve maintained a similar level of code quality since the Coverity audit. This hasn’t happened by accident. Good practice matters, and I’ll describe how we systematize ours in a bit.

First, though, I want to explain how we drive out bad code. The reporting protocols used by GPS sensors are a hideous mess — the kind of mess that tends to nucleate layers of bad code around it as programmers with insufficient domain knowledge try to compensate for the deficiencies at application level and wind up snarling themselves up in ever-nastier hairballs. Part of what GPSD does is firewall all this stuff away; we know everything about the mess so you don’t have to, and we present clean data on a well-known port in a well-documented wire format. We then provide client-side service libraries that will unpack GPS reports into native C, C++, Python, or Perl structures so you don’t even have to know about our wire format.
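The client libraries do this unpacking for you, but as a sketch of what “clean data in a well-documented wire format” buys an application, here is roughly what pulling a position fix out of a gpsd-style JSON report looks like in Python. The class/lat/lon/mode field names follow gpsd’s documented TPV report; treat the exact schema here as illustrative.

```python
import json

def parse_tpv(line):
    """Parse one gpsd-style JSON report line into a plain dict.

    Returns None for reports that aren't time-position-velocity
    (TPV) reports.  Field names follow gpsd's documented TPV
    report, but the schema shown is a simplified illustration.
    """
    report = json.loads(line)
    if report.get("class") != "TPV":
        return None          # some other report type (SKY, DEVICE, ...)
    return {
        "lat": report.get("lat"),
        "lon": report.get("lon"),
        "mode": report.get("mode", 0),   # 2 = 2D fix, 3 = 3D fix
    }

sample = '{"class":"TPV","mode":3,"lat":40.035,"lon":-75.519}'
fix = parse_tpv(sample)
# fix == {'lat': 40.035, 'lon': -75.519, 'mode': 3}
```

The point is that the application never sees NMEA or vendor binary packets at all; it sees one uniform report structure no matter what device is on the other end.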

If our client applications had to deal with the back-end mess of poorly-specified NMEA 0183 and seventeen different vendor-specific binary protocols, I for dead certain guarantee that the total community bug load from GPS-related problems would go up by an order of magnitude. And I’d bet more than any of the $500 prizes the Alliance is offering on the bug count going up by two orders of magnitude.

We also try to drive out bad code indirectly in the same way we keep our defect level low — by providing an example of good practice that extends all the way up from our development habits to the zero-configuration design of the gpsd daemon.

The most important thing we do to ensure code quality is maintain a rigorous test suite. Our “make testregress” runs about fifty-five regression and unit tests. Forty-four of those exercise the daemon’s logic for recognizing and processing device reports; the remaining ten to a dozen exercise the rest of the code, all the way out to the application service libraries.

We actively collect device logs and metadata from users through this form, which we use to update a device-capability database and our collection of test logs. Almost every time a user fills out one of these, the number of devices for which we can guarantee good performance in the future goes up. Currently it’s 87 devices from 39 vendors.

We also routinely audit our code with splint. Not many people do this, because splint is very finicky and a pain in the ass to use and requires you to litter your code with cryptic annotations. But I believe accepting that discipline is the main reason the Coverity scan went so well. After hacking through the underbrush of false positives, I generally find that splint heads off about two potentially serious bugs per release cycle, averaging out to about one every 17 weeks.

We have a policy of not using C where a scripting language will do. Python is what we mostly use, but not the actual point here (though I do like it a lot and use it in preference to other scripting languages). The point is to get away from the fertile source of bugs that is memory-management in a fixed-extent language. The core daemon is written in C because it has to be; a significant part of our customer base is embedded and SBC developers who need to run lean and mean. But our test tools and some of our test clients are Python, and we’re gradually working to retire as much of the C as possible from outside the daemon in favor of scripting languages.

No account of good practice can leave out the human element. In the best open-source tradition, GPSD combines the benefits of a small, highly capable core group (three developers: Chris Kuethe, Gary Miller, and myself) with about a half dozen other semi-regular contributors and a halo of casual contributors numbering in the hundreds. GPSD teaches by example about the kinds of specialization that produce good code. Here is what the core group looks like…

Chris Kuethe is our GPS domain expert. He knows the devices, the mathematics of geodesy, and where all the bodies are buried in this application area to a nearly insane level of detail. I am the systems architect — I neither match Chris’s depth of domain knowledge nor want to, but it has been my role to give the GPSD codebase a strong modular architecture, design and implement our test suites and tools, design and implement our wire protocols, and push autoconfiguration as far as it could go. Gary Miller is more of a generalist who owns some particularly tricky areas of the core code and device drivers, and is extremely good at detecting bad smells in other code; he backstops Chris and myself admirably.

If this sounds like a description of a classic “surgical team” organization straight out of Fred Brooks, that’s because it is. Open source changes a lot of things, and the outer circle of contributors brings huge value to the GPSD project — but some things about software development never change, and the power of teams that include a domain expert, a master architect, and a bogon detector is one of them. GPSD reinforces a lesson that is old but never stale; if you want the kind of good code that improves the whole software ecology around it, that kind of human constellation is a great place to start.

Finally, we drive out a lot of potentially bad code by eliminating configuration options. The gpsd daemon is designed to autobaud and recognize GPS or AIS reporting packets on any serial or USB device that it’s handed, no questions asked. And normally, at least on Linux systems, those devices are handed to it by udev when a hotplug event fires. Though arranging this took a lot of work, there are many fewer combinations of code paths in gpsd to test (and to accumulate bugs) than there would be if the daemon had the usual semi-infinite array of knobs, switches, and config files. Because client applications don’t have to give users any access to those nonexistent knobs and switches, thousands of lines of application code have never had to be written either; the simplifying effects of autoconfiguration ripple through dozens of application-development groups and all the way up the software stack to the end-user.
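To give the flavor of that packet recognition, here is a toy Python version of one small piece of it: checking whether a line is a well-formed NMEA 0183 sentence by verifying its framing and XOR checksum. The real packet getter in gpsd is a C state machine handling many more formats; this sketch just shows the principle.

```python
def is_nmea(sentence):
    """Recognize a well-formed NMEA 0183 sentence.

    A valid sentence has a '$' lead-in, a '*' separator, and a
    two-hex-digit trailing checksum equal to the XOR of all the
    bytes between '$' and '*'.  A toy version of the kind of
    sniffing a packet recognizer does on unidentified input.
    """
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, tail = sentence[1:].partition("*")
    if len(tail) != 2:
        return False
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)   # NMEA checksum is a running XOR
    try:
        return checksum == int(tail, 16)
    except ValueError:
        return False          # checksum field wasn't hex digits
```

Multiply this by every packet type and baud rate the daemon sniffs for, and you get a device that configures itself instead of asking the user to.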

The RFP for these essays asked software authors to explain what they’d do with a $500 prize. That’s easy; we’d use it to buy test hardware. Because GPSes are wacky, idiosyncratic devices with poorly documented interfaces, testing on real hardware is vital to fully learn their quirks.

UPDATE: I’ve added two more GPS regression tests in the few hours since I wrote this, and we’ve shipped release 2.90. The new JSON-based protocol I’ve blogged about before is now deployed.


25 thoughts on “GPSD and Code Excellence”

Well, it’s not totally zero configuration: gpsd accepts command-line parameters. You’re right to say that making things auto-configure as much as possible cuts down on bugs because, obviously, you have fewer and fewer code execution paths.

OTOH, autoconfiguration isn’t a panacea. With GPSD, different devices and (perhaps) different device settings can still cause many combinations of code paths to test. One of the big problems in writing a program designed to handle a device is that you need a wide array of devices with which to test. This means you need to build a user community around your code fast unless you happen to own a whole bunch of those devices already. If you fail to do so, your project (like mine) will die because you have no one to test the code. :( (If someone wants to take over the project, drop me a mail, but I think it is increasingly unnecessary as other tools have supplanted the need for it)

ESR says: In normal operation gpsd doesn’t require the user to supply any of these. udev starts it with a control-port option and then stuffs devicenames down that port. The other command-line options are mainly for debugging use.

> While this may have reduced the chance of memory management bugs from appearing, wouldn’t it also incur additional complexity in the code from having to juggle around fixed-size buffers?

No. GPSD’s situation is unusual in that all it ever has to handle is small pieces of data that are very perishable – the useful lifetime of a GPS report is typically one second, until the next one comes in. So we get away with a small handful of fixed-length buffers per device.

Knowing that we can’t have memory leaks or any of the other nasty sorts of bugs associated with malloc is rather nice. I’d have been prepared to invest more effort into avoiding it than I’ve actually had to. Mind you, this strategy won’t work for most programs; gpsd’s memory usage pattern is exceptional.
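In cartoon form, the strategy looks like this. This is a Python sketch of what the daemon actually does with fixed-extent buffers in C; the names and buffer size are illustrative, not gpsd’s real ones.

```python
MAX_PACKET = 1024  # illustrative cap; real GPS reports are small

class Device:
    """One attached GPS device with a single preallocated report buffer.

    The buffer is allocated once at startup and overwritten in place
    on every report cycle -- no allocation, no freeing, no leaks.
    """
    def __init__(self):
        self.buffer = bytearray(MAX_PACKET)  # allocated once, reused forever
        self.length = 0

    def store_report(self, packet):
        if len(packet) > MAX_PACKET:
            raise ValueError("oversized packet dropped")
        self.buffer[:len(packet)] = packet   # overwrite the stale report
        self.length = len(packet)

    def current_report(self):
        return bytes(self.buffer[:self.length])
```

Because each report is dead the moment the next one arrives, overwriting in place loses nothing, and the whole malloc/free bug family simply can’t occur.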

I have been considering purchasing some bad code offsets to help offset several messes that I made which unfortunately became popular. I learned about them a week ago, when someone gave some to a friend as a practical joke.

I heard that they were going to fund some projects, but I instantly (and incorrectly) concluded that such projects would (likely) be the ‘pop’ ones backed by venture capital to begin with.

> Mind you, this strategy won’t work for most programs; gpsd’s memory usage pattern is exceptional.

Just to clarify: not only are the packets limited in their lifetime, but you also have a known quantity of packets coming in at more or less fixed time intervals. If the device updates once a second and sends only one packet, you know ahead of time you’re only going to have to grab a packet once a second. You don’t have to deal with the possibility of many different packet streams coming in and out all at once, like, say, an HTTP server or something similar.

Referencing this back to your posts on the horribleness of the code created by the global warming scientists, a small core group, of various levels of capability, is how the current mess as laid bare by the CRU e-mails and documents was arrived at.

A ‘reboot’ of climate study with a small group -with an agenda to push- will soon be led right back to where things were a couple weeks ago. Climate study needs a larger and highly competent AND OPEN MINDED core group. It also needs total openness of *all* the data they collect and every bit of software code they write.

The entire process must be completely open so everything can be cross-checked by anyone. It still won’t prevent agenda driven “flappers” (see Gulliver’s Travels) from acting as gatekeepers to filter what their pet politicians see and hear, but a fully open process will make it more difficult, especially in presenting to the public the one sided “science is settled” fraud.

Something amusing to do when you’re otherwise un-occupied. Google (or Yahoo or Ask etc.) for climategate, global warming, and sex and note how many hits each term gets.

There is something I’d like to ask about the reasons for your choice of programming language for the project. There is a fairly big community out there on Programming.Reddit obsessing about fairly little-known programming languages – several variants of LISP and Scheme, plus Haskell, OCaml, Scala, Dylan and so forth. These languages are indeed beautiful and very expressive, and their very structure prevents many common kinds of bugs, and I’m fairly convinced that code quality would go up significantly by choosing one of those instead of C (and using them the way they were intended to be used). However, when I ask the community why we don’t see them being used more often in production I get the usual complaints about stupid, stupid bosses calling the shots at stupid, stupid corporations.

This latter is where my bullshit sensor got triggered, because in theories about politics (of which I’m becoming kind of a hobby-expert lately) notions of the “everybody would realize how great our idea is if only they weren’t so stupid” type are an almost surefire sign that somebody is engaging in Utopian thinking, i.e. is obsessed with ideas that look beautiful on paper but have serious hidden drawbacks that prevent their widespread practical deployment, and I wondered if it might be the same with programming. Using Haskell, Scheme, OCaml or Scala for a fairly big practical project is such a charming idea that if it doesn’t happen more often, there is probably a good practical reason why.

So, there is this project. No stupid bosses around, you know a lot about LISP and probably would enjoy writing something in Bigloo, which is a Scheme to C compiler, giving speeds comparable to C, probably you wouldn’t mind using Haskell or OCaml either, which too can be compiled to fast code or Scala which can be as fast as Java, which might be fast enough for your purposes.

What were the practical reasons you decided against them?

I have a suspicion about the most important practical reason: probably it is that if you don’t use some very popular programming language, you lose most advantages of Open Source. Out of the few million people in the world who can write acceptable C, you have a few hundred casual contributors. Out of the few thousand who can write acceptable Scheme, Haskell, OCaml or Scala, how many could you find? Probably five, and even those five wouldn’t actually be much interested in GPSes; they would contribute just in order to see their pet programming language used for something popular, which means that despite their very high programming skills, their lack of domain knowledge and enthusiasm about the domain would make them contributors of very limited use.

Is this the main reason? And if it is, then can we safely predict that if anything ever releases Haskell or Scala or OCaml or Scheme or whatever from the intellectual ghetto, it will probably not be typical Open Source projects of the “many eyeballs attacking a set of bugs” type?

If I were writing something like GPSD from scratch today, I’d almost certainly do it in Python. Not that there aren’t other interesting languages out there, but the maturity of the Python support libraries for things like socket I/O is a significant advantage for a project like GPSD.

That having been said, GPSD’s application domain is exceptionally simple (and thus well suited to C) in one important respect. The small size and short lifetime of the data makes it possible to use entirely static memory allocation in the daemon, avoiding the single most fertile source of errors in any fixed-extent language.

@Shenpen: More points in favor of C: lower resource requirements compared to the dynamic languages you mention. All the libraries GPSD plugs into are also written in C (though using them from Python is equally trivial).

That being said, these days people have lots and lots of free memory and processor cycles. Each of my 64-bit dual-core Linux boxes has 4 GB of RAM, and neither ever hits swap. The box I’m sitting at now has all of its swap free while running Samba, a DNS server, a DNS cache, Apache, OpenSSH, and OpenLDAP along with Chromium, Emacs, some shells, a couple of Python interpreters, and OpenOffice.org under GNOME 2.28. I don’t think another Python interpreter running a Pythonic GPSD would bother it at all.

> If I were writing something like GPSD from scratch today, I’d almost certainly do it in Python.

*Really*?

How much storage does it cost to bring along an entire Python runtime library? Even shrunken down to just the necessary pieces (can you do that automagically with Python?), I should think the Interpreter Cost of that choice would piss off the embedded audience, no?

>I should think the Interpreter Cost of that choice would piss off the embedded audience, no?

Maybe. But:

(1) GPSD’s popularity with the embedded crowd is a happy accident. Once we realized we had it, we started tuning for those guys. But the original plan was more to aim at location-aware apps on laptops.

(2) If Symbian is to be believed, Python’s runtime size is about a megabyte. That’s about three times the size of a gpsd x86 executable, but probably tolerable even in embedded-land.

Yes, the embedded crowd thinks of scripting languages as unacceptably huge, but I think that’s ingrained cultural prejudice talking rather than technical reality.

Hello sir,
I am new to GPSD and I have a question. There is a problem with gpsd: when I run it with minicom it fetches data, but after some time it hangs and I have to reboot. I would like to know where in the code I can fix this, or which functions I should check, or whether I could just insert a reboot there when it hangs. Please help.