I have been doing quite a bit of elisp hacking lately, for a bigger project. Multiple buffers must be coordinated, as well as an external process. Although my elisp is slowly becoming more idiomatic, I find that the language offers little help with modularity, information hiding and so on, so it all comes down to self-discipline.

Having an interactive environment is nice, but it can also be rather confusing, e.g., how do I ensure that the function being called is the one I want, and not some old version that still lives on in the netherworld?

Some of this is clearly the impedance mismatch of a brain wired for C; still, some of the support for modularity that e.g. Guile delivers would be very welcome.

Somehow, I found this three-year-old Coding Horror article about the FizzBuzz programming test; it's a very simple programming problem, yet it turned out that a majority of the people applying for a programming job were not able to solve it.
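For those who have not seen it: the usual formulation asks for the numbers 1 to 100, with multiples of three replaced by "Fizz", multiples of five by "Buzz", and multiples of both by "FizzBuzz". A minimal C version (just to show how little is involved) could look like this:

#include <stdio.h>

int
main (void)
{
        int i;

        for (i = 1; i <= 100; ++i) {
                if (i % 15 == 0)           /* multiple of both 3 and 5 */
                        printf ("FizzBuzz\n");
                else if (i % 3 == 0)
                        printf ("Fizz\n");
                else if (i % 5 == 0)
                        printf ("Buzz\n");
                else
                        printf ("%d\n", i);
        }

        return 0;
}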

Also interesting is that many people in the comments section still came up with subtly wrong solutions, by not carefully reading the requirements.

I suppose many of the job applicants who could not solve FizzBuzz still got jobs in the field, somewhere. What does that mean? Would they just be very slow; that is, could they still solve it after many iterations, debugging and whatnot? Or is there a sufficiently large segment of programming jobs that puts so little emphasis on algorithms that they can get away with it?

Yesterday, I found that, unfortunately, advogato.el does not work anymore – its last version is from 2004 or so, and it expects to find browser cookies in some text file. However, times have changed, and these days those cookies are stored in an SQLite database.

Anyway, it's actually not too hard to publish by hand. I am using org-mode in emacs, which has a lightweight markup syntax, as I discussed here. I can simply type things there, and when I am done, I run org-export-as-html. That also puts the result in my 'kill-ring' (i.e., the paste buffer), so I can paste it into the advogato web form, et voilà!

LaTeX (and to a lesser extent, HTML) is sometimes promoted over WYSIWYG word processors because it allegedly focuses on the contents and allows you to describe semantics, not looks. That is only partly true, as anyone who has tried to insert e.g. a table into a document can attest: in MS-Word or Writer, it's much easier to concentrate on the table contents than it is in LaTeX. Programs like LyX alleviate this to some extent, but for me they are a bit too much on the WYSIWYG side.

So, I used to endure the pain of raw LaTeX (and HTML) editing, because it was still the least painful way to get what I want. For LaTeX, that is book-quality rendering, with all the magic of maths, indices, numbering, source code blobs and so on. For HTML, it is standards-compliant, 'clean' markup that I can still understand, and that I can paste into e.g. a blog.

However, with org-mode the pain is mostly gone! I can export to both HTML and LaTeX, and it really allows me to focus on the contents of what I want to write (as said, org-mode markup is really lightweight); still, it allows for a lot of massaging of the output if needed. I can imagine that this 'output massaging' would be quite hard if I hadn't already spent quite a bit of time using 'raw' HTML and LaTeX - anyway, for me it works very well. Coming back to adding tables to documents: this is easy in org-mode, and I can even use the tables as little spreadsheets, with all the power of GNU Calc formulae.

I joined Advogato more than 10 years ago(!), and my last entries here are from ages ago. I am planning to do some more posting here; the main reason is that I just installed advogato.el for emacs, which hopefully allows for painless publishing from within emacs, something which unfortunately cannot be said of the interaction with e.g. Blogger.

In the last ten years, I've written a lot of software, both for money and for fun, using C++, Perl, Python, Ruby, Emacs-Lisp, and good old C. For some reason, most of the code has involved C and Emacs; I am somehow drawn to projects where that particular knowledge is useful.

All those things that were once a bit mysterious, such as autotools, parsers, Lisp and all those obscure tools like objdump, strace, procmail,… have entered my comfort zone. Editor-wise, I am still using GNU/Emacs, as I have been since the mid-90s, with maybe a month or so somewhere in 2000 when I went cold turkey to vim. That did not last; I do like vim, but I am much more productive with emacs, and it's taking over more and more of my computing universe.

I went as far as starting a blog with emacs tips at the end of 2008: Emacs-Fu, where I try to share useful things about the One True Editor. There are many little gems, but some of them are well hidden, such that I still often find some nifty trick that has been in emacs for twenty years and that I had never discovered. My emacs-lisp is still a bit embryonic; good enough to glue things together, but not really fluent. I am brushing up my skills in this area though, and re-reading SICP.

I am also still a happy Gnome user. I have learned a lot from reading the code of so many talented hackers. I think Gnome 3 offers some great opportunities, and I just got my first patch accepted into gnome-shell (it fixes the 12h/24h clock bug). But it must be said that, with my workflow revolving around emacs, the desktop environment is less important.

Implementing GTK+ widgets and other GObjects in C requires quite a bit of boilerplate code - that's hardly news. One obvious way to deal with that is to use a different programming language. If you're into C++, I can recommend the excellent GtkMM C++ bindings for GTK+. Programming with GtkMM feels very natural and follows C++ idioms; it's easy to integrate with std:: and friends. Also, it's LGPL and pure C++.

Another option is Vala. If you haven't heard of it, Vala is a programming language in its own right, with similarities to C#, but specifically designed for use with GObject. One very interesting thing about Vala is that it compiles to plain C-with-GObjects (as an intermediate step). Thus, you write in Vala, with no 'libvala' needed, and the code is just as fast as handwritten C. Vala also has bindings for many other libraries, which can make them easier to use than from plain C. Using Vala, writing GObject/GTK+-based applications becomes a lot easier. See the Vala Overview.

Finally, my truly low-tech solution is spuug. Spuug is a little GObject code generator that I wrote in 2006 to learn some Ruby, and to save myself some time. And boy, has it saved me some time! Now, there is finally a new version. The credit for this goes mostly to Viktor Nagy (many thanks!), who submitted some patches.

spuug usage is quite easy; for example:

$ spuug --class=FunkyFooBar --namespace=Funky --parent=GtkWidget

will generate funky-foobar.c and funky-foobar.h with 150 lines of boilerplate code, as a starting point for some FunkyFooBar-widget.
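To give an idea of the kind of boilerplate involved: the generated header declares the usual GObject type macros, the instance and class structs, and a constructor, roughly along the lines below. This is a sketch of the general shape, not the literal spuug output:

#ifndef __FUNKY_FOOBAR_H__
#define __FUNKY_FOOBAR_H__

#include <gtk/gtk.h>

G_BEGIN_DECLS

/* the usual GObject cast/check macros */
#define FUNKY_TYPE_FOOBAR      (funky_foobar_get_type ())
#define FUNKY_FOOBAR(obj)      (G_TYPE_CHECK_INSTANCE_CAST ((obj), FUNKY_TYPE_FOOBAR, FunkyFooBar))
#define FUNKY_IS_FOOBAR(obj)   (G_TYPE_CHECK_INSTANCE_TYPE ((obj), FUNKY_TYPE_FOOBAR))

typedef struct _FunkyFooBar      FunkyFooBar;
typedef struct _FunkyFooBarClass FunkyFooBarClass;

struct _FunkyFooBar {
        GtkWidget parent;               /* instance structure */
        /* private members go here */
};

struct _FunkyFooBarClass {
        GtkWidgetClass parent_class;    /* class (vtable) structure */
        /* signals and virtual methods go here */
};

GType      funky_foobar_get_type (void) G_GNUC_CONST;
GtkWidget* funky_foobar_new      (void);

G_END_DECLS

#endif /* __FUNKY_FOOBAR_H__ */

The matching funky-foobar.c then fills in funky_foobar_get_type, the class- and instance-initializers and so on - exactly the kind of typing spuug saves you.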

Of course, spuug works well for Maemo-code, and I know of a number of programs that are using it.

There are of course some disadvantages to using code generators. But the advantage of spuug is that it doesn't require you to learn any new language. Also, after using it, you're not dependent on spuug - the output is perfectly readable C code.

So, after three years I finally made a new version of ttb, my teletekst viewer, which is especially interesting for Dutch speakers and linguistically inclined people studying West-Germanic languages. The new version brings user help and some cosmetic updates.

The program is listed as the 'official' client for Linux by the NOS (state television), and I'm getting quite a few mails -- but interestingly, not a single bug report in three years. To be honest, there is one bug remaining: there is too much bad news in the news section. I am working on that one, but it might take a while.

I am also preparing a Maemo version. Interestingly, I had a version running on a 770 at LinuxTag in early 2005, but I never got around to packaging it. Anyway, that work will have to wait until after my trip to a friend's wedding in the Eternal City of Rome, to which I'll be flying.

As if all of that were not enough, I started a blog with tips for emacs-users; the idea is to have frequent small posts that show one useful trick: Emacs-Fu. Let's see if I succeed.

Sometimes, I like to use mathematical notation in webpages, either to impress people or simply for decoration. One way to do that is MathML, which is an XML-based markup language for mathematical notation. However, many browsers do not support MathML at all, or require you to download plugins and/or special fonts. Another problem with MathML is that XML is a really inconvenient format to edit by hand. Practically, you'll need some kind of formula editor.

tex vs mathml

As an old-schooler, I prefer to use the math notation invented for TeX instead - it is short, sweet and powerful. Donald Knuth invented the whole TeX language because he was unhappy with the quality of the typesetting of mathematics, and it is widely used in both computer science and mathematics. Anyway, I'm sure many people remember the 'abc-formula' for calculating the roots of a quadratic function.
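For those whose memory needs refreshing, the roots of ax^2 + bx + c = 0 are given by:

x_{1,2} = {-b \pm \sqrt{b^2 - 4ac} \over 2a}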

In the TeX-sublanguage for math, one can specify the formula as follows:

-b \pm \sqrt{b^2 - 4ac} \over 2a

The corresponding MathML is no fewer than 20 lines; see the example in Wikipedia. Clearly, MathML is not designed for hand-editing. There are some editors available, but hand-editing TeX is much faster (at least for me); and, as mentioned, even if you have the MathML, many browsers will not show it correctly.

So what I'd like is a way to (i) use TeX notation and (ii) have it display correctly in any (graphical) browser. One way to do that is to use LaTeX to process and render the formulae, and convert the result to a PNG image. In 2004, I wrote a little tool called WebTeX to create small images from TeX formulae. It was nothing too fancy; you enter an <img ...> element with a description of some formula, and the little tool turns it into an image, using LaTeX and ImageMagick. I don't maintain that old tool anymore - it was time for something new. Therefore...

texdrive

This weekend, I wrote a new maths-in-webpages tool using emacs-lisp. The emacs-integration makes adding formulae to html-pages really easy. For example, if I want to include the famous Bayes' Theorem, I simply type:
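The TeX for Bayes' Theorem itself is just a one-liner (shown here without the surrounding texdrive markup that goes into the html page):

P(A|B) = {P(B|A) P(A) \over P(B)}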

Now, all we need to do is run texdrive-generate-images-from-html, and the corresponding image will be generated.

So, for immediate download: texdrive.el. It works pretty well for me; please let me know if you have any problems or are missing something. In some cases, the formulae are not as sharp as they could be; I hope I'll be able to improve that with some tweaking. Anyway, it's nice to see how one can solve problems by gluing together some existing open-source tools. Standing on the shoulders of giants...

Note that some wiki software, notably Wikipedia's MediaWiki, uses a similar approach.

Most of us are not Donald Knuth, and we do indeed need to test our software. That is even true for my hobby projects - when I offer software for use by others, it's a matter of craftsmanship to deliver the best software possible. It's very hard to foresee all the possible environments (architecture, compiler, library version, ...) in which my software might be run. But at least I can minimize the number of programming errors by testing things as much as possible.

The trouble with testing, however, is that it is dead boring. I hate doing boring things -- life is just too short. So, I want to do my testing in the least boring way possible -- I'd like to be able to simply run:

$ make test

and have that go through all my test cases and report any failures. The idea is that if it is so easy to run the tests, you might actually do so, and make sure your software is working according to plan. When doing a release, it is so easy to forget something really obvious, and get embarrassing bug reports for it... Running some automated tests gives some peace of mind when doing a release.

gtest

Since version 2.16, the GLib library offers a unit-testing framework called GTest (note: this is not to be confused with Google Test, which is sometimes also called GTest). GTest is not much different from, say, check, but it's part of GLib and integrates nicely with it. I have started to use it for mu, and I am quite happy with it. Here, I will not go into the details of actually writing test cases, but talk about how to integrate GTest with your code. For the best results, you'd probably want to integrate it with your build system; I am using autotools.

The overall setup is that each of my directories with code has a subdirectory tests/ containing the test code. Those tests are unit tests, which test one function or a couple of them combined. Of course, this is a lot easier when your code is written in a way that lends itself to such testing[1]. In addition to the per-directory tests/, there is also a top-level tests/, which tests the whole software workflow. In the case of mu, this means that the tests will index some test messages, fill a database with them, and then run some test queries against that database. When all of that works correctly, I am quite confident that my software is not totally broken.

autotools

Now, let's discuss how you can integrate GTest with your code; this is inspired by the way GTK+ does it these days. First, here is gtest.mk, a file at the top of my source tree that I include in all Makefile.ams that require GTest support:

This makes sure that my code also works with older versions of GLib; the unit tests themselves will only work with newer versions, of course. It also gives you a symbol MU_HAVE_GTEST that you can use in your Makefile.am; for example, in index/Makefile.am, I have:

include $(top_srcdir)/gtest.mk

SUBDIRS= .

if MU_HAVE_GTEST
SUBDIRS += tests
endif

[....]

As you can see, it includes the gtest.mk mentioned above, and (conditionally) adds tests/ as a subdirectory to visit. The unit tests live in this subdirectory. Note that by explicitly setting SUBDIRS to '.' first, we ensure that the code in index/ is built before we descend into tests/.

unit tests

Below is a simple example of a unit-test program; it only uses a small subset of GTest. You can organize your test cases further (see GTestSuite and GTestCase), and there are fixtures, which set up the testing environment. I don't use those, but they might be useful for others; check out the GTest documentation to find out more. Anyway, here are some simple test cases:
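Something along these lines, for example - a minimal sketch, in which the function under test (str_reverse) is just an illustrative stand-in, not actual mu code:

/* test-str.c: a minimal GTest example; str_reverse is a made-up
 * stand-in for whatever function you actually want to test */
#include <glib.h>

static char*
str_reverse (const char *str)
{
        return g_utf8_strreverse (str, -1);
}

static void
test_str_reverse (void)
{
        char *s;

        s = str_reverse ("hello");
        g_assert_cmpstr (s, ==, "olleh");
        g_free (s);
}

static void
test_str_reverse_empty (void)
{
        char *s;

        s = str_reverse ("");
        g_assert_cmpstr (s, ==, "");
        g_free (s);
}

int
main (int argc, char *argv[])
{
        g_test_init (&argc, &argv, NULL);

        g_test_add_func ("/str/reverse",       test_str_reverse);
        g_test_add_func ("/str/reverse-empty", test_str_reverse_empty);

        return g_test_run ();
}

Build it against glib-2.0 (pkg-config --cflags --libs glib-2.0) and run the resulting binary, directly or through gtester; a failing assertion immediately points at the offending test.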

With that, whenever a test uncovers a bug, all we need to do is fix it and run the tests again... rinse-lather-repeat. Using GTest, it's really easy to run the test cases. In general, I try to make sure my software passes the tests at the end of every programming session. This does not always work when I am making big changes, but after stabilizing things again, I make sure all test cases pass, both old and new.

parting thoughts

One thing still missing from GTest is some way to see the code coverage, i.e., to see which parts of the code are covered by the tests. I think it should be possible to do this using gcov, but it'd be nice if someone automated that a bit. Another issue is that, for effective use, you will need something like the setup described here. One can hardly expect someone new to Unix development to figure this out by themselves... but of course, we cannot really blame GTest for that.

Hopefully my setup helps a bit in setting up non-boring testing (even though it might be a bit boring in itself...). There are real-life examples of this in both mu and GTK+. And finally, if you find any inaccuracies, please let me know -- there are no unit tests for blog entries to save me from mistakes...

[1] Now, a discussion of how to write easily testable functions deserves its own blog entry, but there are some general things to keep in mind. Keep your functions short, limit the number of parameters, avoid global variables, limit side-effects to only a few functions, etc. In other words, use the lessons learnt from functional programming languages. And as a nice side-effect (ha!), such functions tend to be much less error-prone in the first place.

I released mu 0.4 (my e-mail indexing/search tool), and as always, I try to learn things from it.

One of the main problems with writing correct and maintainable software is complexity. I am not talking about computational (big-O) complexity here - I am talking about code complexity, as a subjective measure for readability. Some people write very elegant and readable code, while others write code that is very hard to understand. It would be nice to have some objective measure.

cyclomatic complexity

While it is certainly not perfect, I have found McCabe's Cyclomatic Complexity a useful tool for this. Thomas J. McCabe describes the method in his classic paper from 1976, as a metric calculated from the flow graph of the program. I won't go into the details of the exact calculation here (it's straightforward though; read the paper) -- the bottom line is that the higher the complexity, the harder the code is to understand and to test. Indeed, it's not just about readability for humans: the complexity has a direct relation to the number of code paths and, consequently, to the testability of the function. If the complexity is high, you'll have an unholy number of code paths, which are impossible to test fully, and software quality will suffer.
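For the record, the formula itself fits on one line: for a control-flow graph with E edges, N nodes and P connected components (normally 1 for a single function), the complexity is

M = E - N + 2P

which, for a single structured function, works out to the number of decision points plus one.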

Making sure your code is not too complex (according to this measure) simply means ensuring that there are not too many code paths (really: decisions); i.e., split your code into short functions that do one thing, and do it well.
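As a contrived illustration (none of this is real mu code), compare a function that makes all its decisions in one place with a version that delegates to small helpers; the total number of decisions stays the same, but no single function has more than a couple of them:

/* contrived example of splitting one decision-heavy function (v1)
 * into small helpers (v2) */
#include <stdio.h>
#include <string.h>

/* before: validation, classification and reporting in one function */
static void
report_subject_v1 (const char *subject)
{
        if (!subject || strlen (subject) == 0)
                printf ("invalid\n");
        else if (strncmp (subject, "URGENT", 6) == 0)
                printf ("urgent: %s\n", subject);
        else if (strlen (subject) > 60)
                printf ("long: %s\n", subject);
        else
                printf ("normal: %s\n", subject);
}

/* after: each helper does one thing and has few decision points */
static int
subject_is_valid (const char *subject)
{
        return subject && strlen (subject) > 0;
}

static const char*
subject_class (const char *subject)
{
        if (strncmp (subject, "URGENT", 6) == 0)
                return "urgent";
        return strlen (subject) > 60 ? "long" : "normal";
}

static void
report_subject_v2 (const char *subject)
{
        if (!subject_is_valid (subject)) {
                printf ("invalid\n");
                return;
        }
        printf ("%s: %s\n", subject_class (subject), subject);
}

int
main (void)
{
        report_subject_v1 ("URGENT: please read");
        report_subject_v2 ("URGENT: please read");
        return 0;
}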

pmccabe

Now, how do we get the numbers to identify overly complex functions? Thankfully, we don't need to calculate anything by hand: there is the pmccabe package (debian/ubuntu), which does the work for us, for example:

recommendation

What the maximum recommended cyclomatic complexity for a function should be is debatable - but many coding guidelines suggest a value of 10. Go much beyond that, and it's easy to see that the function gets very complex.

As always, we should use such guidelines with care. I can imagine some inherently complex algorithms that you nevertheless would not want to split, precisely *because* you want to keep things as understandable as possible. But those will be rare exceptions.

practical

Obviously, limiting cyclomatic complexity is not sufficient for creating maintainable software; there are still many other opportunities for making your code hard to understand. Still, it does not hurt to keep at least this one aspect under control, especially as experience suggests there is a high correlation between function complexity and error density. Fortunately, it's usually not too hard to reduce the complexity: split big functions (carefully!) into smaller ones - logical units that do one thing, and do it well.

I made sure the new mu follows the <=10-rule. I found some extra targets for Makefiles quite useful for that:

cc10:
	@pmccabe `find -name '*.c'` | sort -nr | awk '($$1 > 10)'

cc20:
	@pmccabe `find -name '*.c'` | sort -nr | awk '($$1 > 20)'

Now, I can simply type make cc10 or make cc20 to get all the functions that violate the rules CC <= 10 and CC <= 20, respectively. Mu version 0.3 still contained a handful of functions that broke the rule, but I have now simplified them, splitting the big functions up. In my projects, I have usually followed the rule to some extent, intuitively, but I definitely could have written better code if I had paid attention to this number before. There is of course a risk in changing working code just because of 'some number', but in the long run I think it will really pay off.

Today just a short tip: if you are using emacs and git, I can recommend magit.

Magit is a git mode for emacs, which makes using git convenient and easy. Magit was created by my running mate Marius. It's under heavy development, but I have been a happy user for a while. There is even a user manual, which you actually don't need very much, as things work very much as you would expect.