Jussi Pakkanen's development bloghttp://voices.canonical.com/jussi.pakkanen
Observations on development and related issuesFri, 13 Feb 2015 14:21:40 +0000en-UShourly1http://wordpress.org/?v=3.9.3Speeding up embedded development with CCache transplantshttp://voices.canonical.com/jussi.pakkanen/2015/02/13/speeding-up-embedded-development-with-ccache-transplants/
http://voices.canonical.com/jussi.pakkanen/2015/02/13/speeding-up-embedded-development-with-ccache-transplants/#commentsFri, 13 Feb 2015 14:21:40 +0000http://voices.canonical.com/jussi.pakkanen/?p=241Continue reading →]]>There are two different schools of thought when it comes to embedded development such as working on Ubuntu phone. The first group feels that compiling on the device is better and the other group prefers doing cross-builds. Both have their advantages and disadvantages.

One of the biggest disadvantages for compiling on-device is the fact that a reflash wipes all your state and you need to compile your project from scratch again. This is unfortunate if a fresh build takes 20 minutes (which is not that uncommon).

The question then becomes whether you can avoid this. It turns out that you can by doing a ccache transplant. Here’s how to do it.

]]>http://voices.canonical.com/jussi.pakkanen/2014/10/30/bug-finding-tools/feed/0Simple thread safety with pimpl’d objectshttp://voices.canonical.com/jussi.pakkanen/2014/07/24/simple-thread-safety-with-pimpld-objects/
http://voices.canonical.com/jussi.pakkanen/2014/07/24/simple-thread-safety-with-pimpld-objects/#commentsThu, 24 Jul 2014 13:27:15 +0000http://voices.canonical.com/jussi.pakkanen/?p=232Continue reading →]]>A use case that pops up every now and then is to have a self-contained object that needs to be accessed from multiple threads. The problem appears when the object, as part of its usual things calls its own methods. This leads to tricky locking operations, a need to use a recursive mutex or something else that is nonoptimal.

Another common approach is to use the pimpl idiom, which hides the contents of an object inside a hidden private object. There are ample details on the internet, but the basic setup of a pimpl’d class is the following. First of all we have the class header:

The main idea to realize is that Foo::Private can never call functions of Foo. Thus if we can isolate the locking bits inside Foo, the functionality inside Foo::Private becomes automatically thread safe. The way to accomplish this is simple. First you add a (public) std::mutex m to Foo::Private. Then you just change the functions of Foo to look like this:

Foo::Private can pretend that it is single-threaded which usually makes implementation a lot easier

The main drawback of this approach is that the locking is coarse, which may be a problem when squeezing out ultimate performance. But usually you don’t need that.

]]>http://voices.canonical.com/jussi.pakkanen/2014/07/24/simple-thread-safety-with-pimpld-objects/feed/0The two ways of doing somethinghttp://voices.canonical.com/jussi.pakkanen/2014/07/22/the-two-ways-of-doing-something/
http://voices.canonical.com/jussi.pakkanen/2014/07/22/the-two-ways-of-doing-something/#commentsTue, 22 Jul 2014 12:58:07 +0000http://voices.canonical.com/jussi.pakkanen/?p=230Continue reading →]]>There are usually two different ways of doing something. The first is the correct way. The second is the easy way.

As an example of this, let’s look at using the functionality of C++ standard library. The correct way is to use the fully qualified name, such as std::vector or std::chrono::milliseconds. The easy way is to have using std; and then just using the class names directly.

The first way is the “correct” one as it prevents symbol clashes and for a bunch of other good reasons. The latter leads to all sorts of problems and for this reason many style guides etc prohibit its use.

But there is a catch. Software is written by humans and humans have a peculiar tendency.

They will always do the easy thing.

There is no possible way for you to prevent them from doing that, apart from standing behind their back and watching every letter they type.

Any sort of system that relies, in any way, on the fact that people will do the right thing rather than the easy thing are doomed to fail from the start. They. Will. Not. Work. And they can’t be made to work. Trying to force it to work leads only to massive shouting and bad blood.

What does this mean to you, the software developer?

It means that the only way your application/library/tool/whatever is going to succeed is that correct thing to do must also be the simplest thing to do. That is the only way to make people do the right thing consistently.

]]>http://voices.canonical.com/jussi.pakkanen/2014/07/22/the-two-ways-of-doing-something/feed/0Code review only works for psychopaths and sworn enemieshttp://voices.canonical.com/jussi.pakkanen/2014/04/30/code-review-only-works-for-psychopaths-and-sworn-enemies/
http://voices.canonical.com/jussi.pakkanen/2014/04/30/code-review-only-works-for-psychopaths-and-sworn-enemies/#commentsWed, 30 Apr 2014 15:26:00 +0000http://voices.canonical.com/jussi.pakkanen/?p=227Continue reading →]]>Code review is generally acknowledged to be one of the major tools in modern software development. The reasons for this are simple, it spreads knowledge of the code around the team, obvious bugs and design flaws are spotted early, which makes everyone happy and so on.

But yet our code bases are full of horrible flaws, glaring security holes, unoptimizable algorithms and everything that drives a grown man to sob uncontrollably. These are the sorts of things code review was designed to stop and prevent, so why are they there. Let’s examine this with a thought experiment.

Suppose you are working on a team. Your team member Joe has been given the task of implementing new functionality. For simplicity’s sake let us assume that the functionality is adding together two integers. Then off Joe goes and returns after a few days (possibly weeks) with something.

And I really mean Something.

Instead of coding the addition function, he has created an entire new framework for arbitrary arithmetic operations. The reasoning for this is that it is “more general” because it can represent any mathematical operation (only addition is implemented, though, and trying to use any other operation fails silently with corrupt data). The core is implemented in a multithreaded async callback spaghetti hell that only has a data race on 93% of the time (the remaining 7% covers the one existing test case).

There is only one possible code review for this kind of an achievement.

Review result: Rejected
Comments: If there was a programmer's equivalent to chemical
castration, it would already have been administered to you.

In the ideal world that would be it. The real world has certain impurities as far as this thing goes. The first thing to note is that you have to keep working with whoever wrote the code for an unforeseeable amount of time. Aggravating your coworkers consistenly is not a very nice thing to do and gives you the unfortunate label of “assh*ole”. Things get even worse if Joe is in any way your boss, because critizising his code may get you on the fast track to the basement or possibly the unemployment line. The plain fact, however, is that this piece of code must never be merged. It can’t be fixed by review comments. All of it must be thrown away and replaced with something sane.

At this point office politics enter the fray. Most corporations have deadlines to meet and products to ship. Should you try to block the entry of this code (which implements a Feature, no less, or at least a fraction of one) makes you the bad guy. The code that Joe has written is an expense and if there is one thing organisations do not want to hear it is the fact that they have just wasted a ton of effort. The Code is There and it Must Be Used this Instant! Expect to hear comments of the following kind:

Why are you being so negative?

The Product Vision requires addition of two numbers. Why are you working against the Vision?

Do you want to be the guy that single-handedly destroyed the entire product?

This piece of code adds a functionality we did not have before. It is imperative that we get it in now (the product is expected to ship in one year from now)!

There is no time to rewrite Joe’s work so we must merge this (even though reimplementing just the functionality would take less effort than even just fixing the obvious bugs)

This onslaught continues until eventually you give in, do the “team decision”, accept the merge and drink yourself unconscious fully aware of the fact that you have to fix all these bugs once someone starts using them (you are not allowed to rewrite it, you must fix the existing code, for that is Law). For some reason or another Joe seems to have magically been transferred somewhere else to work his magic.

For this simple reason code review does not work very well in most offices. If you only ever get comments about how to format your braces, this may be affecting you. In contrast code reviews work quite well in other circumstances. The first one of them is the Linux kernel.

The code that gets into the kernel is being watched over by lots of people. More importantly it is being watched over by people who don’t work for you, your company or their subsidiaries. Linus Torvalds does not care one iota about your company’s quarterly sales goals, launch dates or corporate goals. The only thing he cares about is whether your code is any good. If it is not, it won’t get merged and there is nothing you can do about it. There is no middle manager you can appeal to or HR you can usurp on someone. Unless you have proven yourself, the code reviewers will treat you like an enemy. Anything you do will be scrutinised, analysed, dissected and even outright rejected. This intercorporate fire wall is good because it ensures that terrible code is not merged (sometimes poor code falls through the cracks, though, but such is life). On the other hand this sort of thing causes massive flame wars every now and then.

This does not work in corporate environments, though, for the reasons listed. One way to make it work is to have a master code reviewer who does not care about what other people might think. Someone who can summarily reject awful code without a lengthy “let’s see if we can make it better” discussion. Someone who, when the sales people come to him demanding something to be done half-assed, can tell them to buzz of. Someone who does not care about hurting people’s feelings.

In other words, a psychopath.

Like most things in life, having a psychopath in charge of your code has some downsides. Most of them flow from the fact that psychopaths are usually not very nice to work with.Also, one of the things that is worse than not having code review is having a psychopath master code reviewer that is incompetent or otherwise deluded. Unfortunately most psychopaths are of the latter kind.

So there you have it: the path to high quality code is paved with psychopaths and sworn enemies.

]]>http://voices.canonical.com/jussi.pakkanen/2014/04/30/code-review-only-works-for-psychopaths-and-sworn-enemies/feed/0What can you achieve with a single threadhttp://voices.canonical.com/jussi.pakkanen/2014/04/28/what-can-you-achieve-with-a-single-thread/
http://voices.canonical.com/jussi.pakkanen/2014/04/28/what-can-you-achieve-with-a-single-thread/#commentsMon, 28 Apr 2014 11:03:01 +0000http://voices.canonical.com/jussi.pakkanen/?p=225Continue reading →]]>Threads are a bit like fetishes: some people can’t get enough of them and other people just can’t see what the point is. This leads to eternal battles between “we need the power” and “this is too complex”. These have a tendency to never end well.

One inescapable fact about multithreaded and asynchronous programming is that it is hard. A rough estimate says that a multithreaded solution is between ten and 1000 times harder to design, write, debug and maintain than a single threaded one. Clearly, this should not be done without heavy duty performance needs. But how much is that?

Let’s do an experiment to find out. Let’s create a simple C++ network echo server the source code of which can be downloaded here. It can serve an arbitrary amount of clients but it uses only one thread to do so. The implementation uses a simple epoll loop over the open connections.

For our test we use 10 clients that do 10 000 queries each. To reduce the effects of network latency, the clients run on the same machine. The test hardware is a Nexus 4 running the latest Ubuntu phone.

The test finishes in 11 seconds, which means that a single threaded server can serve roughly 10 000 requests a second using basic ARM hardware. It should be noted that because the clients run on the same machine, they are stealing CPU time from the server. The service rates would be bigger if the server process got its own processor. It would also be bigger if compiler optimizations had been enabled but who needs those, anyway.

The end result of all this is that unless you need massive amounts of queries per second or your backend is incredibly slow, multithreading probably won’t do you much good and you’ll be much better of doing everything single-threaded. You’ll spend a lot less time in a debugger and will be generally happier as well.

Even if you need these, multithreading might still not be the way to go. There are other ways of parallelization, such as using multiple processes, which provides additional memory safety and error tolerance as well. This is not to say threads are bad. They are a wonderful tool for many different use cases. You should just be aware the some times the best way to use threads is not to use them at all.

Actually, make that “most times”.

]]>http://voices.canonical.com/jussi.pakkanen/2014/04/28/what-can-you-achieve-with-a-single-thread/feed/0Complexity, where does it come from?http://voices.canonical.com/jussi.pakkanen/2014/02/28/complexity-where-does-it-come-from/
http://voices.canonical.com/jussi.pakkanen/2014/02/28/complexity-where-does-it-come-from/#commentsFri, 28 Feb 2014 12:25:38 +0000http://voices.canonical.com/jussi.pakkanen/?p=220Continue reading →]]>People often wonder why even the simplest of things seem to take long to implement. Often this is accompanied by uttering the phrase made famous by Jeremy Clarkson: how hard can it be.

Well let’s find out. As an example let’s look into a very simple case of creating a shared library that grabs a screen shot from a video file. The problem description is simplicity itself: open the file with GStreamer, seek to a random location and grab the pixels from the buffer. All in all, ten lines of code, should take a few hours to implement including unit tests.

Right?

Well, no. The very first problem is selecting a proper screenshot location. It can’t be in the latter half of the video, for instance. The simple reason for this is that it may then contain spoilers and the mere task of displaying the image might ruin the video file for viewers. So let’s instead select some suitable point, like 2/7:ths of the way in the video clip.

But in order to do that you need to first determine the length of the clip. Fortunately GStreamer provides functionality for this. Less fortunately some codec/muxer/platform/whatever combinations do not implement it. So now we have the problem of trying to determine a proper clip location for a file whose duration we don’t know. In order to save time and effort let’s just grab the screen shot at ten seconds in these cases.

The question now becomes what happens if the clip is less than ten seconds long? Then GStreamer would (probably) seek to the end of the file and grab a screenshot there. Videos often end in black so this might lead to black thumbnails every now and then. Come to think of it, that 2/7:th location might accidentally land on a fade so it might be all black, too. What we need is an image analyzer that detects whether the chosen frame is “interesting” or not.

This rabbit hole goes down quite deep so let’s not go there and instead focus on the other part of the problem.

There are mutually incompatible versions of GStreamer currently in use: 0.10 and 1.0. These two can not be in the same process at the same time due interesting technical issues. No matter which we pick, some client application might be using the other one. So we can’t actually link against GStreamer but instead we need to factor this functionality out to a separate executable. We also need to change the system’s global security profile so that every app is allowed to execute this binary.

Having all this functionality we can just fork/exec the binary and wait for it to finish, right?

In theory yes, but multimedia codecs are tricky beasts, especially hardware accelerated ones on mobile platforms. They have a tendency to freeze at any time. So we need to write functionality that spawns the process, monitors its progress and then kills it if it is not making progress.

A question we have not asked is how does the helper process provide its output to the library? The simple solution is to write the image to a file in the file system. But the question then becomes where should it go? Different applications have different security policies and can access different parts of the file system, so we need a system state parser for that. Or we can do something fancier such as creating a socket pair connection between the library and the client executable and have the client push the results through that. Which means that process spawning just got more complicated and you need to define the serialization protocol for this ad-hoc network transfer.

I could go on but I think the point has been made abundantly clear.

]]>http://voices.canonical.com/jussi.pakkanen/2014/02/28/complexity-where-does-it-come-from/feed/0But can we make it faster?http://voices.canonical.com/jussi.pakkanen/2013/11/21/but-can-we-make-it-faster/
http://voices.canonical.com/jussi.pakkanen/2013/11/21/but-can-we-make-it-faster/#commentsThu, 21 Nov 2013 13:21:34 +0000http://voices.canonical.com/jussi.pakkanen/?p=214Continue reading →]]>A common step in a software developer’s life is building packages. This happens both directly on you own machine and remotely when waiting for the CI server to test your merge requests.

As an example, let’s look at the libcolumbus package. It is a common small-to-medium sized C++ project with a couple of dependencies. Compiling the source takes around 10 seconds, whereas building the corresponding package takes around three minutes. All things considered this seems like a tolerable delay.

But can we make it faster?

The first step in any optimization task is measurement. To do this we simulated a package builder by building the source code in a chroot. It turns out that configuring the source takes one second, compiling it takes around 12 seconds and installing build dependencies takes 2m 29s. These tests were run on an Intel i7 with 16GB of RAM and an SSD disk. We used CMake’s Make backend with 4 parallel processes.

Clearly, reducing the last part brings the biggest benefits. One simple approach is to store a copy of the chroot after dependencies are installed but before package building has started. This is a one-liner:

sudo btrfs subvolume snapshot -r chroot depped-chroot

Now we can do anything with the chroot and we can always return back by deleting it and restoring the snapshot. Here we use -r so the backed up snapshot is read-only. This way we don’t accidentally change it.

With this setup, prepping the chroot is, effectively, a zero time operation. Thus we have cut down total build time from 162 seconds to 13, which is a 12-fold performance improvement.

But can we make it faster?

After this fix the longest single step is the compilation. One of the most efficient ways of cutting down compile times is CCache, so let’s use that. For greater separation of concerns, let’s put the CCache repository on its own subvolume.

The latter command gave an error about incorrect ioctls. The same effect can be achieved with bind mounts, though.

When doing this the compile time drops to 0.6 seconds. This means that we can compile projects over 100 times faster.

But can we make it faster?

At this point all individual steps take a second or so. Optimizing them further would yield negligible performance improvements. In actual package builds there are other steps that can’t be easily optimized, such as running the unit test suite, running Lintian, gathering and verifying the package and so on.

If we look a bit deeper we find that these are all, effectively, single process operations. (Some build systems, such as Meson, will run unit tests in parallel. They are in the minority, though.) This means that package builders are running processes which consume only one CPU most of the time. According to usually reliable sources package builders are almost always configured to work on only one package at a time.

Having a 24 core monster builder run single threaded executables consecutively does not make much sense. Fortunately this task parallelizes trivially: just build several packages at the same time. Since we could achieve 100 times better performance for a single build and we can run 24 of them at the same time, we find that with a bit of effort we can achieve the same results 2400 times faster. This is roughly equivalent to doing the job of an entire data center on one desktop machine.

The small print

The numbers on this page are slightly optimistic. However the main reduction in performance achieved with chroot snapshotting still stands.

In reality this approach would require some tuning, as an example you would not want to build LibreOffice with -j 1. Keeping the snapshotted chroots up to date requires some smartness, but these are all solvable engineering problems.

]]>http://voices.canonical.com/jussi.pakkanen/2013/11/21/but-can-we-make-it-faster/feed/0Forward declaration: when can you get away with it?http://voices.canonical.com/jussi.pakkanen/2013/11/04/forward-declaration-when-can-you-get-away-with-it/
http://voices.canonical.com/jussi.pakkanen/2013/11/04/forward-declaration-when-can-you-get-away-with-it/#commentsMon, 04 Nov 2013 13:24:41 +0000http://voices.canonical.com/jussi.pakkanen/?p=209Continue reading →]]>One of the main ways of reducing code complexity (and thus compile times) in C/C++ is forward declaration. The most basic form of it is this:

class Foo;

This tells the compiler that there will be a class called Foo but it does not specify it in more detail. With this declaration you can’t deal with Foo objects themselves but you can form pointers and references to them.

This makes sense because you need to know the binary layout of Bar in order to pass it properly to and from a method. Thus a forward declaration is not enough, you must include the full header, otherwise you can’t use the methods of Foo.

But what if some class does not use either of the methods that deal with Bars? What if it only calls method something? It would still need to parse all of Bar (and everything it #includes) even though it never uses Bar objects. This seems inefficient.

It turns out that including Bar.h is not necessary, and you can instead do this:

You can define functions taking or returning full objects with forward declarations just fine. The catch is that those users of Foo that use the Bar methods need to include Bar.h themselves. Correspondingly those that do not deal with Bar objects themselves do not need to include Bar.hh ever, even indirectly. If you ever find out that they do, it is proof that your #includes are not minimal. Fixing these include chains will make your source files more isolated and decrease compile times, sometimes dramatically.

You only need to #include the full definition of Bar if you need:

to use its services (constructors, methods, constants, etc)

to know its memory layout

In practice the latter means that you need to either call or implement a function that takes a Bar object rather than a pointer or reference to it.

For other uses a forward declaration is sufficient.

Post scriptum

The discussion above holds even if Foo and Bar are templates, but making template classes as clean can be a lot harder and may in some instances be impossible. You should still try to minimize header includes as much as possible.

]]>http://voices.canonical.com/jussi.pakkanen/2013/11/04/forward-declaration-when-can-you-get-away-with-it/feed/0C++ has become a scripting languagehttp://voices.canonical.com/jussi.pakkanen/2013/10/15/c-has-become-a-scripting-language/
http://voices.canonical.com/jussi.pakkanen/2013/10/15/c-has-become-a-scripting-language/#commentsTue, 15 Oct 2013 12:43:26 +0000http://voices.canonical.com/jussi.pakkanen/?p=205Continue reading →]]>With the release of C++11 something quite extraordinary has happened. Its focus on usable libraries, value types and other niceties has turned C++, conceptually, into a scripting language.

This seems like a weird statement to make, so let’s define exactly what we mean by that. Scripting languages differ from classical compiled languages such as C in the following ways:

no need to manually manage memory

expressive syntax, complex functionality can be implemented in just a couple of lines of code

powerful string manipulation functions

large standard library

As of C++11 all these hold true for C++. Let’s examine this with a simple example. Suppose we want to write a program that reads all lines from a file and writes them in a different file in sorted order. This is classical scripting language territory. In C++11 this code would look something like the following (ignoring error cases such as missing input arguments).

That is some tightly packed code. Ignoring include boilerplate and the like leaves us with roughly ten lines of code. If you were to do this with plain C using only its standard library merely implementing getline functionality reliably would take more lines of code. Not to mention it would be tricky to get right.

Other benefits include:

every single line of code is clear, understandable and expressive

memory leaks can not happen, could be reworked into a library function easily

smaller memory footprint due to not needing a VM

compile time with -O3 is roughly the same as Python VM startup and has to be done only once

faster than any non-JITted scripting language

Now, obviously, this won’t mean that scripting languages will disappear any time soon (you can have my Python when you pry it from my cold, dead hands). What it does do is indicate that C++ is quite usable in fields one traditionally has not expected it to be.