How two volunteers built the Raspberry Pi’s operating system

Raspbian makes the Pi go faster—and supplanted Fedora as the #1 OS.

When you buy a Raspberry Pi, the $35 computer doesn't come with an operating system. Loading your operating system of choice onto an SD card and then booting the Pi turns out to be pretty easy. But where do Pi-compatible operating systems come from?
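
On Linux, "loading" an operating system usually just means writing a disk image straight to the card. A minimal sketch (the device name /dev/sdX is a placeholder; check it with lsblk first, since dd will cheerfully overwrite whatever disk you point it at):

# Write a downloaded OS image to an SD card, then flush buffers before removing it
sudo dd if=raspbian.img of=/dev/sdX bs=4M
sync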

With the Raspberry Pi having just turned one year old, we decided to find out how Raspbian—the officially recommended Pi operating system—came into being. The project required 60-hour work weeks, a home-built cluster of ARM computers, and the rebuilding of 19,000 Linux software packages. And it was all accomplished by two volunteers.

Like the Raspberry Pi itself, an unexpected success story

Although there are numerous operating systems for the Pi, the Raspberry Pi Foundation recommends one for the general populace. When the Pi was born a year ago, the recommended operating system was a version of Red Hat's Fedora tailored to the computer's ARM processor. But within a few months, Fedora fell out of favor on the Pi and was replaced by Raspbian. It's a version of Debian painstakingly rebuilt for the Raspberry Pi by two volunteers named Mike Thompson and Peter Green.

It began because Thompson was ready for the next big thing. He had been CTO and co-founder of Atomz, a hosted search service acquired by WebSideStory in 2005. Thompson got a "not unhealthy slice" of the $45 million or so the company sold for, and he eventually stopped working to spend a few years hanging out with his wife and kids.

A year ago, he was finally ready to get back into the technology field. Robotics is one of Thompson's primary interests, and the Pi looked like a great platform for that. It's cheap and small enough to be easily embedded into a variety of systems. But there wasn't an operating system fully optimized for the Pi's floating point unit, which is important in robotics projects and various other types of math-intensive applications.

"When I first learned of the Raspberry Pi, I was disappointed that none of the Linux distributions available for it would make use of the very fast floating point hardware that was present on the Pi," Thompson told me. "As a long-time Debian user, I was like, 'I'd rather see Debian [than Fedora]' and I wanted to see the floating point enabled because I have this longer-term interest in getting robotics working with these inexpensive boxes."

Debian had added floating point support for the ARMv7 processor, but not the ARMv6 processor used in the Pi. Debian "didn't see a product like the Raspberry Pi coming on the horizon. Even though ARMv6 in Pi has a pretty capable floating point unit, they didn't support it," Thompson said. Thus, "all the thousands or tens of thousands of software packages they built wouldn't support the Raspberry Pi."

Just as a graphics processing unit handles graphics operations very quickly, a "floating point unit performs all the math very quickly," Thompson said. "It's a peripheral that not every computer has, but when it does you really want to take advantage of it." The floating point unit is part of the Pi's Broadcom BCM2835 system-on-a-chip.

If you don't take advantage of floating point capability in the hardware, a lot of mathematical operations must be performed in software, lengthening the amount of time it takes for the Pi to do useful work. This is important for robots, where complicated math operations are used for processing data from cameras and sensors, and for controlling motors with precision and speed, Thompson said. It's also important for multimedia processing, encoding music, physics simulations, or "pretty much anything that is numerically intensive."
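
The distinction Thompson describes maps directly onto GCC's ARM floating point options. A rough illustration using an ARM-targeted build of gcc (these are standard GCC flags; the exact defaults baked into Raspbian's compiler packages may differ slightly):

# Soft float: FP math is emulated with integer instructions -- portable, but slow
gcc -march=armv6 -mfloat-abi=soft -O2 -o bench_soft bench.c

# Hard float: the VFP unit does the math and FP values are passed in FPU
# registers. This is what Raspbian enables across the whole distribution.
gcc -march=armv6 -mfpu=vfp -mfloat-abi=hard -O2 -o bench_hard bench.c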

A fruitful partnership

The path forward was clear to Thompson: rebuild Debian to run on the Raspberry Pi. That meant porting 19,000 packages from Debian to Raspbian—a monumental task.

Thompson didn't go it alone, though. He started a Raspberry Pi forum thread to talk to other people interested in bringing Debian to the Raspberry Pi. The thread caught the interest of Peter Green, a Debian developer and PhD student in the UK who went by the handle "plugwash" in the forums.

Green was a rare mix. He not only had the expertise to co-lead the project with Thompson, but he was also crazy enough to do it.

"I felt I was the only one of the people who were talking in the threads about the project that became Raspbian with enough Debian knowledge to make the rebuild a success," Green told me. "I'm sure there are other people within the Debian project who could have done it if they were interested and crazy enough, and there were many people in the Debian project who provided us with bits of help along the way."

When Thompson and Green got started, the Pi wasn't actually available yet. Even if it had been, it wouldn't have been capable of rebuilding Debian in a reasonable amount of time. Thus, Thompson set up a cluster of eight Freescale i.MX53 Quick Start Boards, each with 1GB of memory, a 1GHz ARMv7 processor, and (most importantly) SATA hard drives. One of the main reasons the Pi would be unsuitable for this type of work is the bottleneck introduced by USB storage, Thompson said. The Freescale boards were able to build Raspbian anywhere from four to 12 times faster than the Raspberry Pi could have.

Thompson spent nearly $3,000 on the cluster but recouped those costs through donations to the project. In addition to the single-board ARM systems, there is a separate Linux PC to serve as the repository for Raspbian builds. The repository server retrieves source packages from Debian repositories, schedules jobs for the Freescale systems, and collects the binary packages once the builds are complete. "The built packages are all staged in the repository and then synced with the external repository we maintain at www.raspbian.org where users pull their packages from," Thompson explained.

At first, Thompson used an ARM-based HP Media Vault MV5150 for the repository, but later upgraded to an Intel-based system when more horsepower was needed. Although each Freescale board has its own hard drive used for building packages, the main storage duties are handled by the repository server's 500GB drive, which is about two-thirds full now. Here's what the cluster looked like when he was first setting it up:

[Photo: The cluster would eventually include eight single-board ARM computers. Credit: Mike Thompson]

Thompson and Green weren't starting from scratch. Debian is already one of the most widely used Linux-based operating systems, obviously. Its ARMv7 port provided a solid foundation.

"We greatly leveraged the work done previously by the Debian Project to support floating point hardware on ARMv7 devices," Thompson said. "Other than the effort to actually build [19,000] software packages, 95 percent of the work done to port the software to support the Raspberry Pi was already done by Debian."

That's not to say the Thompson/Green work in bringing Debian to ARMv6 and its floating point unit was trivial, however. Green explained:

In Debian, the compilers have default settings built into them. These default settings set things like the CPU family, the minimum CPU requirements, and the ABI [application binary interface] in use. Most packages leave those settings alone. We modified the compiler packages to reduce the default settings to ARMv6.

For the majority of source packages, rebuilding them with a compiler that uses our new defaults is enough to make them build ARMv6 code.
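
You can inspect the kind of built-in defaults Green is talking about on any ARM build of GCC (a standard GCC query option; output trimmed to the relevant settings):

# Ask the compiler which target settings it assumes when none are given
gcc -Q --help=target | grep -E 'march|mfpu|mfloat-abi'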

A lot of the original work was done manually, but Green eventually created auto-builder software to automate much of the process. Those auto-builders (which run in a chroot environment on top of Debian) are still running to this day, pulling updated packages from Debian repositories and automatically re-compiling them for Raspbian.
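
The article doesn't name the exact autobuilder tools here (Green notes later in the comments that they hacked an older version of Debian's own tools), so the following is only a sketch of the general shape of one build cycle, using Debian's stock chroot builder, sbuild:

# Fetch a source package from the Debian archive, then rebuild it inside
# a clean chroot so nothing leaks in from the host system
apt-get source somepackage
sbuild -d wheezy somepackage_*.dsc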

Those auto-builders also detect when there's a problem that prevents the package from being automatically rebuilt. Green explained in an e-mail:

There are several things that could cause a package to come out still containing ARMv7 code.

We hacked together a script (using readelf) to check for ARMv7 code in packages so we could tell which packages... needed further attention to eliminate it. ARMv7 contamination can come from several sources.

Static libraries [are one example]: these are taken from the build environment and incorporated into binaries. So if the static library contains ARMv7 code, then it will go into the resulting binary. This was a big pain during the early days of the project, as we had to use packages from Debian armhf [Debian's ARM floating point port] to break dependency cycles, but those packages could contain static libraries with ARMv7 code in them, and sometimes it took several attempts at rebuilding and installing different packages to figure out which library the contamination was coming from and replace it with an uncontaminated version. However, this isn't really a problem anymore since, having got past the bootstrapping problem, we can nearly always build packages in a clean Raspbian environment.
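
Green's readelf check is straightforward to approximate. Here is a hedged sketch (not the project's actual script) that flags ARMv7-contaminated binaries in an unpacked package tree:

#!/bin/sh
# Usage: ./find-armv7.sh /path/to/unpacked/package
# Tag_CPU_arch in the .ARM.attributes section records which architecture
# an ELF object was compiled for (e.g. "v6" or "v7").
find "$1" -type f | while read -r f; do
    if readelf -A "$f" 2>/dev/null | grep -q 'Tag_CPU_arch: v7'; then
        echo "ARMv7 code: $f"
    fi
done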

Raspbian became available for download on April 20, 2012, in limited form, containing only about five percent of the Debian packages. "Enough to build a root filesystem that would boot to a command line," Thompson said.

Can someone more knowledgeable in operating system development explain why it's necessary and/or desirable to build on the architecture you're building for? Couldn't you set up the compiler to compile to a different architecture than it's running on? I understand that testing would be essential, but surely the tremendous speed increase would make up for the hassle, compared to compiling on ARM boards.

Great work by these guys. I love it when people with lots of resources contribute to the common good.

Does anyone know if VMware View Client will run on Raspbian? I know there is a Linux version available in the Ubuntu store, but if it can run on a Pi, that could make for some very inexpensive thin clients in a VMware virtual desktop environment.

Can someone more knowledgeable in operating system development explain why it's necessary and/or desirable to build on the architecture you're building for? Couldn't you set up the compiler to compile to a different architecture than it's running on? I understand that testing would be essential, but surely the tremendous speed increase would make up for the hassle, compared to compiling on ARM boards.

It's not just Debian that doesn't like being cross-compiled. Autoconf helps with getting cross-compiling working, but not all projects use it, and some use it in ways that break cross-compiling. Gentoo tries to support compiling the distribution with a cross-compiler, but even then it has issues.

The best workaround I've used has been distcc with cross-compilers installed on the fast distributed nodes. That will use the native machine to partition up the work and send the heavy stuff to the cheaper, fast x86 machines. You would still have a local storage bottleneck, but you wouldn't need an expensive cluster to speed it up. A few commodity x86 machines would be all you needed.
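
The distcc arrangement described above looks roughly like this (host names are made up; each remote host needs the ARM cross-compiler installed where distcc can find it, for example via the wrapper-script trick shown later in this thread):

# On the slow native ARM machine that drives the build:
export DISTCC_HOSTS="fastbox1 fastbox2"
make -j8 CC="distcc gcc"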

Can someone more knowledgeable in operating system development explain why it's necessary and/or desirable to build on the architecture you're building for? Couldn't you set up the compiler to compile to a different architecture than it's running on? I understand that testing would be essential, but surely the tremendous speed increase would make up for the hassle, compared to compiling on ARM boards.

Couldn't you set up the compiler to compile to a different architecture than it's running on?

I seriously think the real reason for the build setup is that the guy wanted to set up a tiny cluster, and this was a great use case to justify the project, even if a couple of x86 boxes would have been more than adequate in reality.

The issue with Debian not being cross compiled isn't so much technical as philosophical. It's my understanding that Debian has a guiding principle that any platform that runs Debian should be able to build Debian. Because Debian packages tend not to be constructed with cross compilation in mind, attempting to cross-compile them commonly turns up issues that require modifying the package.

To make the Raspbian port manageable by two people, Peter and I tried as much as possible to stay within the path already established by the Debian armhf team a year earlier. The issues between ARMv6 (Raspbian armhf) and ARMv7 (Debian armhf) were big enough to make the project difficult. Throwing cross compilation issues into the mix would likely have prevented Raspbian from ever being created. When you're dealing with tens of thousands of packages, anything that makes the work harder is avoided.

Despite the philosophy of Debian to build packages on the target hardware, as a practical matter Peter and I decided to use ARMv7 devices rather than Raspberry Pis themselves. Primarily, the Raspberry Pi wasn't generally available until toward the end of the porting effort. But also, the build process tends to stress hardware, and you really want as much RAM as possible to minimize swapping, plus efficient SATA drive connections rather than USB. Raspberry Pis are terrific little devices, but build servers they are not.

Can someone more knowledgeable in operating system development explain why it's necessary and/or desirable to build on the architecture you're building for? Couldn't you set up the compiler to compile to a different architecture than it's running on? I understand that testing would be essential, but surely the tremendous speed increase would make up for the hassle, compared to compiling on ARM boards.

I just dug through my notes and found a quote from Green that may explain this, and I added it in.

"The big issue at the moment is finding good, affordable auto-builder hardware," Green said. "Debian is not designed to be cross-built. So we have to build natively on ARM hardware."

Does that help?

If anything, the fact that Debian does not support cross-compilation should be a huge negative against using it both as the preferred platform and especially as a development environment or target.

Essentially, that places every developer in the silly spot of having at least two Pis in order to get much done. This gets even worse when you consider that storage speed is typically the biggest slowdown when compiling larger programs / lots of libraries (devs limited by the speed of an SD card - WTF!). Further, why should developers be restricted to the limitations of the Pi, a machine specifically targeted at the "just enough" computing power paradigm, for compilation, when most will have something with 2-8 cores @ 2.5+ GHz and 4-32GB of RAM sitting impotently next to it?

I really like these "behind the scenes" pieces from Ars Technica. They paint IT and technology in general in a more human fashion; it's like a breath of fresh air amid all the security-induced paranoia of the hyperconnected ADD world we live in.

mpthompson wrote:

Throwing cross compilation issues into the mix would likely have prevented Raspbian from ever being created. When you're dealing with tens of thousands of packages, anything that makes the work harder is avoided.

Despite the philosophy of Debian to build packages on the target hardware, as a practical matter Peter and I decided to use ARMv7 devices rather than Raspberry Pis themselves.

So, your logic boils down to this:

We can't cross-compile, for philosophical and technical reasons. So let's cross compile, but on a platform that is sufficiently close that we can pretend we aren't. Of course, this means we leak in static libs from the build system, which makes us have to fix things up by hand, among other unexpected problems.

I call idiot logic on that.

Exactly how much experience do you have in cross-compiling for ARM targets? From my personal experience in doing so, cross-compiling packages that had little to no dependencies was trivial, but once there were a lot of dependencies, cross-compiling flat out didn't work. When you add in the time trying to resolve these issues with a multitude of packages, it becomes more time efficient and less frustrating to just build the packages on the native CPU architecture.

As a Linux tinkerer, I found Raspbian to be excellent. The apt-get system works great, and while it takes a while to load packages, I've not had any trouble. I haven't had to monkey with repositories; things just load.

My 256MB RAM Model "B" Pi is running as an AirPrint server, serving up my creaky old Samsung printer using the SpliX project. Just getting a Samsung laser to work under Linux is tough, and these guys made it "apt-get" simple.
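
For anyone wanting to replicate that setup, the "apt-get simple" part boils down to something like this (package names are an assumption based on what Debian-derived distributions shipped SpliX as around this time):

sudo apt-get update
sudo apt-get install cups splix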

ARMv7 is 100% backward compatible with the ARMv6 instruction set, so it's not really cross compiling so much as compiling on a system that offers a superset of our target system's capabilities. It's similar to building Intel 32-bit binaries on an Intel 64-bit system, which can easily execute the native build tools and the binaries being produced -- that's not really cross compiling in the way that building ARM binaries on an Intel 64-bit system would be.
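
In shell terms, the analogy Thompson draws looks like this: an x86-64 host building and running 32-bit binaries natively, with no cross toolchain involved (requires the multilib support packages):

gcc -m32 -o hello32 hello.c
./hello32    # executes directly on the 64-bit host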

The earliest part of Raspbian development was proving out that we could build packages without leakage from the host environment. Chroot largely takes care of this for us, but we needed to develop tools that would detect instances when a package was built with ARMv7 binaries -- which, of course, wouldn't execute on the ARMv6-based Raspberry Pi. Such contamination did not come from host static libraries, but rather from a number of packages that had ARMv7-specific target CPU settings within them. When we found such packages, we needed to modify the package source to use ARMv6-specific target CPU settings. Finding such packages and making the needed changes formed a large percentage of the porting effort.

Finally, I believe Peter and I took the most efficient route we could to creating an optimized port of Debian to the Raspberry Pi. Of course, others might consider other routes to be more productive in theory, but it was our time that was being spent so we got to choose the path we followed. I'll let the results speak for themselves.

Doom5 wrote:

Exactly how much experience do you have in cross-compiling for ARM targets? From my personal experience in doing so, cross-compiling packages that had little to no dependencies was trivial, but once there were a lot of dependencies, cross-compiling flat out didn't work. When you add in the time trying to resolve these issues with a multitude of packages, it becomes more time efficient and less frustrating to just build the packages on the native CPU architecture.

Doom5, I can't speak for XolotlLoki, but I often use cross compilation tools for ARM, primarily when working with the Linux kernel, as my Intel-based desktop system can cross-compile and link a kernel in a few minutes where it might take 30 minutes or much longer on a decent ARM-based system. I'm not averse to cross compilation, but as you indicate, those who advocate porting an entire Debian distribution using cross compilation tools are speaking from limited experience. With enough tweaking it may be possible, but not at a practical level.
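
The kernel workflow Thompson describes looks roughly like this on an x86 host (the toolchain prefix and the defconfig name from the Raspberry Pi kernel tree are common choices, not taken from his comment):

# Cross-build a Raspberry Pi kernel from a fast x86 machine
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- bcmrpi_defconfig
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- -j8 zImage modules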

Essentially, that places every developer in the silly spot of having at least two Pis in order to get much done. This gets even worse when you consider that storage speed is typically the biggest slowdown when compiling larger programs / lots of libraries (devs limited by the speed of an SD card - WTF!). Further, why should developers be restricted to the limitations of the Pi, a machine specifically targeted at the "just enough" computing power paradigm, for compilation, when most will have something with 2-8 cores @ 2.5+ GHz and 4-32GB of RAM sitting impotently next to it?

I believe you are misunderstanding some of the issues. Debian (and Raspbian) is perfectly happy running cross-compiled applications on the Raspberry Pi. Many people in fact do this, either using cross compilation tools or running Raspbian under an emulator such as QEMU on a much faster desktop system. However, building a handful of custom applications for personal use under Debian is a MUCH different matter than building an entire Debian port.
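
Running Raspbian under QEMU, as mentioned above, is usually done with user-mode emulation plus a chroot. A rough sketch (paths are placeholders):

# binfmt_misc transparently routes ARM binaries through the static emulator
sudo apt-get install qemu-user-static
sudo cp /usr/bin/qemu-arm-static /srv/raspbian-rootfs/usr/bin/
sudo chroot /srv/raspbian-rootfs /bin/bash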

The issue for Peter and myself was with the Debian package building system that we needed to work with to build the tens of thousands of packages that went into Raspbian. By and large, the tools we needed are built without any consideration of cross compiler tools, as Debian's own build systems all operate on native hardware. Furthermore, the Debian source packages' own build scripts generally don't account for cross compilation either, so the work is both in the tools and in the source packages themselves. So from our perspective, the easiest path was to create a cluster of native ARM systems to build the software rather than tackle changing complex, poorly documented tools and hand-tweaking thousands of packages that would confound cross compilation tools.

If you are a person interested in building modestly complex applications for the Pi that would take a long time to build, don't let our choice of not using cross-compilation tools for Raspbian deter you. Our choice was based on a number of factors that would have little, if any, impact on someone else's use of cross compilation tools for Raspbian or any of the other Raspberry Pi OS distributions.

I'm a little confused. Couldn't you just modify the makefiles to compile for ARMv6?

Well, as the article tried to explain, apparently yes, they could do basically that a lot of the time. But some software gets more intimate with the hardware it's on, usually when it needs to run really fast, and those packages were the ones that took a lot of effort and attention. C is pretty portable, by design, but any project that uses inline ARM assembler would likely be troublesome, because ARMv7 is Debian's official target, and the Pi is running ARMv6 plus floating point, which is a weird combination.

Oh, and mpthompson, you might really want to check out if you can adapt distcc to do what you need. A single fast desktop machine would probably outperform your whole ARM cluster... I would think you could probably come close to doubling your compilation speed by adding a $500ish x86 box to your existing cluster.

I've never tried cross-compiling with it, but when I was building custom packages for the very weak and slow Soekris x86 hardware, some years ago, distcc saved me a ton of time, and I think the Pi is probably about the same speed as those creaky old net4501s were.

malor wrote:

Oh, and mpthompson, you might really want to check out if you can adapt distcc to do what you need. A single fast desktop machine would probably outperform your whole ARM cluster... I would think you could probably come close to doubling your compilation speed by adding a $500ish x86 box to your existing cluster.

Unless that fast x86 box is ARM based, you're going to run into the whole nightmare of cross-compiling Debian, which is why people use clusters of ARM build machines for this kind of thing.

FWIW, I recently got a Raspberry Pi running Raspbian, and it's been fantastic. I needed to hack up some equipment over I2C using libraries available for Debian. Took about 30 minutes from flashing the SD card image to getting the hardware talking to the Pi. Saved us a ton of time at work and got an important project off my desk.
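
For reference, the usual starting point for that kind of I2C work on Raspbian is something like this (the bus number varies by board revision: 0 on the earliest Pis, 1 on later ones):

sudo apt-get install i2c-tools
sudo i2cdetect -y 1    # scan I2C bus 1 for attached devices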

malor wrote:

Oh, and mpthompson, you might really want to check out if you can adapt distcc to do what you need. A single fast desktop machine would probably outperform your whole ARM cluster... I would think you could probably come close to doubling your compilation speed by adding a $500ish x86 box to your existing cluster.

I've never tried cross-compiling with it, but when I was building custom packages for the very weak and slow Soekris x86 hardware, some years ago, distcc saved me a ton of time, and I think the Pi is probably about the same speed as those creaky old net4501s were.

I've had the same positive experience with distcc for building packages in NetBSD on the slow hardware in my collection (68040-based Amiga, VAXstation 4000/90, etc.). The trick is to put links to distcc in your PATH in front of the native cc/gcc/c++/g++ compilers, perhaps via a shell script that calls the real GCC by explicit path (making sure that path matches the path to the cross-compiler on your distcc host), e.g.:

#!/bin/sh
# Hand every compiler invocation to the cross-compiler through distcc
distcc /path/to/cross/gcc "$@"

The advantage of this approach is that you don't need to worry about installing any header files or libraries on the distcc host, because the C preprocessor and all linking is handled by the caller. The disadvantage is that distcc running in this mode won't accelerate C preprocessing or linking. :-)

Unless that fast x86 box is ARM based, you're going to run into the whole nightmare of cross-compiling Debian, which is why people use clusters of ARM build machines for this kind of thing.

Incorrect. As I mentioned earlier, distcc works for cross-compiling just fine. I use it on Gentoo all the time. It calls arm-gcc on the remote machine just as if it was calling gcc locally. All the annoying cross-compiler stuff is abstracted because all the preprocessing and linking still occurs on the native target machine that is doing the build.

Nice article. A lot of times with community software projects you take for granted that you can just grab a file off a repository and not think about where it came from. In the background there is often an individual or small group who became obsessed enough to make it happen.

I was looking at the comments on the story, and people seem to be interested in why you guys compiled on ARM rather than something else. One commenter asks: "I'm kind of curious as to what aspects of Debian prevent it from being cross compiled. Or is it a limitation of gcc?"

Cross-compiling is quite a complicated business; some things need to be done with a cross-tool while others need to be done with a native tool on the host. Some code may need to be built more than once if it needs to be both used during the build and placed in the final package. Both the upstream build system and the distribution packaging need to be designed to carefully think about and maintain this distinction. Worse, in some cases, if the distinction is not correctly maintained or a cross tool is buggy, then rather than causing outright failure, the build may result in a subtly broken package.

Debian has always been compiled natively. While there is some cross-building support in the tools, working cross-building support is not and has never been a requirement for packages in Debian, and AFAICT until very recently no one has really tried to cross-build an entire Debian-based distro. The Debian/Ubuntu arm64 porters are currently doing it because 64-bit ARM hardware doesn't exist yet and all they have is excruciatingly slow emulators, but the results of their work to improve cross-building support are unlikely to end up in Debian until jessie, and even then I believe the intention is that cross-building will only be used for bootstrapping and experimental systems, with the final Debian arm64 port being built natively after real hardware is released.

malor wrote:

Oh, and mpthompson, you might really want to check out if you can adapt distcc to do what you need.

The distcc suggestion is an interesting one, but it's still another potential thing to go wrong, and AFAICT distcc doesn't help with linking. With C++ monsters like WebKit, it's the linking that is the real killer, not the compiling.

XolotlLoki wrote:

We can't cross-compile, for philosophical and technical reasons. So let's cross compile, but on a platform that is sufficiently close that we can pretend we aren't. Of course, this means we leak in static libs from the build system...

What we are doing is comparable to the way Debian builds 486-compatible binaries on modern x86 hardware. I don't consider that to be "cross-compiling".

The leaking in of static libs only happened during bootstrapping, when we were using a mixture of Debian armhf and Raspbian packages in the build environment to break dependency loops. Yes, it did require quite a bit of manual work, but I still believe the work required was far less than would have been needed to make all the packages needed for a build environment cross-buildable.

Now that we have Raspbian bootstrapped, we use build chroots that only contain Raspbian packages, so static libraries cannot "leak in from the build environment" (though they can of course still be built by package build systems that choose to override the compiler defaults we set).
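
(For reference, a clean chroot of the kind Green describes can be bootstrapped with debootstrap pointed at the Raspbian mirror. A hedged sketch, using the release name and mirror as they stood at the time:)

sudo debootstrap --arch=armhf wheezy /srv/chroot/raspbian-wheezy http://archive.raspbian.org/raspbian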

Finally, I would like to comment on the article's mention of setting up the autobuilders. We didn't create the system from scratch; Mike took an old version of the Debian tools (after getting lost trying to get the recent versions to work) and hacked it until it worked with modern Perl. While we discussed the autobuilder setup a lot, pretty much all the actual work of setting them up was done by Mike.

The advantage of this approach is that you don't need to worry about installing any header files or libraries on the distcc host, because the C preprocessor and all linking is handled by the caller. The disadvantage is that distcc running in this mode won't accelerate C preprocessing or linking. :-)

Unfortunately, compilation is only part of the issue. One of the other big issues is that linking very large packages such as Iceweasel and OpenOffice requires gobs of memory -- on the order of 2GB to 3GB or more. Our build systems have only 1GB of RAM, so builds would dramatically slow down when the linker started to hit swap space. Such package builds could take two to three days because of the slowdown in swap.