Underestimating the ingenuity of complete fools...

Category Archives: Pacman

As an alarming number of people have noticed, the pacman-4.2 release removed the --asroot option from makepkg. This means that you can no longer build packages as the root user. There are good reasons for this, and the option was only included due to issues we had building under fakeroot (only the package() function gets run under fakeroot these days, and there have been no issues with fakeroot in a while anyway).

Even if your PKGBUILD file is not malicious, there are good examples of when something goes wrong by accident. Remember the bumblebee bug that deleted /usr due to an extra space? Or just this week a Steam bug that deletes a user’s home directory? Do you still want to run code as root? OK then… I am going to show you how not to!

Firstly, we need a build directory. I suggest /home/build. Putting this directory directly under /root will not work unless you want to relax its 700 permissions to allow the nobody user read/write access1. I suppose you could as you are running as root… but I will use /home/build. Create the directory and set permissions with the following:
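A minimal sketch of those commands, wrapped in a function so nothing runs until you call it. The function name setup_build_dir and its arguments are my own invention for illustration; the modes and ACL entries are chosen to match the explanation in this post, and it assumes the acl tools (setfacl) are installed and that you run it as root.

```bash
# Hypothetical helper: prepare a shared build directory for a given group.
setup_build_dir() {
    local dir=$1 group=$2
    mkdir -p "$dir" || return 1
    chgrp "$group" "$dir" || return 1
    # Group write permission plus the setgid bit, so new files inherit the group.
    chmod g+ws "$dir" || return 1
    # Access ACL on the directory itself.
    setfacl -m u::rwx,g::rwx "$dir" || return 1
    # Default ACL: new files and directories get group read/write.
    setfacl -d --set u::rwx,g::rwx,o::- "$dir"
}

# As root:
#   setup_build_dir /home/build nobody
```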

Not that people running makepkg as root need to know what code is doing to run it… but I’ll explain what is happening here. Firstly, create a /home/build directory, make it owned by the nobody group and ensure that group has write permissions. Also set the setgid bit on the directory (the “s” in the group permissions) so all files created in that directory are also owned by the nobody group. Then we set ACLs to ensure all files and directories created in /home/build have group read/write permissions.

Now to building your package! Get your PKGBUILD into your new build directory and run makepkg as the nobody user. You can do this using su, but using sudo has the advantage of being able to alias this command. Installing sudo does not create a security risk as you are running as root! You also do not need to configure anything as root will have full sudo permissions by default2. Build your package using:

sudo -u nobody makepkg

Done… I’d add “alias makepkg='sudo -u nobody makepkg'” to your ~/.bashrc so you never have to type this again.

There is still a problem here. If you download and manually extract a package sourceball, or use an AUR helper such as cower to do so, the group write permissions get lost:

Doing “chmod -R g+w pacman-git/” will fix this. There is probably a way to avoid this – at least when manually extracting the tarball, but I have no interest in figuring it out. Otherwise, it is a two line function.
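For the manual extraction case, such a function could look something like the sketch below. The name extract_for_nobody is hypothetical, and it assumes the tarball unpacks into a single top-level directory named after the package, as AUR source tarballs do.

```bash
# Hypothetical helper: extract a source tarball and restore group write access.
extract_for_nobody() {
    local tarball=$1 topdir
    # Assumption: the first path component in the archive is the package directory.
    topdir=$(tar -tf "$tarball" | head -n 1 | cut -d/ -f1)
    [[ -n $topdir ]] || return 1
    tar -xf "$tarball" && chmod -R g+w "$topdir"
}

# Usage, from inside /home/build:
#   extract_for_nobody pacman-git.tar.gz
```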

And if this does not satisfy you, revert that patch that removed --asroot. It should still revert cleanly.

1 makepkg checks directory write permissions using the full path so fails if any parent directories are not writable. I guess this could be fixed if someone was interested.

2 Note that to have makepkg install missing dependencies and install your built package without being queried the password for the nobody user (which would be difficult to answer…), you will need to configure nobody to run sudo pacman without a password.

Both the pacman package manager and the makepkg tool for building packages verify files using PGP signatures. However, these two pieces of software do it using different keyrings. There seems to be a lot of confusion about this and misinformation is spreading at a rapid pace, so I’ll attempt to clarify it here!

Pacman Package File Signature Verification
By default, pacman is set up to verify every package using a PGP signature. It has its own keychain for this purpose, located at /etc/pacman.d/gnupg/. This keychain is initialized during the Arch Linux install – a root key is created and the Arch Linux master keys are locally signed by the root key. The master keys sign all Arch Developer and Trusted User keys, creating an effective web-of-trust from your pacman root key to each of the packager keys allowing verification of package files.

If you want to allow the installation of package files from a non-official repository, you need to either disable signature verification (don’t do that…), or trust the packager’s signing key. To do this you first need to verify their key ID, which should be well publicized. Then you import it into the pacman keyring using “pacman-key --recv-key <KEYID>” and signify that you trust the key by locally signing it with your pacman root key by running “pacman-key --lsign <KEYID>”.

Makepkg Source File Signature Verification
When building a package, the source files are often (and should be!) signed, with a signature file available for download alongside the source file. This typically has the same name as the source file with the extension .sig or .asc. makepkg will automatically verify the signature if it is listed in the source array. e.g.:
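A small sketch of what such a source array can look like. The package name, version and example.org URL are placeholders, not a real project.

```bash
pkgname=demo
pkgver=1.0

# Brace expansion produces two entries: the tarball and its detached .sig file.
source=("https://example.org/$pkgname-$pkgver.tar.gz"{,.sig})
```

Listing the signature file alongside the tarball is all that is needed; there is no separate switch to turn verification on.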

However, makepkg needs some information to verify the source signature. It will need the public PGP key of the person who signed the source file, and that key to be trusted. The difference here is that you do not trust whoever provided the source file to provide packages for your system (or at least you should not the vast majority of the time), so your user’s keyring is used. To get the key use “gpg --recv-key <KEYID>” and trust it (once suitably verified) using “gpg --lsign <KEYID>”.

If you provide a package to the AUR, it would be a lot of work for everyone to suitably verify a PGP key and locally sign it. To demonstrate that you have verified the key, you can add the following to the PKGBUILD:
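The PKGBUILD entry in question is the validpgpkeys array (named in the pacman-4.2 release notes further down this page). A sketch with an obviously fake fingerprint; substitute the full fingerprint of the key you actually verified.

```bash
# Placeholder value: use the full 40-character fingerprint of the signing key.
validpgpkeys=('0123456789ABCDEF0123456789ABCDEF01234567')
```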

Now makepkg will trust that key, even if it is not trusted in the package builder’s PGP keyring. The builder will still need to download the key, but that can be automated in their gpg.conf file.

Hopefully that clarifies the two separate types of PGP signature verification happening in pacman and makepkg and explains why they should be separate… Now can people stop recommending that the pacman keyring is imported into the user’s keyring and vice versa?

I released pacman-4.2 on the 19th of December – which is only marginally after the end of August as originally planned… We had 52 contributors provide patches to this release. Andrew takes the prize for most commits. Here are the top 10:

The real prize goes to the person who caused the first reported bug. That could have been Dave but he caught it just in time. And I mean just! I posted to IRC “any ideas for the tag message” and the response I got was “I think I broke updpkgsums“. The shame of being first is inversely proportional to your commit count. (The small typos discovered so far do not count…)

Packaging Changes
There have been a couple of useful features added to makepkg. The main ones are:

Architecture Specific Fields: The source and depends (and related fields) now can all specify architecture specific values. For example:
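A sketch of how these fields might be used. The package, URLs and dependency names are made up for illustration.

```bash
pkgname=demo
pkgver=1.0
arch=('i686' 'x86_64')

# Used on every architecture.
source=("https://example.org/$pkgname-$pkgver.tar.gz")
depends=('glibc')

# Only used when building for x86_64, in addition to the values above.
source_x86_64+=("https://example.org/$pkgname-$pkgver-x86_64.patch")
depends_x86_64=('lib32-glibc')
```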

The source for a given architecture is used in addition to the global source. The ‘+=’ when specifying extra sources for an architecture does nothing different than just using ‘=’, but I use it to serve as a reminder that these are additional values. Thanks to Dave!

Templating PKGBUILDs: Many PKGBUILDs share a similar build system, making them highly redundant. This is an attempt to reduce the redundancy by providing a template system. The easiest way to describe this is using an example, so I will use a potential perl module template. We create a file /usr/share/makepkg-template/perl-module-1.0.template. In this file are the build(), check() and package() functions and any common boilerplate. As this is our current version, it is also symlinked to perl-module.template. In our PKGBUILD, we would add:

# template input; name=perl-module;

and run makepkg-template. Now look in the PKGBUILD and you will see that line is replaced with:
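Roughly, the expanded section is wrapped in start and end markers that record which template version was used. The sketch below follows the marker format described in the makepkg-template man page as I understand it; the build() body and the _distdir variable are hypothetical stand-ins for whatever the template file actually contains.

```bash
# template start; name=perl-module; version=1.0;
# Everything between the two markers is copied from perl-module-1.0.template.
# The function below is a made-up stand-in for the template contents.
build() {
    cd "$srcdir/$_distdir"
    perl Makefile.PL INSTALLDIRS=vendor
    make
}
# template end;
```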

If we ever need to update the template, we create perl-module-2.0.template and update the symlink. Now run makepkg-template -n to update the PKGBUILD. Read “man makepkg-template” for more details. Thanks to Florian!

Incremental VCS Builds: Previously makepkg would remove its working copy of the VCS source directory before starting a new build. Now makepkg will just update the source copy (or attempt to in the case of SVN…) and build the package. This brings VCS builds in line with those using non-VCS sources. A new option -C/--clean was added to makepkg to remove the old $srcdir before building for cases where incremental builds fail. Thanks to Lukáš (and sorry it took me so long to deal with your patches)!

Source Package Information: To avoid things like the AUR attempting to parse bash to display information from a source tarball, we now provide a .SRCINFO file in an easily parseable format. Thanks to Dave!

Package Functions are Mandatory: The use of package() functions in PKGBUILDs was introduced a long time ago. Now it is mandatory that a PKGBUILD has one (the exception being metapackages that do not have a build() function either). Now that fakeroot usage is limited to the packaging step, the use of fakeroot is mandatory and building as root is disabled.

Misc. Changes: Other things of interest:

Static libraries are only removed with options=('!static') if they have a shared counterpart

Source signatures are required to be from a trusted source or listed in the validpgpkeys array. We also support kernel.org style source signing

Split packages can no longer override pkgver/pkgrel/epoch as that was a silly idea…

Pacman Changes

No, we don’t have hooks… They are strongly planned for the next release.

Directory Symlink Handling: Example time! Arch Linux has a /lib -> /usr/lib symlink. Previously, if pacman was installing a package and it found files in /lib, it would follow the symlink and install it in /usr/lib. However the filelist for that package still recorded the file in /lib. This caused heaps of difficulty in conflict resolving – primarily the need to resolve every path of all package files to look for conflicts. That was a stupid idea! So now if pacman sees a /lib directory in a package, it will detect a conflict with the symlink on the filesystem. If you were using this feature to install files elsewhere, you probably need to look into what a bind mount is! Note that this change requires us to correct the local package file list for any package installed using this mis-feature, so we bumped the database version. Upgrade using pacman-db-upgrade. Thanks to Andrew!

Added an --assume-installed Option: I believe this option was invented during a perl update. Almost all compiled perl modules have a dependency on a specific perl version. So with a major perl update, all the modules need to be updated at the same time, or you can use -d to ignore dependency versions, but for all packages and not just perl. This is not a problem with the Arch repositories where all packages are updated at the same time, but if you have lots of perl modules from the AUR, you will need to remove those, update, then rebuild them. Instead you can use --assume-installed perl=5.18 and all those packages depending on perl=5.18 will not complain. Thanks to Florian!

Repository Usage Configuration: A new configuration keyword was added for repositories – Usage. It can take values Sync, Search, Install, Upgrade, All. For example, I have the [staging] and [multilib-testing] repositories in my pacman.conf with the Sync usage. That way I can look at what is in these repositories without using them for package updates. Thanks to Dave!
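A sketch of what such a repository section might look like in pacman.conf. The Include line is an assumption about how the mirrors are configured; only the Usage line is the new part.

```
[staging]
Usage = Sync
Include = /etc/pacman.d/mirrorlist
```

With only Sync set, the database is refreshed by -Sy, but the repository should not be considered when searching, installing or upgrading.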

Misc. Changes: Other changes to pacman:

Improved dependency ordering – the dependency ordering did not go deep enough into the tree to ensure correct installation order.

A warning is printed if a directory on the filesystem has different permissions to the one being “installed” from the package.

And I have just realized that the only major change I contributed was the requiring of package() functions, which I am told means 1/3 of the AUR will not build! It feels good to be back to breaking things…

I was listening to Frostcast in the background today when I heard my name. That always makes me pay some attention. Then I heard wrong information. I don’t know why I care, but I do… so here goes the clarification.

The quote from Philip Müller at 14:35 into the podcast:

The latest news was Allan McRae – he is a developer of pacman himself – he sent me an email to send over all the translations of Manjaro distribution does. So I forked pacman, and pacman itself has 20 translations and our branch has 44 translations of the same software so Arch Linux is asking us to be upstream and give them our translations…

OK… This is interesting. Time for some background here. When pacman-4.1 was released, we removed the broken SyncFirst option. This is needed by Manjaro Linux to run their update helper script that “fixes” the update process to remove any manual interventions. So Manjaro reverted our patch and brought back SyncFirst to pacman. That required three additional strings to be translated for their version of pacman so they also forked our translation project on Transifex.

As the Arch and Manjaro versions of these projects had started to diverge, I wrote to Phil noting that people were doing more than just translating those three additional strings, and it would be good if the translators were pointed at the Arch project so we all benefited, given the Arch project is the one the pacman developers set up.

Let’s compare the status of the Arch and Manjaro translations as of 2013-09-24. There are 24 languages with complete translations in the Arch project, and being nice and ignoring the additional three strings in the Manjaro project, they have 23. (Of those 23, only 6 actually have the additional three Manjaro strings translated). What are the differences? Manjaro has a complete Hungarian translation while Arch has complete Korean and Romanian translations. The Arch Hungarian translation is at 99%, while the Manjaro Korean and Romanian are at 21% and 62% respectively. So it is clear these languages have diverged since the split, with most of the work done in Arch.

Of the remaining languages with incomplete translations, Manjaro has 19 languages, while Arch has 15. Clearly not a total difference of 20 to 44 languages as claimed. Looking at these in more detail, 9 languages have not deviated between the two projects. The Arabic, Chinese (Taiwan), Dutch, Galician, Polish and Serbian (Latin) translations have all got additional translations in the Arch project since the split with the Manjaro project. So apart from languages that have had translations started in Manjaro but not in Arch, the Arch project is behind in 3 strings for the Hungarian language.

Maybe where the Arch translation project for pacman could gain is from the new languages in the Manjaro translation: Czech (Czech Republic) [99%], Bulgarian (Bulgaria) [62%], Uzbek [14%] and Danish (Denmark) [3%]. Also note that 3/4 of those languages have a sub-name there. Taking “Danish (Denmark)” as an example, there is already a “Danish” translation (language code: da) and this is adding a Denmark specialization (language code: da_DK). I might be entirely wrong here, but are there other variants of Czech, Bulgarian and Danish apart from their primary usage, or are these exactly the same and the work is just being repeated?

In summary, the translation project set up by the pacman developers is, and will remain, the upstream translation. I just approached Manjaro to send their translations our way so we would both benefit. Arch from (potentially) more translations, and it would be easier for Manjaro to merge their string translations without ending up removing several hundred perfectly good translations.

I will clarify this just because I have had several people ask me already. No, we did not remove the SyncFirst option in pacman to deliberately cause issues for Manjaro Linux. In fact, it was first discussed in February 2012 and, as far as I can tell, Manjaro has only been around from late March 2012 (looking at the earliest commits in their git repository).

So let’s keep the conspiracy theories to a minimum! (or at least come up with a better one…)

I have just released pacman-4.1 and packages are now in the [testing] repo. This is the first time I have made a release for any software project, so I was glad to have released a 4.1RC a few weeks back to learn everything that needed to be done.

It has been over a year since the pacman-4.0 release and there have been a large number of contributions made:

I win this time! Apart from the usual three contributors, it was great to see other people regularly helping out, both in providing and reviewing patches. A particular thanks to Andrew Gregory who helped me figure out how to fix something on several occasions and has been actively commenting on patches sent to the mailing list. His patch count also puts him in the top ten contributors of all time. In total we have 45 people with patches accepted for this release. Also a big thank you to our translators – particularly because I was learning how the system worked and may have required additional strings to be translated on a couple of occasions…

Moving on to what has changed. There have been quite a number of features added to pacman and makepkg and a couple of new helper scripts in this release.

The major feature for the release is tight integration between the package manager and systemd. After much discussion about how best to perform updates on a rolling release system, we realized that it was essential to have updates performed with minimal other processes running. Also, the security aspects of updates mean that it is essential that these get provided as soon as possible. We felt the best way to achieve this was to perform updates on shutdown. This is achieved through a new daemon, pacmand, that monitors and downloads updates in the background. When updates are found, it schedules a reboot of the system (hence the need to integrate systemd). At the moment the timing of the reboots is not configurable, but a timer will pop up to allow you to delay it for a preset amount of time. Configuration will likely be added in pacman-4.2, when pacmanctl will be ready for general use. Until that release is made, Arch Linux will minimize the impact by performing all updates in its [testing] repository and only push updates on a yet to be decided day and time of the week. A news post will be made when that is decided.

Of course, all this makes systemd a hard dependency of pacman. We felt this was acceptable given Arch Linux has officially switched to using systemd. As this release is not tested (and unlikely to work) on systems without systemd, Arch users or other distributions using pacman will be required to make the switch to systemd if they want to continue using pacman as their package manager. The integration with systemd will become tighter in pacman-4.2 where we plan to use the upcoming kdbus message passing interface – through libsystemd-bus – to allow other programs to interact with pacman, making the development of alternative front-ends easier.

In terms of output, there have been improvements in a couple of areas. First, colour support was added. This had been floating around for a long time, but no-one had ever spent the time to create a patchset and submit it. I think the colours for a simple update look good, although those when searching are a bit… rainbow. This can only be configured on or off at the moment. Extra informational output has been added for optdepends, providing details about whether an optdepend is installed or not and giving a warning when removing a package that is an optdepend for another. This also provides the groundwork for more complete optdepend handling in future releases.

When building packages using makepkg from this release, information about all the files in the package is stored, including permissions, modification times, sizes and checksums (md5 and sha256), etc. These can be checked using “pacman -Qkk”, excluding checksums (which requires additional support to be added to libarchive in order to read them in). Other useful features include never overwriting .pacsave files, but instead giving them a number suffix as needed. We have also polished the package signature checking, improving key importing and allowing configuration on how to validate packages installed with “pacman -U”, both using local files and from remote sources.

There are a few improvements to package building too. I have covered support for VCS packaging in makepkg previously, with bzr, git, hg and svn packages just requiring an appropriate line in the source array. Also a pkgver() function can be added to automatically update the pkgver variable in the PKGBUILD. With these VCS source lines, or any other source that is volatile, the value “SKIP” can be used in the checksum array.

An optional prepare() function can now be used in a PKGBUILD for preparation of the sources, such as patching and sed alterations. This function is run after the extraction of the sources and not run when --noextract is used, allowing operations that should only ever be run once on the sources to be skipped. Finally, a new debug option is available that will result in all the debug symbols that are stripped from binary files being stored in a separate package, which can be installed to allow easier debugging (another feature that has had patches floating around for a while).
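A minimal sketch of a prepare() function, assuming a made-up package whose Makefile hardcodes its install prefix. The package name and the Makefile contents are invented; a real PKGBUILD would more often apply a patch file here.

```bash
pkgname=demo
pkgver=1.0

prepare() {
    # srcdir is set by makepkg before this function is called.
    cd "$srcdir/$pkgname-$pkgver" || return 1
    # A one-time alteration of the extracted sources: not safe to repeat
    # blindly, which is why it belongs here and not in build().
    sed -i 's|^PREFIX = /usr/local$|PREFIX = /usr|' Makefile
}
```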

Finally, two new helper scripts have been added to the contrib section: checkupdates and updpkgsums. The checkupdates script allows you to safely check for package updates without altering the system pacman remote databases. The updpkgsums script will perform an in place update of the checksums in a PKGBUILD, although more complex PKGBUILDs (such as those with different sources for each architecture) will not likely work…

So a long post, but this is a big release! There are enough of us running the git version that it should be completely bug free, but just in case I am wrong, report any issues to the bug tracker.

Edit: Yes – some of this was April Fools… (moderated comments are now restored too).

For those that are mildly adventurous, you can try the pre-release of the upcoming pacman-4.1. There are a handful of us who constantly run pacman from git so it should be fairly safe. All bugs found are to be reported to the bug tracker. (Only one issue found so far – in the rarely used pkgdelta script).

One fairly common criticism of the pacman package manager is that it is very slow due to not using some sort of binary database as its backend. I found suggestions to use sqlite dating back to 2005 (although I am sure they go back further) and mailing list activity peaked around late 2007. Speed is one of pacman’s main features – and it beats the competition by a wide margin according to Linux Format – but I guess people want it even faster.

The problem is that we use a filesystem based “database” where each package has its information stored in multiple files. This means that we can get fragmentation of our “database” and the reading of all these files from the filesystem can be quite slow. Usually most of this is cached by the kernel after the first read so speed improves markedly after the first usage.

This was improved a lot in the pacman 3.5 release (March 2011). The sync databases started to be read directly from the downloaded tarball and the local database had the “desc” and “depends” files for each package merged into one file. This increased the speed of reading from the sync databases massively and was a reasonable improvement to the local database too.

So the local package “database” could be improved by reducing it to one or a few files. But every time I think about changing it, I am reminded why I like the plain text file format. I was updating a reasonably out of date computer when I had an issue with the python-pygame package being renamed to python2-pygame. All packages needing it in the Arch Linux repos were rebuilt with the new dependency name, so it did not need a provides entry. But my solarwolf package from the AUR still depended on the old name:

As we have a file based database, adjusting the dependency is easy without rebuilding the package. Just open the relevant file and edit away (or use sed…)

$ vim /var/lib/pacman/local/solarwolf-1.5-5/desc
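The sed route could be sketched as below. The function name rename_local_dep is hypothetical, and it assumes the dependency appears on a line of its own in the desc file with no version constraint attached.

```bash
# Hypothetical helper: rename one dependency entry in a local database desc file.
rename_local_dep() {
    local desc=$1 old=$2 new=$3
    # Match the whole line so similarly named packages are left alone.
    # Names containing regex metacharacters (e.g. '+') would need escaping.
    sed -i "s/^${old}\$/${new}/" "$desc"
}

# As root:
#   rename_local_dep /var/lib/pacman/local/solarwolf-1.5-5/desc python-pygame python2-pygame
```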

Now I can see my local database has an issue using the handy testdb tool – solarwolf depends on python2-pygame, but that is not installed.

$ testdb
missing python2-pygame dependency for solarwolf

But now I update as usual, installing python2-pygame which removes python-pygame, and my local pacman database is fully consistent.

I am sure all of this would still be possible if the database was in some other format, but it would have required more tools than a simple text editor. Of course, most people should never need to edit their local database, but I have introduced changes to it several times during pacman development and I consider being able to easily fix or revert these in the category of a “good thing”. And yes, I develop and test directly on my production system…

Of course, it is better to use a real database in performance critical situations. But pacman really does not fall into that category.

The current support for building packages from version control systems (VCS) in makepkg is not great for a number of reasons:

It relies on obscure (but documented…) variables being specified in the PKGBUILD, which actually achieve nothing in terms of downloading and updating the source as needed.

The whole VCS checkout/update mechanism needs to be repeated across every PKGBUILD that uses it, so there is a lot of unnecessary code duplication.

Building a package from a specific revision/branch/tag/… required using an altered version of this code, resulting in many non-standard work-arounds being made.

The automatic updating of the pkgver happens in what may not be an obvious way. For example, the pkgver for git PKGBUILDs is set to the build date, not the date of the last commit. Even if it was the date of the last commit, that can be far from unique. (Why not use git describe? Because that relies on the tag being something suitable for an actual version number and many repos do not follow this.)

Even when a revision number is used for the updated pkgver, this results in different behaviour for different VCS. For example, with hg repos, you have to download/update the repo to determine the latest revision.

The updating of the pkgver is done before the makedepends are installed, so can fail if it relies on VCS tools.

The --holdver flag stopped the pkgver being updated, but the VCS repo was still updated to the latest version as usual.

You cannot create a source package with the VCS sources included using --allsource.

…

In fact, the issues with the current VCS implementation accounted for almost 10% of the bugs in the pacman bug tracker and there are a number more in the Arch bug tracker about how to improve the supplied prototypes for the VCS PKGBUILDs. It was clearly time for a rewrite.

An idea that had seen some discussion over the years was to just put the VCS sources in the source array. Makes sense… right? The problem was choosing an appropriate syntax for the URLs that was consistent with what was already used and also flexible enough to handle the various possibilities of a VCS source. The format decided on is:

source=('[dir::][vcs+]url[#fragment]')

Simple! Well, it will be once I explain the parts… The url component should be obvious. The problem with it is that there is often no way to tell that it is a VCS source. For example, for git repos without the git protocol enabled on the server, this will start with (e.g.) http://. To work around this, an optional vcs prefix can be added to the URL. So for git over http, you would use git+http://. This is based on the already used syntax when downloading a subversion repo over ssh.

At the end of the URL is an optional #fragment. Providing information in a URL after a # character is some sort of standard that I am too lazy to provide a link for… Anyway, it allows us to specify information about what we want to check out when building. For example, I build my pacman-git package using the working branch of my git repo. To check that out, I use:

source=('git+file:///home/arch/code/pacman#branch=working')

Note the use of the git+ prefix there that allows me to check out from a local copy of my repo. The list of recognized fragments is built into makepkg and is documented in the PKGBUILD man page.

Finally, there is the optional dir:: prefix. This allows the specifying of a directory name for makepkg to download the source into. If not specified, makepkg tries to pick a good name from the URL, but there is such variation in VCS URLs that it will often be useful to change it. This is an old, but little known, syntax available in PKGBUILDs, which can be used to rename any source file once it is downloaded.
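To make the three parts concrete, here is a small sketch that pulls a source entry apart with bash parameter expansion. This is purely illustrative and is not how makepkg itself parses the entry; the function name split_source and the example.org URL are made up.

```bash
# Illustrative only: split '[dir::][vcs+]url[#fragment]' into its pieces.
split_source() {
    local entry=$1 dir="" vcs="" fragment="" proto
    if [[ $entry == *::* ]]; then
        dir=${entry%%::*}          # text before the first "::"
        entry=${entry#*::}
    fi
    if [[ $entry == *'#'* ]]; then
        fragment=${entry##*\#}     # text after the last "#"
        entry=${entry%\#*}
    fi
    proto=${entry%%://*}           # e.g. "git+http" or plain "http"
    if [[ $proto == *+* ]]; then
        vcs=${proto%%+*}
        entry=${entry#*+}
    fi
    printf 'dir=%s vcs=%s url=%s fragment=%s\n' "$dir" "$vcs" "$entry" "$fragment"
}

split_source 'pacman::git+http://example.org/pacman.git#branch=working'
# prints: dir=pacman vcs=git url=http://example.org/pacman.git fragment=branch=working
```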

So now that VCS sources can be used, even multiple different repos to build the one package, how does makepkg choose how to update the pkgver variable? Short answer is that it doesn’t. You can provide a pkgver() function that outputs a string to be used for the updated package version. This is run after all the sources are downloaded and (make-)dependencies are installed. For my pacman-git package, I use something like:
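A sketch of such a function, assuming the repo is checked out into a directory called pacman and has tags of the form v4.0.3. The helper name _describe_to_pkgver is my own; it is split out only so the text transformation can be tried without a git checkout.

```bash
# Turn `git describe` output into a valid pkgver: drop the leading "v" and
# replace hyphens, which are not allowed in a pkgver, with dots.
_describe_to_pkgver() {
    sed 's/^v//; s/-/./g'
}

pkgver() {
    cd "$srcdir/pacman" || return 1   # checkout directory name is an assumption
    git describe | _describe_to_pkgver
}
```

For example, a describe string of v4.0.3-120-gabcdef0 comes out as 4.0.3.120.gabcdef0.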

Currently supported protocols in the master git branch of pacman are git (branch, commit, tag), hg (branch, revision, tag), svn (revision). That covers ~92% of the VCS PKGBUILDs in the AUR. Adding support for the remaining VCS that are used (bzr, cvs, darcs) – or any other VCS – is quite simple but requires knowing how to efficiently use the VCS tools. I will create a patch to support any additional VCS if someone provides me:

How to checkout a repo to a given folder.

What url “fragments” need to be supported for that VCS.

How to create a working copy of the checked out repo (i.e. “copy” the primary checkout folder) and how to get it to the specified branch/tag/commit/whatever. That can be in all one step.

Note that the old VCS PKGBUILDs will not stop working as such, although they are likely to be broken… At least the pkgver will no longer update. I’m sure there are other subtle incompatibilities too and you would still suffer from all the issues listed above, so it is definitely worth getting proper support for your needed VCS into makepkg.

If you want to take the new implementation for a spin, check out a copy of the pacman git repo and build it. For those that are somewhat brave, you could even use the pacman-git package in the AUR, but make sure you know the risks that running a developmental version of a package manager entails…

This is a story about a recent issue discovered in pacman, the Arch Linux package manager, and the difficulties we had hunting it down… The story is long, but so was the process of finding the bug.

It all started on a warm summer’s night (in my timezone and location… – it was probably cold and daytime for the other main pacman developers) with the reporting of FS#27805: “[pacman] seg faults when removing firefox”. Of course, my initial reaction was “bull shit” as we all know there are no bugs in the pacman code. But this was only a couple of weeks since pacman-4.0 was moved into the Arch Linux [core] repo so there was an ever so slight possibility it was real.

Luckily for us, the user reporting the bug was very helpful and installed a version of pacman with debugging symbols and gave us a full backtrace. It was very clear where the segfault was occurring:

That function is called in the package removal process when we check that a file that is going to be removed with a package is not also owned by another package (which would require someone using -Sf when they should not). If the package in the local database is the same as the one being removed, we do not need to run this check, and hence the test. As you can see above, for some reason _alpm_pkg_cmp is being passed a null pointer as the package from the local database and KABOOM!

So the question was, how do we get a null value for the package from our local database? Given pacman runs through the list of local packages on each package removal, this null entry must have been generated on the removal of the previous package. Here is a bit of background on how package information is stored in pacman. Package information is stored in a hash table that also provides access to the data as a linked list. This provides us with fast look-up by a package’s name but also allows us to loop through the (generally sorted) package list. Now the hash table code is fairly new (first introduced in pacman-3.5) and the removal of items from a hash with collision resolution done by linear probing is not straightforward, so there could be a bug. Dan pointed his finger my way as I wrote the original hash table code and I pointed my finger his way as he made optimizations to the removal part. But it turns out that neither of us was thinking too hard. It is the list that is being corrupted, and items are removed from that using code that has been around for years. Despite that, the whole hash table and linked list removal code got an in-depth review and no issues were found.
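To show why removal under linear probing needs care, here is a minimal sketch in C of one textbook approach: empty the slot, then re-insert every entry that follows it in the same cluster so that later look-ups do not stop early at the new gap. This is not pacman's hash table code; `struct table`, `CAPACITY`, `slot_for`, and the `table_*` functions are all names I made up for illustration, and the fixed-size table of string pointers is an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 8  /* hypothetical fixed size, chosen only for the example */

struct table {
    const char *slots[CAPACITY];  /* NULL marks an empty slot */
    size_t used;                  /* number of occupied slots */
};

/* Simple string hash reduced to a slot index (illustrative only). */
static size_t slot_for(const char *name)
{
    size_t h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % CAPACITY;
}

static void table_insert(struct table *t, const char *name)
{
    /* Keep at least one slot empty so that every probe loop terminates. */
    assert(t->used < CAPACITY - 1);
    size_t i = slot_for(name);
    while (t->slots[i] != NULL)
        i = (i + 1) % CAPACITY;   /* linear probing: try the next slot */
    t->slots[i] = name;
    t->used++;
}

static const char *table_find(const struct table *t, const char *name)
{
    /* Probing stops at the first empty slot, which is why gaps matter. */
    for (size_t i = slot_for(name); t->slots[i] != NULL; i = (i + 1) % CAPACITY)
        if (strcmp(t->slots[i], name) == 0)
            return t->slots[i];
    return NULL;
}

static void table_remove(struct table *t, const char *name)
{
    size_t i = slot_for(name);
    while (t->slots[i] != NULL && strcmp(t->slots[i], name) != 0)
        i = (i + 1) % CAPACITY;
    if (t->slots[i] == NULL)
        return;                   /* not present */

    t->slots[i] = NULL;
    t->used--;

    /* Re-insert the rest of the cluster so no entry is stranded behind the gap. */
    for (i = (i + 1) % CAPACITY; t->slots[i] != NULL; i = (i + 1) % CAPACITY) {
        const char *moved = t->slots[i];
        t->slots[i] = NULL;
        t->used--;
        table_insert(t, moved);
    }
}
```

Simply clearing the slot without the re-insertion loop would make `table_find` miss any entry that had probed past the removed one, which is the kind of subtle breakage we went looking for in the real code.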

We were stumped. Looking at the debug output from pacman, we could see that a file that actually did not exist on the system was being “removed” right before the crash, but that is not uncommon and appeared to be handled correctly so was unlikely to be the cause. So back to the reporter to see if we could get more information to replicate. He was very helpful and provided us with a copy of his local package database. We created a chroot with exactly the same packages and had no luck replicating. The user even provided us with a complete copy of his chroot where the error was occurring, but again there was no luck replicating. It must be something specific to that user’s system. Right? Well, even re-extracting the tarball of the chroot the user provided us onto his own system made the bug go away. All in all, a great candidate for being “not a bug”…

Until on another warm summer’s evening, while being my usual extremely helpful self on IRC, someone mentioned they were getting a segfault while removing packages. A bug report was filed and, again, the user was extremely helpful and the backtrace provided was exactly the same. A core dump showed us there was definitely something wrong with the linked list. Well… bugger! This bug appears real. Again the red herring of the removal of a non-existent file was shown in the debug log, but it would be very, very strange for that to break the linked list of package information so was ruled out.

It was time to find a reproducer! So I created a chroot and set this script running:

Within five minutes I could replicate the segfault. (It turns out I was very lucky as I ran the same script again for over four hours and did not strike the issue.) Now it was time to get debugging!

The first thing I did was print some debugging info in the linked list node removal code, but for some reason the node removal just before the segfault did not print anything. I was only printing information when removing a node from the middle of the list (because that is where the package causing this issue was located), but just to be sure I also added debug statements for the case of removing the head and tail nodes. And then pacman told me it was removing a node from the end of the list… “Why do you think that package is at the end of the list, pacman?”, I asked. “Because the head node’s prev entry tells me it IS the end of the list”, replied pacman. “Oh, crap”, I said. “So it does!” Something was clearly wrong here.
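That conversation is easier to follow with the list layout in front of you. Here is a minimal sketch in C, assuming a doubly linked list where the head node’s prev pointer refers to the tail and the tail’s next pointer is NULL, as pacman's answer above implies. The names `struct node`, `enum position`, and `classify` are invented for illustration and are not the libalpm ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: head->prev is the tail, tail->next is NULL. */
struct node {
    int value;
    struct node *prev;
    struct node *next;
};

enum position { POS_HEAD, POS_TAIL, POS_MIDDLE };

/* Decide which removal case applies, trusting head->prev to be the tail. */
static enum position classify(const struct node *head, const struct node *item)
{
    assert(head != NULL && item != NULL);
    if (item == head)
        return POS_HEAD;
    if (item == head->prev)
        return POS_TAIL;    /* only correct if head->prev really is the tail */
    return POS_MIDDLE;
}
```

With three nodes a, b, c linked in that order and `a.prev` pointing at c, `classify(&a, &b)` reports the middle case. If `a.prev` wrongly points at b, the same call reports the tail case, and any removal code relying on it will patch up the wrong pointers.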

It was time to investigate all removal operations on that list. So I printed the entire linked list before and after each package removal and found the error actually occurred before the removal operation even started. The initial list of the local database passed to the removal operation was already broken with the pointer to the tail entry not pointing to the tail. That was good to know as we had thoroughly reviewed the removal code and not found any issues.

This led me to believe that the error must occur when reading in the local database. Next step: print out the linked list at the end of reading in the local database. But that was completely fine. So somewhere between reading in the local database and using it, things got broken. And, what do we do with the local database between reading it in and removing items from it? The only place where we modify the local database between those points is when it gets sorted by the package names. Sure enough, the pointer to the tail of the linked list is good going into the sort and bad coming out.

This limited the error to two functions: alpm_list_msort or alpm_list_mmerge. These implement a merge sort. Essentially alpm_list_msort recursively calls itself, dividing the list up into smaller pieces until it cannot be divided any further, and the pieces are then merged in sorted order by alpm_list_mmerge. I had just started staring at the code when I saw something that seemed too obvious for such a hard to track down bug. My exact words on IRC were “I think I can fix this…”. And sure enough I could.

It turns out that when alpm_list_msort split a list into two, it did not set the pointer to the tail nodes in the two new lists correctly (or at all…). So a two-line addition and we have the bug fixed. It turns out this bug had been present since the start of 2007. So I am still slightly amazed that we did not see it before now and that, when it did appear, we got a second report of it so quickly.
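For the curious, here is a rough sketch in C of a merge sort on such a list, with the two tail-pointer assignments in the split step marked. It is not the libalpm source: `struct node`, `link_nodes`, `list_merge`, and `list_msort` are hypothetical names, the list holds plain ints, and it assumes the same layout as before (head->prev is the tail, tail->next is NULL).

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *prev;
    struct node *next;
};

/* Wire an array of nodes into a list obeying the head->prev == tail layout. */
static struct node *link_nodes(struct node *nodes, size_t n)
{
    if (n == 0)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        nodes[i].prev = (i > 0) ? &nodes[i - 1] : &nodes[n - 1];
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return nodes;
}

/* Merge two sorted lists; relies on each head's prev being its true tail. */
static struct node *list_merge(struct node *left, struct node *right)
{
    if (left == NULL) return right;
    if (right == NULL) return left;

    struct node *left_tail = left->prev;
    struct node *right_tail = right->prev;
    struct node *head = NULL, *last = NULL;

    while (left != NULL && right != NULL) {
        /* Take from the right only when strictly smaller (keeps it stable). */
        struct node **src = (right->value < left->value) ? &right : &left;
        struct node *n = *src;
        *src = n->next;
        n->prev = last;
        if (last != NULL) last->next = n; else head = n;
        last = n;
    }

    /* Append whichever list still has nodes; its tail becomes the new tail. */
    struct node *rest = (left != NULL) ? left : right;
    rest->prev = last;
    last->next = rest;
    head->prev = (left != NULL) ? left_tail : right_tail;
    return head;
}

/* Sort a list of n nodes by splitting it in half and merging the halves. */
static struct node *list_msort(struct node *list, size_t n)
{
    if (n < 2)
        return list;
    assert(list != NULL);

    size_t half = n / 2;
    struct node *left_tail = list;
    for (size_t i = 1; i < half; i++)
        left_tail = left_tail->next;

    struct node *right = left_tail->next;
    struct node *tail = list->prev;   /* tail of the whole list */
    left_tail->next = NULL;

    /* The step that is easy to forget: give both halves a valid tail pointer. */
    right->prev = tail;
    list->prev = left_tail;

    struct node *sorted_left = list_msort(list, half);
    struct node *sorted_right = list_msort(right, n - half);
    return list_merge(sorted_left, sorted_right);
}
```

Without the two marked assignments, the forward links still come out sorted, so walking the list with next looks perfectly healthy. Only code that trusts the head's prev pointer to find the tail notices anything wrong, which fits how well this bug hid.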

And why could we not reproduce the issue even with a copy of a chroot where it was occurring? It is entirely dependent on the order the directory entries are returned from the disk. This determined which package was pointed to as the “tail” of the sorted package list. The package incorrectly referred to as the tail had to be removed during a removal operation, and also not be the last package removed, to expose the bug. Given most systems will have many hundreds of packages on them and removal operations tend to involve one or a few packages, this is a fairly rare occurrence. But even if it occurred in only a fraction of a percent of removal operations, I think we should have run into this bug before now. I guess more people probably did experience the issue, but then could not immediately replicate it and did not experience the issue again, so did not report it.

And that is the end of the story of one of the most frustrating bugs I have ever managed to track down. A big thank you to the two users who installed versions of pacman with debug symbols and provided us backtraces, core dumps and entire chroots! Without their help, we would probably still be not entirely convinced that the bug was real and it would still be hiding away in the pacman source code.