Author: Philip Semanchuk

I recently started working with CentOS 5.11 as an operating system on which to build Python wheels for Linux. I wrote about why I used CentOS 5.11. The oversimplified reason is that it’s old. After using it for a while, I’ve started to wonder, how old is too old?

Why CentOS 5.11, Again?

To understand why the authors of PEP 513 recommend CentOS 5.11, we must briefly consider a subject that Python programmers can usually ignore—binary dependencies.

When one builds a binary on Linux (or any system, for that matter), it comes with dependencies on runtime libraries. Even a simple “hello world” program will depend on the C runtime library (glibc if you build with GCC). The benefit of building binaries on an older Linux is that the runtime dependencies—particularly glibc—are likely to be present on newer systems. The reverse is not true. If you link your binary to a brand new glibc, it won’t be able to run on older systems because they don’t have the glibc needed to load your binary.
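As a rough illustration of that one-way compatibility, the sketch below compares the glibc version a binary was built against with the glibc on the running system. The helper names (parse_version, can_load) are made up for this example, the comparison rule is deliberately simplified, and the (2, 5) build version is an assumption taken from the glibc version that PEP 513 associates with manylinux1.

```python
import platform

# Assumption: glibc 2.5, the version PEP 513 associates with manylinux1.
BUILT_AGAINST = (2, 5)


def parse_version(text):
    """Turn a string like '2.17' into (2, 17); return None if that fails."""
    parts = text.split(".")
    try:
        return int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return None


def can_load(built_against, runtime):
    """Simplified rule: the runtime glibc must be the same version or newer."""
    return runtime >= built_against


# platform.libc_ver() reports the C library the running interpreter is
# linked against; on non-glibc systems it may return empty strings.
lib_name, lib_version = platform.libc_ver()
runtime = parse_version(lib_version)
if lib_name == "glibc" and runtime is not None:
    print("glibc", lib_version, "can load the binary:",
          can_load(BUILT_AGAINST, runtime))
else:
    print("Could not determine a glibc version on this system.")
```

A binary built against glibc 2.5 passes this check on a system with glibc 2.17, while one built against 2.17 fails it on a system that only has 2.5.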

CentOS 5.11 was released on the last day of September 2014 (according to DistroWatch.org), so it’s only 1½ years old. But it’s a derivative of Red Hat Enterprise Linux, which is a notoriously conservative distro. To give you some idea of how conservative it is, CentOS 5.11 provides Python 2.4.3, which was released in 2006, eight years before CentOS 5.11 and almost ten years before PEP 513 was released. CentOS 5.11 is a snapshot of what was state-of-the-art some years ago.

There’s nothing wrong with a Linux distro that chooses to be this conservative. If you want modern software, or bleeding edge, there are distros for that. Red Hat Enterprise Linux (and thus CentOS) is not one of them.

CentOS 5.11’s “old school” attitude makes it a very safe bet that the versions of its base libraries (like glibc) will appear on other Linux distros, and that’s why it’s a good choice for building Linux wheels.

Great! Are There Any Downsides?

Yup.

The point of using an older Linux is to ensure that runtime libraries are not too new to appear on other Linuxes. But what if the opposite happens?

What happens if a library on CentOS 5.11 is so old that some of the Linux world no longer supports it? That’s what happened when I tried (for one of my clients) to wrap a Fortran library with Python and distribute it as a wheel. I built the Fortran code with the default GFortran/GCC version which was 4.1.2.

The resulting binary has a dependency on libgfortran.so.1. This library has become old enough that it’s not always easy to install. For instance, it’s not in the repositories of the very popular Ubuntu 14.04 LTS.

That’s particularly surprising when you consider that Ubuntu 14.04 LTS was released about six months before CentOS 5.11. Despite this, the former had already dropped support for the default libgfortran of the latter.

This is a good example of how CentOS 5.11 helps to avoid dependency problems, but doesn’t entirely solve them. In short, caveat munitor (builder beware).

How I Resolved the libgfortran.so.1 Dependency

I was able to build a wheel for my client that solved the specific libgfortran.so.1 dependency problem described above. I set the binary’s rpath to include the binary’s directory ($ORIGIN) and shipped libgfortran.so.1 as part of the Python wheel in the same directory as the custom shared library. The relevant Makefile portion looks like this—

PEP 513 gives guidelines on how to build broadly compatible Linux platform wheels for Python. The PEP names CentOS 5.11 as the reference OS on which a Python wheel must run if it is to earn the right to the manylinux1 name.

The surest way to get one’s binary package to run on CentOS 5.11 is to build it there. This post explains how I set up a CentOS 5.11 VirtualBox guest to build manylinux1 Python wheels.

PEP 513 offers a prebuilt Docker container of CentOS 5.11. If you’re on Linux and/or you’re familiar with Docker, that’s probably a better route than building a VM.

Note that I’m not at all a Linux expert. If I’ve done something foolish or incorrect, I’d like to hear about it in the comments. Please be nice. =)

Why CentOS 5.11?

CentOS has a few things going for it that make it a good choice —

It’s free

As a derivative of Red Hat Enterprise Linux, it’s a conservative distro, so the libraries on it are likely older than the libraries on contemporaneous distros.

At the time PEP 513 was written, CentOS 5.11 was already over a year old. That increases the odds that other distros will have the libraries that it has.

CentOS 5.11 Setup

Download the CentOS 5.11 ISO and install under VirtualBox. During the CentOS installation, I opted to disable SELinux. Since I only use this installation for builds and not as a server or daily desktop, I don’t feel the need for high security.

Once CentOS is installed, let the updater download and install its patches.

Next, you’ll want to install VirtualBox guest additions to make the guest OS easier to use. In order to do that, you first have to add yourself to the sudoers file.

Add Yourself to sudoers

Open a terminal and enter the following commands —

su -

vim /etc/sudoers

At the end of the file, add this line:

your_username ALL=(ALL) ALL

Save the file with :wq!

Type exit to exit the su - shell.

Now you should be able to run commands with sudo.

Build the VirtualBox Guest Additions

Install GCC:

sudo yum install gcc gcc-c++

Insert the Guest Additions CD.

Start a terminal and cd /media/VBOXADDITIONS_xxxx. Note that the exact name of the VBOXADDITIONS directory changes with each version of VirtualBox.

sudo ./VBoxLinuxAdditions.run

Eject the Guest Additions CD and reboot.

Add Packages

Build Python

CentOS 5.11 comes with Python 2.4. You will undoubtedly want a newer Python, so download and untar the source code for the Python you want to use and then build it. I built Python 2.7.11 with these steps —

sudo ./configure --enable-unicode=ucs4
sudo make altinstall

It’s important to use UCS4 (as opposed to the default UCS2) during the configure step to increase your odds of being compatible with the Pythons built for other Linux distros.
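One way to check which kind of build you ended up with is to look at sys.maxunicode from inside the new interpreter. This is a minimal sketch, and the function name is made up for this example. On Python 3.3 and later the UCS2/UCS4 distinction no longer exists, so there the check always reports a wide build.

```python
import sys


def is_wide_unicode_build():
    """Return True if this interpreter handles code points above 0xFFFF
    natively, which is the case for a UCS4 build of Python 2."""
    return sys.maxunicode > 0xFFFF


if is_wide_unicode_build():
    print("UCS4 (wide) build")
else:
    print("UCS2 (narrow) build")
```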

make altinstall tells Python to install itself in such a way that it doesn’t interfere with the default (system) Python.

Once my Python was built, I added a symlink to make it the default Python in my shell —

sudo ln -s /usr/local/bin/python2.7 /usr/local/bin/python

At this point, if you start a new terminal and type python, you should get Python 2.7.11.

Those of you who grew up in Philadelphia in the 1980s might recognize the name Robert Hazard, leader of Robert Hazard and the Heroes, author of Escalator of Life, Change Reaction, and Out of the Blue, and, of course, Girls Just Wanna Have Fun. Robert’s popularity never grew much out of the Philly/NJ area, but Cyndi Lauper’s version of Girls Just Wanna Have Fun sold a zillion copies worldwide and touched a lot of people who had never heard of Robert Hazard, and never will.

I have something in common with him — someone has given my work far more exposure than I ever expected it to get. (Another thing we have in common is growing up in the same small Philadelphia suburb. But without Cyndi Lauper’s involvement, that’s just trivia.)

I was surprised to learn from http://pypi-ranking.info that posix_ipc, one of my open source packages, is currently in the top 0.5% (½ of 1%) of the most downloaded on PyPI. Now, posix_ipc might be good at what it does, but it fills a tiny niche that’s nowhere near big enough to justify all of those 1.7-million-and-counting downloads. Why is it a top 1% download? Because it has become part of something much bigger: the massively popular OpenStack.

OpenStack didn’t have to rewrite portions of posix_ipc (like Cyndi did with Girls Just Wanna Have Fun, with Robert’s permission). They haven’t yet made a video of it that includes a nod to the Marx Brothers (like Cyndi did, with or without Robert’s permission). And as far as I know, OpenStack has yet to be nominated for a Grammy. But they have shown me the value of putting something out into the world, because you never know where it will end up.

So thanks, OpenStack! And thanks to Robert Hazard for music I enjoyed growing up (and still do). R.I.P., Robert.

A few years ago I wrote a Django app for a client. One part of the app called os.getcwd(), and another part (that I thought of as completely separate) used a temporary directory to build PDFs.

Occasionally the call to os.getcwd() would raise an error. I was confused. How can there be no current directory? It took me a while to figure it out, but in hindsight it’s kind of obvious (as these things often are).

My PDF-building code created a temporary directory, set that directory to be the current working directory, and then removed the directory once the PDF was built. After that, there was no current working directory. It’s easy to demonstrate —
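A minimal sketch of one way to reproduce this, assuming a Linux-like system that allows a process’s current directory to be removed. The function name is made up for this example.

```python
import os
import tempfile


def getcwd_after_removing_cwd():
    """Make a temp directory the current directory, remove it, then
    call os.getcwd(). Returns the exception raised, or None."""
    original = os.getcwd()
    temp_dir = tempfile.mkdtemp()
    os.chdir(temp_dir)
    os.rmdir(temp_dir)  # the current working directory no longer exists
    try:
        os.getcwd()
    except OSError as exc:  # FileNotFoundError on Python 3
        return exc
    finally:
        # Restore a valid working directory for the rest of the program.
        os.chdir(original)
    return None


print(repr(getcwd_after_removing_cwd()))
```

Changing back to a directory that still exists before removing the temporary one avoids the problem.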

As part of testing my Python wrapper for SysV IPC, I wrote tests for the time-related attributes of the IPC objects that change when something happens to the object. For instance, when someone sends a message to a queue, the queue’s last_send_time attribute (msg_stime in the C code) is updated to the current time.

I have a hard time imagining many programmers care about these attributes. Knowing the last time someone changed the uid of a message queue, for instance, just doesn’t have many use cases. But they’re part of the SysV IPC API and so I want my package to expose them.

I wrote tests to ensure that each one changed when it was supposed to. The tests failed consistently although the code worked when I tested it “by hand” in an interactive shell. Here’s the relevant portion of a failing test:

    def test_property_last_change_time(self):
        """exercise MessageQueue.last_change_time"""
        original_last_change_time = self.mq.last_change_time
        # This might seem like a no-op, but setting the UID to
        # any value triggers a call to msgctl(...IPC_SET...)
        # which should set last_change_time.
        self.mq.uid = self.mq.uid
        # Ensure the time actually changed.
        self.assertNotEqual(self.mq.last_change_time,
                            original_last_change_time)

The problem is obvious, right? No, it wasn’t obvious to me, either.

The problem is that in C, a message queue’s last change time (msg_ctime) is of type time_t, which is typedef-ed as an integral type (int or long) on most (all?) systems and counts whole seconds. Because the test above executed in less than 1 second, the assertion always failed. Setting self.mq.uid correctly caused an update to the last change time (msg_ctime); it was just being updated to the same value that had been saved in the first line of the test.

My solution was to add a little sleeping, like so:

    def test_property_last_change_time(self):
        """exercise MessageQueue.last_change_time"""
        original_last_change_time = self.mq.last_change_time
        # Sleep for more than one second so that the whole-second
        # timestamp has to advance (needs "import time" in the module).
        time.sleep(1.1)
        # This might seem like a no-op, but setting the UID to
        # any value triggers a call to msgctl(...IPC_SET...)
        # which should set last_change_time.
        self.mq.uid = self.mq.uid
        # Ensure the time actually changed.
        self.assertNotEqual(self.mq.last_change_time,
                            original_last_change_time)

That ensured that the value stored in original_last_change_time at the start of the test would differ from self.mq.last_change_time by at least 1 at the end of the test.
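The arithmetic behind both the failure and the fix can be sketched in a few lines. This is only an illustration: the timestamps are invented values and to_time_t is a made-up helper that mimics storing a time as whole seconds.

```python
def to_time_t(timestamp):
    """Truncate a float timestamp to whole seconds, the way an
    integral time_t stores it."""
    return int(timestamp)


# Two events 0.75 seconds apart within the same second (made-up values).
saved = 1458000000.20
updated = 1458000000.95
print(to_time_t(saved) == to_time_t(updated))  # same whole second

# After sleeping 1.1 seconds, the whole-second value must have advanced.
after_sleep = saved + 1.1
print(to_time_t(after_sleep) - to_time_t(saved))
```

Any sleep longer than one second works, because adding more than 1 to a timestamp always moves it past at least one whole-second boundary.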

I installed WordPress via Softaculous, and it offered a checkbox that said “Email an installation log to…”. That sounded like a good idea until I found out that the email contained my admin password in cleartext. Ouch!

I changed my admin password from p@ssw0rd to p@ssw0rd1 just to be on the safe side.