What Barry Warsaw talks about

Posts tagged with 'debian'

For UDS-R for Raring (i.e. Ubuntu 13.04) in Copenhagen, I sponsored three blueprints. These blueprints represent most of the work I will be doing for the next 6 months, as we're well on our way to the next LTS, Ubuntu 14.04.

I'll provide some updates to the other blueprints later, but for now, I want to talk about OAuth and Python 3. OAuth is a protocol which allows you to programmatically interact with certain website APIs, in an authenticated manner, without having to provide your website password. Essentially, it allows you to generate an authorization token to use instead, and to manage and share these tokens with applications, so that you can revoke them if you want, or decide how and which applications to trust to act on your behalf.

A good example of a site that uses OAuth is Launchpad, but many other sites also support OAuth, such as Twitter and Facebook.

There are actually two versions of OAuth out there. OAuth version 1 is definitely the more prevalent, since it has been around for years, is relatively simple (at least on the client side), and enshrined in RFC 5849. There are tons of libraries available that support OAuth v1, in a multitude of languages, with Python being no exception.

One of the very earliest Python libraries to support OAuth v1, on both the client and server side, was python-oauth (I'll use the Debian package names in this post), and on the Ubuntu desktop, you'll find lots of scripts and libraries that use python-oauth. There are major problems with this library though, and I highly recommend not using it. The biggest problems are that the code is abandoned by its upstream maintainer (it hasn't been updated on PyPI since 2009), and it is not Python 3 compatible. Because the OAuth v2 draft came after this library was abandoned, it provides no support for the successor specification.

For this reason, one of the blueprints I sponsored was specifically to survey the alternatives available for Python programmers, and make a decision about which one we would officially endorse for Ubuntu. By "official endorsement" I mean promote the library to other Python programmers (hence this post!) and to port all of our desktop scripts from python-oauth to the agreed upon library.

After some discussion, the attendees of the UDS session (both in-person and remote) unanimously chose python-oauthlib as our preferred library.

python-oauthlib has a lot going for it. It's Python 3 compatible, has an active upstream maintainer, implements RFC 5849 for v1, and closely follows the draft for v2. It's a well-tested, solid library, and it is available in Ubuntu for both Python 2 and Python 3. Probably the only negative is that the library does not provide any support for the server side. This is not a major problem for our immediate plans, since there aren't any server applications on the Ubuntu desktop requiring OAuth. Eventually, yes, we'll need server side support, but we can punt on that recommendation for now.

Another cool thing about python-oauthlib is that it has been adopted by the python-requests library, meaning, if you want to use a modern replacement for the urllib2/httplib2 circus which supports OAuth out of the box, you can just use python-requests, provide the appropriate parameters, and you get request signing for free.

So, as you'll see from the blueprint, there are several bugs linked to packages which need porting to python-oauthlib for Ubuntu 13.04, and I am actively working on them, though contributions, as always, are welcome! I thought I'd include a little bit of code to show you how you might port from python-oauth to python-oauthlib. We'll stick with OAuth v1 in this discussion.

The first thing to recognize is that python-oauth uses different, older terminology that predates the RFC. Thus, you'll see references to a token key and token secret, as well as a consumer key and consumer secret. In the RFC, and in python-oauthlib, these terms are respectively resource owner key, resource owner secret, client key, and client secret. After you get over that hump, the rest pretty much falls into place. As an example, here is a code snippet from the piston-mini-client library which used the old python-oauth library:
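In sketch form, the old-style code looked something like this (a reconstruction from the python-oauth API, which is Python 2 only; the OAuthSigner wrapper and its method names are illustrative, not piston-mini-client's actual code):

```python
# Sketch only: python-oauth is abandoned and Python 2 only.
import oauth.oauth as oauth

class OAuthSigner:
    # Hypothetical wrapper; piston-mini-client's real class differs.
    def __init__(self, token_key, token_secret,
                 consumer_key, consumer_secret):
        # Old terminology: consumers and tokens.
        self.token = oauth.OAuthToken(token_key, token_secret)
        self.consumer = oauth.OAuthConsumer(consumer_key, consumer_secret)

    def sign_request(self, url, method, body, headers):
        # Create a consumer, a token, and a request object, then ask
        # the request object to sign the request.
        oauth_request = oauth.OAuthRequest.from_consumer_and_token(
            self.consumer, token=self.token,
            http_method=method, http_url=url)
        oauth_request.sign_request(
            oauth.OAuthSignatureMethod_PLAINTEXT(),
            self.consumer, self.token)
        headers.update(oauth_request.to_header())
```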

The constructor is pretty simple, and it uses the old OAuth terminology. The key thing to notice is the way the old API required you to create a consumer, a token, and then a request object, then ask the request object to sign the request. On top of all the other disadvantages, this isn't a very convenient API. Let's look at the snippet after conversion to python-oauthlib.
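In sketch form again (oauth1.Client and its sign() method are python-oauthlib's actual v1 API; the OAuthSigner wrapper class around them is illustrative):

```python
from oauthlib import oauth1

class OAuthSigner:
    # Hypothetical wrapper, for comparison with the python-oauth version.
    def __init__(self, token_key, token_secret,
                 consumer_key, consumer_secret):
        # New terminology: consumer -> client, token -> resource owner.
        self.client = oauth1.Client(
            consumer_key,
            client_secret=consumer_secret,
            resource_owner_key=token_key,
            resource_owner_secret=token_secret)

    def sign_request(self, url, method, body, headers):
        # Coerce an empty body to None; see the discussion below.
        uri, signed_headers, body = self.client.sign(
            url, http_method=method, body=body or None, headers=headers)
        headers.update(signed_headers)
```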

See how much nicer this is? You need only create a client object, essentially using all the same bits of information. Then you ask the client to sign the request, and update the request headers with the signature. Much easier.

Two important things to note. First, if you are doing an HTTP GET, there is no request body, and thus no request content to contribute to the signature. In python-oauth, you could specify an empty body by using either None or the empty string. piston-mini-client uses the latter, and this is embodied in its public API. python-oauthlib, however, treats the empty string as a body being present, so it would require the Content-Type header to be set even for an HTTP GET, which has no content (i.e. no body). This is why the replacement code checks for an empty string being passed in (actually, any false-ish value), and coerces that to None.

The second issue is that python-oauthlib requires the keys and secrets to be Unicode objects; they cannot be bytes objects. In code ported straight from Python 2 however, these values are usually 8-bit strings, and so become bytes objects in Python 3. python-oauthlib will raise a ValueError during signing if any of these are bytes objects. Thus the use of the _unicodeify() function to decode these values to unicodes.

The above works in both Python 2 and Python 3. Of course, we don't know for sure that the bytes values are UTF-8, but it's the only sane encoding to expect, and if a client of piston-mini-client were to be so insane as to use an incompatible encoding (US-ASCII is fine because it's compatible with UTF-8), it would be up to the client to just pass in unicodes in the first place. At the time of this writing, this is under active discussion with upstream, but for now, it's not too difficult to work around.

Anyway, I hope this helps, and I encourage you to help increase the popularity of python-oauthlib on the Cheeseshop, so that we can one day finally kill off the long defunct python-oauth library.

So, now all the world knows that my suggested code name for Ubuntu 12.10, Qwazy Quahog, was not chosen by Mark. Oh well, maybe I'll have more luck with Racy Roadrunner.

In any case, Ubuntu 12.04 LTS is to be released any day now so it's time for my semi-annual report on Python plans for Ubuntu. I seem to write about this every cycle, so 12.10 is no exception. We've made some fantastic progress, but now it's time to get serious.

For Ubuntu 12.10, we've made it a release goal to have Python 3 only on the desktop CD images. The usual caveats apply: Python 2.7 isn't going away; it will still probably always be available in the main archive. This release goal also doesn't affect other installation CD images, such as server, or other Ubuntu flavors. This relatively modest goal, then, affects only the packages on the standard desktop CD images, i.e. the alternative installation CD and the live CD.

Update 20120425: To be crystal clear, if you depend on Python 2.7, the only thing that changes for you is that after a fresh install from the desktop CD on a new machine, you'll have to explicitly apt-get install python2.7. After that, everything else will be the same.

This is ostensibly an effort to port a significant chunk of Ubuntu to Python 3, but it really is a much wider, Python-community driven effort. Ubuntu has its priorities, but I personally want to see a world where Python 3 rules the day, and we can finally start scoffing at Python 2 :).

Still, that leaves us with about 145 binary packages (and many fewer source packages) to port. There are a few categories of packages to consider:

Already ported and available. This is the good news, and covers packages such as dbus-python. Unfortunately, there aren't too many others, but we need to check with Debian and make sure we're in sync with any packages there that already support Python 3 (python3-dateutil comes to mind).

Upstream supports Python 3, but it is not yet available in Debian or Ubuntu. These packages should be fairly easy to port, since we have pretty good packaging guidelines for supporting both Python 2 and Python 3.

Packages with better replacements for Python 3. A good example is the python-simplejson package. Here, we might not care as much because Python 3 already comes with a json module in its standard library, so code which depends on python-simplejson and is required for the desktop CD should be ported to use the stdlib json module. python-gobject is another case where porting is a better option, since pygi (gobject-introspection) already supports Python 3.

Canonical is the upstream. Many packages in the archive, such as python-launchpadlib and python-lazr.restfulclient are developed upstream by Canonical. This doesn't mean you can't or shouldn't help out with the porting of those modules, it's just that we know who to lean on as a last resort. By all means, feel free to contribute to these too!

Orphaned by upstream. These are the most problematic, since there's essentially no upstream maintainer to contribute patches to. An example is python-oauth. In these cases, we need to look for alternatives that are maintained upstream, and open to porting to Python 3. In the case of python-oauth, we need to investigate oauth2, and see if there are features we're using from the abandoned package that may not be available in the supported one.

Unknowns. Well, this one's the big risky part because we don't know what we don't know.

We need your help! First of all, there's no way I can personally port everything on our list, including both libraries and applications. We may have to make some hard choices to drop some functionality from Ubuntu if we can't get it ported, and we don't want to have to do that. So here are some ways you can contribute:

Fill in the spreadsheet with more information. If you're aware of an upstream or Debian port to Python 3, let us know. It may make it easier for someone else to enable the Python 3 version in Debian, or to shepherd the upstream patch to landing on their trunk.

Help upstream make a Python 3 port available. There are lots of resources available to help you port some code, from quick references to in-depth guides. There's also a mailing list (and Gmane newsgroup mirror) you can join to get help, report status, and have other related discussions. Some people have asked Python 3 porting questions on StackOverflow, using the tags #python, #python-3.x, and #porting.

Get packages ported in Debian. Once upstream supports Python 3, you can extend the existing Debian package to expose this support into Debian. From there, you or we can make sure that gets sync'd into Ubuntu.

Spread the word! Even if you don't have time to do any ports yourself, you can help publicize this effort through social media, mailing lists, and your local Python community. This really is a Python-wide effort!

Python 3.3 is scheduled to be released later this year. Please help make 2012 the year that Python 3 reached critical mass!

-----------------------------

On a more personal note, I am also committed to making Mailman 3 a Python 3 application, but right now I'm blocked on a number of dependencies. Here is the list of dependencies from the setup.py file, along with their statuses. I would love it if you could help get these ported too!

sbuild is an excellent tool for locally building Ubuntu and Debian packages. It fits into roughly the same problem space as the more popular pbuilder, but for many reasons, I prefer sbuild. It's based on schroot to create chroot environments for any distribution and version you might want. For example, I have chroots for Ubuntu Oneiric, Natty, Maverick, and Lucid, and for Debian Sid, Wheezy, and Squeeze, in both i386 and amd64 flavors. It uses an overlay filesystem so you can easily set up the primary snapshot with whatever packages or prerequisites you want, and the individual builds will create a new session with an overlaid temporary filesystem on top of that, so the build results will not affect your primary snapshot. sbuild can also be configured to save the session depending on the success or failure of your build, which is fantastic for debugging build failures. I've been told that Launchpad's build farm uses a customized version of sbuild, and in my experience, if you can get a package to build locally with sbuild, it will build fine in the main archive or a PPA.

Right out of the box, sbuild will work great for individual package builds, with very little configuration or setup. The Ubuntu Security Team's wiki page has some excellent instructions for getting started (you can stop reading when you get to UMT :).

One thing that sbuild doesn't do very well though, is help you build a stack of packages. By that I mean, when you have a new package that itself has new dependencies, you need to build those dependencies first, and then build your new package based on those dependencies. Here's an example.

I'm working on bug 832864 and I wanted to see if I could build the newer Debian Sid version of the PySide package. However, this requires newer apiextractor, generatorrunner, and shiboken packages (and technically speaking, debhelper too, but I'm working around that), so you have to arrange for the chroot to have those newer packages when it builds PySide, rather than the ones in the Oneiric archive. This is something that PPAs do very nicely, because when you build a package in your PPA, it will use the other packages in that PPA as dependencies before it uses the standard archive. The problem with PPAs though is that when the Launchpad build farm is overloaded, you might have to wait several hours for your build. Those long turnarounds don't help productivity much. ;)

What I wanted was something like the PPA dependencies, but with the speed and responsiveness of a local build. After reading the sbuild manpage, and "suffering" through a scan of its source code (sbuild is written in Perl :), I found that this wasn't really supported by sbuild. However, sbuild does have hooks that can run at various times during the build, which seemed promising. My colleague Kees Cook was a contributor to sbuild, so a quick IRC chat indicated that most people create a local repository, populating it with the dependencies as you build them. Of course, I want to automate that as much as possible. The requisite googling found a few hints here and there, but nothing to pull it all together. With some willful hackery, I managed to get it working.

Rather than post some code that will almost immediately go out of date, let me point you to the bzr repository where you can find the code. There are two scripts: prep.sh and scan.sh, along with a snippet for your ~/.sbuildrc file to make it even easier. sbuild will call scan.sh first, but here's the important part: it calls that outside the chroot, as you (not root). You'll probably want to change $where though; this is where you drop the .deb and .dsc files for the dependencies. Note too, that you'll need to add an entry to your /etc/schroot/default/fstab file so that your outside-the-chroot repo directory gets mapped to /repo inside the chroot. For example:

# Expose local apt repository to the chroot

/home/barry/ubuntu/repo /repo none rw,bind 0 0

An apt repository needs a Packages and Packages.gz file for binary packages, and a Sources and Sources.gz file for the source packages. Secure APT also requires a Release and Release.gpg file signed with a known key. The scan.sh file sets all this up, using the apt-ftparchive command. The first apt-ftparchive call creates the Sources and Sources.gz file. It scans all your .dsc files and generates the proper entries, then creates a compressed copy, which is what apt actually "downloads". The tricky thing here is that without changing directories before calling apt-ftparchive, your outside-the-chroot paths will leak into this file, in the form of Directory: headers in Sources.gz. Because that path won't generally be available inside the chroot, we have to get rid of those headers. I'm sure there's an apt-ftparchive option to do this, but I couldn't find it. I accidentally discovered that cd'ing to the directory with the .dsc files was enough to trick the command into omitting the Directory: headers.

The second call to apt-ftparchive creates the Packages and Packages.gz files. As with the source files, we get some outside-the-chroot paths leaking in, this time as path prefixes to the Filename: header value. Again, we have to get rid of these prefixes, but cd'ing to the directory with the .deb files doesn't do the trick. No doubt there's some apt-ftparchive magical option for this too, but sed'ing out the paths works well enough.
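Pulling these two steps together, the heart of scan.sh looks roughly like this (a sketch: the $where path and the sed expression illustrate the approach, not the exact script):

```shell
#!/bin/sh
# Sketch of scan.sh's index generation; run outside the chroot.
where=$HOME/ubuntu/repo

build_indexes() {
    # cd first, so that outside-the-chroot paths don't leak into the
    # Directory: headers of the source index.
    cd "$where" || return 1
    apt-ftparchive sources . > Sources
    gzip -c Sources > Sources.gz

    # For the binary index, strip the leading path from each
    # Filename: header; those paths don't exist inside the chroot.
    apt-ftparchive packages . \
        | sed -e 's|^Filename: .*/|Filename: |' > Packages
    gzip -c Packages > Packages.gz
}

if [ -d "$where" ]; then
    build_indexes
fi
```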

The third apt-ftparchive call creates the Release file. I shamelessly stole this from the security team's update_repo script. The tricky part here is getting Release signed with a gpg key that will be available to apt inside the chroot. sbuild comes with its own signing key, so all you have to do is specify its public and private keys when signing the file. However, because the public file from

/var/lib/sbuild/apt-keys/sbuild-key.pub

won't be available inside the chroot, the script copies it to what will be /repo inside the chroot. You'll see later how this comes into play.

Okay, so now we have the repository set up well enough for sbuild to carry on. Later, before the build commences, sbuild will call prep.sh, but this script gets called inside the chroot, as the root user. Of course, at this point /repo is mounted in the chroot too. All prep.sh needs to do is add a sources.list.d entry so apt can find your local repository, and it needs to add the public key of the sbuild signing key pair to apt's keyring. After it does this, it needs to do one more apt-get update. It's useful to know that at the point when sbuild calls prep.sh, it's already done one apt-get update, so this does add a duplicate step, but at least we're fortunate enough that prep.sh gets called before sbuild installs all the build dependencies. Once prep.sh is run, the chroot will have your overriding dependent packages, and will proceed with a normal build.
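In sketch form, a prep.sh along these lines does the job (the sources.list.d file name is illustrative; the key file name matches what scan.sh copies into /repo):

```
#!/bin/sh
# Sketch of prep.sh; sbuild runs this inside the chroot, as root.

# Point apt at the bind-mounted local repository.
echo "deb file:///repo ./" > /etc/apt/sources.list.d/local-repo.list
echo "deb-src file:///repo ./" >> /etc/apt/sources.list.d/local-repo.list

# Trust the sbuild signing key that scan.sh copied into /repo.
apt-key add /repo/sbuild-key.pub

# One more update, so apt can see the local packages.
apt-get update
```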

Simple, huh?

Besides getting rid of the hackery mentioned above, there are a few things that could be done better:

So, yesterday (June 21, 2011), six talented and motivated Python hackers from the Washington DC area met at Panera Bread in downtown Silver Spring, Maryland to sprint on PEP 382. This is a Python Enhancement Proposal to introduce a better way for handling namespace packages, and our intent is to get this feature landed in Python 3.3. Here then is a summary, from my own spotty notes and memory, of how the sprint went.

First, just a brief outline of what the PEP does. For more details please read the PEP itself, or join the newly resurrected import-sig for more discussions. The PEP has two main purposes. First, it fixes the problem of which package owns a namespace's __init__.py file, e.g. zope/__init__.py for all the Zope packages. In essence, it eliminates the need for these by introducing a new variant of .pth files to define a namespace package. Thus, the zope.interfaces package would own zope/zope-interfaces.pth and the zope.components package would own zope/zope-components.pth. The presence of either .pth file is enough to define the namespace package. There's no ambiguity or collision with these files the way there is for zope/__init__.py. This aspect will be very beneficial for Debian and Ubuntu.

Second, the PEP defines the one official way of defining namespace packages, rather than the multitude of ad-hoc ways currently in use. With the pre-PEP 382 way, it was easy to get the details subtly wrong, and unless all subpackages cooperated correctly, the packages would be broken. Now, all you do is put a * in the .pth file and you're done.
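Concretely, the marker files for the two Zope packages above would each contain just the wildcard line (shown here with the document's earlier file names; the exact layout is per the PEP):

```
% cat zope/zope-interfaces.pth
*
% cat zope/zope-components.pth
*
```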

Sounds easy, right? Well, Python's import machinery is pretty complex, and there are actually two parallel implementations of it in Python 3.3, so gaining traction on this PEP has been a hard slog. Not only that, but the PEP has implications for all the packaging tools out there, and changes the API requirements for PEP 302 loaders. It doesn't help that import.c (the primary implementation of the import machinery) has loads of crud that predates PEP 302.

On the plus side, Martin von Loewis (the PEP author) is one of the smartest Python developers around, and he's done a very good first cut of an implementation in his feature branch, so there's a great place to start.

Eric Smith (who is the 382 BDFOP, or benevolent dictator for one PEP), Jason Coombs, and I met once before to sprint on PEP 382, and we came away with more questions than answers. Eric, Jason, and I live near each other so it's really great to meet up with people for some face-to-face hacking. This time, we made a wider announcement, on social media and the BACON-PIG mailing list, and were joined by three other local Python developers. The PSF graciously agreed to sponsor us, and while we couldn't get our first, second, third, or fourth choices of venues, we did manage to score some prime real-estate and free wifi at Panera.

So, what did we accomplish? Both a lot, and a little. Despite working from about 4pm until closing, we didn't commit much more than a few bug fixes (e.g. an uninitialized variable that was crashing the tests on Fedora), a build fix for Windows, and a few other minor things. However, we did come away with a much better understanding of the existing code, and a plan of action to continue the work online. All the gory details are in the wiki page that I created.

One very important thing we did was to review the existing test suite for coverage of the PEP specifications. We identified a number of holes in the existing test suite, and we'll work on adding tests for these. We also recognized that importlib (the pure-Python re-implementation of the import machinery) wasn't covered at all in the existing PEP 382 tests, so Michael worked on that. Not surprisingly, once that was enabled, the tests failed, since importlib has not yet been modified to support PEP 382.

We also came up with a number of questions where we think the PEP needs clarification. We'll start discussion about these on the relevant mailing lists.

Finally, Eric brought up a very interesting proposal. We all observed how difficult it is to make progress on this PEP, and Eric commented on how there's a lot of historical cruft in import.c, much of which predates PEP 302. That PEP defines an API for extending the import machinery with new loaders and finders. Eric proposed that we could simplify import.c by removing all the bits that could be re-implemented as PEP 302 loaders, specifically the import-from-filesystem stuff. The other neat thing is that the loaders could probably be implemented in pure-Python without much of a performance hit, since we surmise that the stat calls dominate. If that's true, then we'd be able to refactor importlib to share a lot of code with the built-in C import machinery. This could have the potential to greatly simplify import.c so that it contains just the PEP 302 machinery, with some bootstrapping code. It may even be possible to move most of the PEP 382 implementation into the loaders. At the sprint we did a quick experiment with zipping up the standard library and it looked promising, so Eric's going to take a crack at this.

This is briefly what we accomplished at the sprint. I hope we'll continue the enthusiasm online, and if you want to join us, please do subscribe to the import-sig!

My friend Tim is working on a very cool Bazaar-backed wiki project and he asked me to package it up for Ubuntu. I'm getting pretty good at packaging Python projects, but I always like the practice because each time it gets a little smoother. This one I managed to package in about 10 minutes so I thought I'd outline the very easy process.

First of all, you want to have a good setup.py, and if you like to cargo cult, you can start with this one. I highly recommend using Distribute instead of setuptools, and in fact the former is what Ubuntu gives you by default. I really like adding the distribute_setup.py which gives you nice features like being able to do python setup.py test and many other things. See lines 18 and 19 in the above referenced setup.py file.

The next thing you'll want is Andrew Straw's fine stdeb package, which you can get on Ubuntu with sudo apt-get install python-stdeb. This package is going to bootstrap your debian/ directory from your setup.py file. It's not perfectly suited to the task (yet, Andrew assures me :), but we can make it work!

These days, I host all of my packages in Bazaar on Launchpad, which is going to make some of the following steps really easy. If you use a different hosting site or a different version control system, you will have to build your Ubuntu package using more traditional means. That's okay, once you have your debian/ directory, it'll be fairly easy (but not as easy as described here). If you do use Bazaar, you'll just want to make sure you have the bzr-builddeb plugin. Just do sudo apt-get install bzr-builddeb on Ubuntu and you should get everything you need.

Okay, so now that you have the requisite packages and a setup.py, let's build us a deb and upload it to our personal package archive so everyone on Debian and Ubuntu can easily try it out.
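The bootstrap step is stdeb's sdist_dsc command, run from the directory containing setup.py (this is stdeb's documented invocation; your setup.py must be importable for it to work):

```
% python setup.py --command-packages=stdeb.command sdist_dsc
```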

Notice that this leaves us with a deb_dist/ directory, not the debian/ directory we want. The latter is in there, just buried a bit. Let's dig it out:

% mv deb_dist/wikkid-0.1/debian .

% rm -rf deb_dist

% bzr add debian

% bzr commit -m'Debianize'

Note that "wikkid-0.1" will be replaced by the name of your package. In order to build the .deb package, you need an "orig.tar.gz" file. Packaging sort of assumes that you've got an original upstream tarball somewhere and you're just adding the necessary Debian goo to package the thing. In this case, we don't have an upstream tarball, although we could easily create one, and upload it to the Cheeseshop or Launchpad or wherever. However, that just slows us down so let's skip that for now! (Aside: if you do have an upstream tarball somewhere, you'll want to add a debian/watch file which points to it; that'll eliminate the need to do the next step, by downloading the tarball instead.)

Let's create the tarball right now and copy it to where the following step will expect it:

% python setup.py sdist

% mv dist/Wikkid-0.1.tar.gz ../wikkid_0.1.orig.tar.gz

Here's the second icky bit. Building a Debian source package imposes a very specific naming convention on the tarball. Wikkid's setup.py happens to build a tarball with an incompatible name, while the sdist command leaves it in a place where the next step can't find it. The rename just gets everything into the proper place. YMMV.

Now we can build the Debian source package. It's the source package that we'll upload to our Launchpad PPA. Launchpad will then automatically (if we've done everything right) build the binary package from the uploaded source package, from which Ubuntu and Debian users can easily install.

Oops! Before we do this, please edit your debian/changelog file and change unstable to lucid. You should also change the version number by adding a ~ppa1 to the end of it. Yeah, more ickiness.

I do hope to work with the appropriate developers to make some of the ickiness go away. Please do contact me if you want to help!

Addendum (2010-06-10)

Let's say you publish your tarball on the Cheeseshop or Launchpad, and you don't want to have to build a different tarball locally in order to package it. Here's what I think works:

Create a debian/watch file that points to the download location you publish to. If your package is not yet available in Debian or Ubuntu, then use this command to build your source package:

bzr bd -S -- -sa

The bit at the end tells the Debian packaging primitives to include your tarball when your source package is uploaded. The debian/watch file is used to download your published tarball and automatically renamed to the required .orig.tar.gz name. When you dput your package, your tarball will be uploaded too, and everything should build properly.
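Such a debian/watch file might look like this (a sketch; the Cheeseshop URL pattern is illustrative and depends on where you actually publish):

```
version=3
http://pypi.python.org/packages/source/W/Wikkid/Wikkid-(.*)\.tar\.gz
```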

Oh, and don't forget to look carefully at the lintian output. Try to make this as clean as possible. The Debian and Ubuntu packaging guides can help here.

Addendum 2 (2010-06-10)

Andrew Straw has added a debianize command to his stdeb package, which makes things much nicer. With this you can create the debian/ directory right next to your setup.py. AFAIK, this version of stdeb isn't released yet, so you need to install his git head in a virtualenv, and it has a few minor buglets, but it does seem like the best-of-breed solution. I'll post another article with a more detailed follow up later.