APT programming snippets for Debian system maintenance

The Python API for the Debian package manager APT is useful for writing practical system maintenance scripts that go beyond what shell scripting can comfortably do.
Libraries for Python 2 and Python 3 are available as packages (python-apt and python3-apt), and the documentation comes in the package python-apt-doc.
If that is installed too, the documentation can be found in /usr/share/doc/python-apt-doc/html/index.html, and a couple of example scripts are shipped in /usr/share/doc/python-apt-doc/examples.
The libraries mainly consist of Python bindings to the libapt-inst and libapt-pkg C++ core libraries of the APT package manager, which makes processing very fast.
Debugging symbols are also available as packages (python{,3}-apt-dbg).
The module apt_inst provides features like reading from binary packages, while apt_pkg gives access to the core functions of the package manager.
There is also the apt abstraction layer, which provides more convenient access to the library; for example, apt.cache.Cache() can be used to behave like apt-get:
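A minimal sketch of that idea, assuming python3-apt and superuser rights for the final commit. mark_for_install() and apt_get_install() are hypothetical helpers, not part of python-apt; apt.cache.Cache, Package.mark_install(), Cache.get_changes() and Cache.commit() are the library's own. The cache is passed into mark_for_install(), so the marking logic can also be exercised with a stand-in object.

```python
def mark_for_install(cache, names):
    """Mark packages for installation, roughly like `apt-get install NAMES`.

    `cache` is expected to be an apt.cache.Cache; cache[name] raises
    KeyError for unknown packages. Returns the sorted names of all packages
    marked for a change, including dependencies pulled in by mark_install().
    """
    for name in names:
        pkg = cache[name]
        if not pkg.is_installed:
            pkg.mark_install()  # resolves dependencies by default
    return sorted(pkg.name for pkg in cache.get_changes())


def apt_get_install(names):
    """Hypothetical wrapper: plan, show and apply the changes (needs root)."""
    import apt  # python3-apt; imported here so mark_for_install() has no dependency

    cache = apt.cache.Cache()
    print("changes:", " ".join(mark_for_install(cache, names)))
    cache.commit()  # downloads and installs, like apt-get
```

For example, apt_get_install(["nano"]) run as root would print the planned changes and then apply them; unlike apt-get, this sketch doesn't ask for confirmation before committing.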

boil out selections

As is widely known, dpkg has a feature which helps to move a package inventory from one installation to another using just a text file with a list of installed packages.
A selections list containing all installed packages can easily be generated with $ dpkg --get-selections > selections.txt.
The resulting file looks something like this:
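Each line pairs a package name with a selection state such as install, hold, deinstall or purge, separated by whitespace (dpkg uses tabs). A small sketch of a parser for that format; parse_selections() is a hypothetical helper, and rejecting anything other than those four states is an assumption about what should count as valid input.

```python
SELECTION_STATES = {"install", "hold", "deinstall", "purge"}


def parse_selections(text):
    """Split `dpkg --get-selections` output into (package, state) pairs.

    Package names may be arch-qualified (e.g. "libc6:amd64"); blank lines
    are skipped, and a malformed line or unexpected state raises ValueError.
    """
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 2 or fields[1] not in SELECTION_STATES:
            raise ValueError(f"line {lineno}: unexpected entry {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs
```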

The counterpart of this operation (--set-selections) can be used to reinstall (add) the complete package inventory on another installation or computer (this needs superuser rights, and a frontend like apt-get dselect-upgrade then performs the actual installation), as explained in the manpage dpkg(1).
No problem so far.

The problem is that if this list contains a package which can’t be found in any of the package sources set up in /etc/apt/sources.list(.d/) on the target system, dpkg stops the whole process:

# dpkg --set-selections < selections.txt
dpkg: warning: package not in database at line 524: google-chrome-beta
dpkg: warning: found unknown packages; this might mean the available database
is outdated, and needs to be updated through a frontend method

Thus, manually downloaded and installed “wild” packages from unofficial package sources are problematic for this approach, because the package installer simply doesn’t know where to get them.

Luckily, dpkg prints the relevant package names, but instead of removing them manually with an editor, a little Python script for python3-apt can automatically delete all of these packages from a selections file:
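A minimal sketch of such a script, assuming python3-apt; the function names boil() and boil_file() are made up for illustration. It treats a package as known if the APT cache has at least one version of it (Package.has_versions), so names that merely appear in the dependencies of other packages don’t count, and it strips the ":arch" suffix that dpkg may print for multiarch packages.

```python
"""Remove packages unknown to APT from a dpkg selections file (sketch).

Usage: python3 <script> selections.txt  -> writes selections.txt.boiled
"""
import sys


def boil(lines, known):
    """Yield only the lines whose package name is in the set `known`.

    Arch-qualified names such as "libc6:amd64" are looked up without the
    ":arch" suffix; blank lines are dropped.
    """
    for line in lines:
        fields = line.split()
        if fields and fields[0].split(":")[0] in known:
            yield line


def boil_file(path):
    """Copy `path` to `path`.boiled, leaving out packages APT doesn't know."""
    import apt_pkg  # python3-apt

    apt_pkg.init()
    cache = apt_pkg.Cache(None)  # None: no "Reading package lists..." output
    # Only names with at least one version (from a package index or the dpkg
    # status file) count; purely referenced or virtual names are skipped.
    known = {pkg.name for pkg in cache.packages if pkg.has_versions}
    with open(path) as infile, open(path + ".boiled", "w") as outfile:
        outfile.writelines(boil(infile, known))


if __name__ == "__main__" and len(sys.argv) == 2:
    boil_file(sys.argv[1])
```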

The script takes one argument, the name of the selections file generated by dpkg.
The low-level module apt_pkg first has to be initialized with apt_pkg.init().
Then apt_pkg.Cache() can be used to instantiate a cache object (here: cache).
That object is iterable, which makes it easy to skip any package from the list that can’t be found in the database: its line is simply not copied into the output file (.boiled), while all the others are.

The script might also be useful for moving from one distribution or derivative to another (like from Ubuntu to Debian).
For production use, the open() calls should of course be guarded against FileNotFoundError and other IOErrors to prevent the program from crashing on such events.
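One way to do that, sketched under the assumption that a readable message on stderr is preferable to a traceback. In Python 3, IOError is an alias of OSError and FileNotFoundError is a subclass of it, so catching OSError also covers cases like missing permissions or a directory given as filename. read_lines() is a hypothetical helper.

```python
import sys


def read_lines(path):
    """Return the lines of `path`, or None after reporting an error on stderr."""
    try:
        with open(path) as infile:
            return infile.readlines()
    except FileNotFoundError:
        print(f"{path}: file not found", file=sys.stderr)
    except OSError as err:  # IOError is an alias of OSError in Python 3
        print(f"{path}: {err.strerror or err}", file=sys.stderr)
    return None
```

A caller can then exit with a non-zero status when read_lines() returns None instead of crashing halfway through writing the .boiled file.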

purge rc-s

As is also widely known, removed packages leave things like configuration files, maintainer scripts and logs on the computer, to preserve them in case the package gets reinstalled at some point in the future.
That happens if dpkg has been used with -r/--remove instead of -P/--purge, which also removes the files that would otherwise be left behind.

Such packages are shown with the status rc in the output of dpkg -l (removed, configuration files remaining), and they can be purged afterwards to remove them completely from the system.
Several shell snippets for doing this job automatically can be found on the net, like this one:

It’s not production ready yet (for example, there’s an infinite loop if dpkg returns error code 1, as from “can’t remove non-empty folder”).
But generally, ATTENTION: be very careful with typos and other mistakes if you want to use that code snippet; a faulty script performing changes on the package database might destroy the integrity of your system, and you don’t want that to happen.
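As a more cautious starting point than a loop around dpkg, here is a read-only sketch that merely lists the packages in the rc state via apt_pkg and leaves the actual purging to the administrator. It assumes python3-apt, where Package.current_state equals apt_pkg.CURSTATE_CONFIG_FILES for such packages; names_in_state() and rc_packages() are hypothetical helpers.

```python
def names_in_state(packages, state):
    """Return sorted, de-duplicated names from (name, state) pairs matching `state`."""
    return sorted({name for name, current in packages if current == state})


def rc_packages():
    """List packages that were removed but still have configuration files."""
    import apt_pkg  # python3-apt

    apt_pkg.init()
    cache = apt_pkg.Cache(None)  # None: no progress output
    # get_fullname(True) adds ":arch" only for non-native packages
    pairs = ((pkg.get_fullname(True), pkg.current_state) for pkg in cache.packages)
    return names_in_state(pairs, apt_pkg.CURSTATE_CONFIG_FILES)
```

The list printed by print(" ".join(rc_packages())) can be reviewed first and only then handed to dpkg --purge as root, which keeps the dangerous step out of the script.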

detect “wild” packages

As said above, installed Debian packages might be called “wild” if they have been downloaded from somewhere on the net and installed manually, as happens from time to time on many systems.
If you want to remove that whole class of packages again for any reason, the question is how to detect them.
A characteristic feature is that no package source is connected to such a package, and that can be detected by a Python script, again using the bindings for the APT libraries.

The package object doesn’t have a method to query its source, because the origin is always connected to a specific package version; some specific version might have come from the security updates, for example.
The candidate version of a package (the version APT would install; the installed one is available as Package.current_ver) can be queried with DepCache.get_candidate_ver(), which returns a complex apt_pkg.Version object:
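A small sketch of that lookup, assuming python3-apt; candidate_version() and describe_version() are hypothetical helpers, while ver_str, arch and file_list are attributes of apt_pkg.Version. If a package has no candidate, get_candidate_ver() returns None, which describe_version() reports as such.

```python
def describe_version(version):
    """One-line summary of an apt_pkg.Version (None means "no candidate")."""
    if version is None:
        return "no candidate version"
    return (f"{version.ver_str} ({version.arch}), "
            f"listed in {len(version.file_list)} file(s)")


def candidate_version(name):
    """Return the candidate apt_pkg.Version of package `name`, or None."""
    import apt_pkg  # python3-apt

    apt_pkg.init()
    cache = apt_pkg.Cache(None)  # None: no progress output
    depcache = apt_pkg.DepCache(cache)
    return depcache.get_candidate_ver(cache[name])  # KeyError if unknown
```

Usage: print(describe_version(candidate_version("nano"))).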

The file_list attribute of such a Version object holds the package files associated with that version, i.e. the index files of specific package sources (downloaded package indexes), which can be read out easily (using a for-loop, because there can be multiple file objects):
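A sketch of such a loop, assuming python3-apt: file_list holds (PackageFile, index) tuples, and filename, site, archive, component and architecture are attributes of apt_pkg.PackageFile. origins() is a hypothetical helper; it reports the dpkg status file by its filename alone, since that file does not represent a package source.

```python
def origins(version):
    """Return one line per package file that lists this apt_pkg.Version."""
    lines = []
    for pkgfile, _index in version.file_list:  # (PackageFile, index) tuples
        if pkgfile.filename.endswith("dpkg/status"):
            # known locally as installed, not from a downloaded package index
            lines.append(pkgfile.filename)
        else:
            lines.append(f"{pkgfile.site} {pkgfile.archive} "
                         f"{pkgfile.component} {pkgfile.architecture}")
    return lines
```

For an installed package from the archive, this yields one line for the index file and one for /var/lib/dpkg/status; for a “wild” package, only the latter.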

The output is self-explanatory: the nano binary package on this amd64 computer comes from httpredir.debian.org/debian testing main.
If a package is “wild”, it was installed manually, so no associated index file can be found, only /var/lib/dpkg/status (libcudnn5 is not in the official package archives but is distributed by Nvidia as a .deb package):

The simple trick now is to find all packages which have only /var/lib/dpkg/status as associated system file (this doesn’t refer to the files the packages contain), and not an index file representing a package source.
There’s a little pitfall: that’s also true for virtual packages.
But virtual packages commonly don’t have an associated version (python-apt docs: “to check whether a package is virtual; that is, it has no versions and is provided at least once”), and that can be queried with Package.has_versions.
A filter for all packages that aren’t virtual, are associated with only one system file, and have /var/lib/dpkg/status as that file, then goes like this:

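import apt_pkg
apt_pkg.init()
cache = apt_pkg.Cache(None)  # None: no progress output
mydepcache = apt_pkg.DepCache(cache)
for package in cache.packages:
    if package.has_versions:  # skip purely virtual packages
        version = mydepcache.get_candidate_ver(package)
        # candidate known from one file only, and that is the dpkg status file
        if version is not None and len(version.file_list) == 1:
            if 'dpkg/status' in version.file_list[0][0].filename:
                print(package.name)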

On my Debian testing system this prints quite an interesting list.
It lists all the wild packages like libcudnn5, but also packages which are currently not in testing because they have been temporarily removed by AUTORM due to release-critical bugs.
Then there’s all the obsolete stuff which was installed from the package archives once and then forgotten, like old kernel header packages (“obsolete packages” in dselect).
So this snippet brings up other stuff, too, and should be regarded as experimental so far.