Revision as of 12:27, 17 May 2010

The main page for this idea is Summer Coding 2010 ideas - Universal Build-ID.

Status: "Idea"

Summary of idea: Extend the Build-ID support to make it more universally usable.

More information

Summary

Build-IDs are currently being put into binaries, shared libraries, core files and related debuginfo files to uniquely identify the build a user or developer is working with. There are a couple of conventions in place to use this information to identify "currently running" or "distro installed" builds. This helps to identify what was being run and to match it to the corresponding package, sources and debuginfo, for tools that want to show the user what is going on (at the moment mostly when things break). We would like to extend this to a more universal approach that helps people identify historical, local, non-distro or cross-distro, and organisational builds, so that Build-IDs become useful outside the current "static" setup and retain information over time and across upgrades.

Build-ID background

Build-IDs are unique identifiers of "builds". A build is an executable, a shared library, the kernel, a module, etc. You can also find the Build-ID in a running process, a core file or a separate debuginfo file.

The main idea behind Build-IDs is to make ELF files "self-identifying".
This means that a Build-ID should uniquely identify a final executable or shared library. The default Build-ID calculation (done through ld --build-id, see the ld manual) computes a SHA-1 hash (160 bits/20 bytes) over all the ELF header bits and section contents in the file. As a result, it is unique among the set of meaningful contents for ELF files, and identical when the output file would otherwise have been identical. GCC now passes --build-id to the linker by default.

When an executable or shared library is loaded into memory, its Build-ID is loaded along with it, so a core dump of a process also has the Build-IDs of the executable and its shared libraries embedded. When the debuginfo is separated from the main executable or shared library into a .debug file, the original Build-ID is copied over as well. This makes it easy to match a core file or a running process to the original executable and shared library builds, and matching those against the debuginfo files, which provide more information for introspection and debugging, should then be trivial.
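The Build-ID travels with the file because it is stored as an ELF note (type NT_GNU_BUILD_ID, owner name "GNU"). A minimal Python sketch of pulling it out of the raw bytes of a note section, assuming you already have those bytes (for example the contents of the .note.gnu.build-id section) and that the notes use the usual 4-byte alignment; `find_build_id` is a hypothetical helper, not part of any existing tool:

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type for build-ids, as defined in <elf.h>


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note name/desc padding)."""
    return (n + 3) & ~3


def find_build_id(notes: bytes, big_endian: bool = False) -> Optional[bytes]:
    """Return the build-id bytes from raw ELF note data, or None if absent.

    Each note is a 12-byte header (namesz, descsz, type), followed by the
    owner name and the descriptor, each padded to a 4-byte boundary
    (this sketch assumes 4-byte alignment throughout).
    """
    fmt = ">III" if big_endian else "<III"
    offset = 0
    while offset + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc
    return None
```

Other GNU notes (such as the ABI tag) share the same owner name, which is why the check looks at the note type as well as the name.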

Getting Build-IDs

A simple way to get the build-id(s) is through eu-unstrip (part of elfutils).

build-id from an executable, shared library or separate debuginfo file:

$ eu-unstrip -n -e <exec|.sharedlib|.debug>

build-ids of an executable and all shared libraries from a core file:

$ eu-unstrip -n --core <corefile>

build-ids of an executable and all shared libraries of a running process:

$ eu-unstrip -n --pid <pid>

build-id of the running kernel and all loaded modules:

$ eu-unstrip -n -k
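To use these listings from a script, each output line can be split into fields. The sketch below assumes the line layout `START+SIZE BUILDID[@ADDR] FILE DEBUGFILE MODULE`, with `-` standing in for a missing build-id or file; check this against the output of your elfutils version before relying on it. `ModuleInfo`, `parse_unstrip_line` and `modules_of_pid` are hypothetical names:

```python
import subprocess
from typing import List, NamedTuple, Optional


class ModuleInfo(NamedTuple):
    start: int
    size: int
    build_id: Optional[bytes]  # None when the module reports no build-id
    file: str                  # "-" when no file was found
    debug_file: str
    name: str


def parse_unstrip_line(line: str) -> ModuleInfo:
    # maxsplit=4 keeps the last field whole, in case a module name has spaces
    addr, build_id_field, file, debug_file, name = line.split(None, 4)
    start_s, size_s = addr.split("+")
    hex_id = build_id_field.split("@", 1)[0]  # drop the optional "@ADDR" part
    build_id = None if hex_id == "-" else bytes.fromhex(hex_id)
    return ModuleInfo(int(start_s, 16), int(size_s, 16), build_id,
                      file, debug_file, name.strip())


def modules_of_pid(pid: int) -> List[ModuleInfo]:
    """Run 'eu-unstrip -n --pid PID' and parse every non-empty output line."""
    out = subprocess.run(["eu-unstrip", "-n", "--pid", str(pid)],
                         capture_output=True, text=True, check=True).stdout
    return [parse_unstrip_line(l) for l in out.splitlines() if l.strip()]
```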

Build-IDs are the bits, not the hex-string

Although in the examples above the Build-ID is always represented as a 40-character hex string (20 bytes), this is just a representation. A Build-ID is any number of bytes, not fixed at 20 (160 bits) or any other number. Specs and formats should be open to varying sizes, though they can optimize for a given producer (vendor/distro, OS toolchain, etc.) using a single size for all its IDs.
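In code this suggests keeping Build-IDs as raw bytes of arbitrary length and converting to hex only at the edges (file names, command-line output). A small Python sketch; the `BuildId` class is hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildId:
    """A Build-ID kept as raw bytes; its length is deliberately not fixed."""
    raw: bytes

    def __post_init__(self) -> None:
        if not self.raw:
            raise ValueError("a Build-ID must contain at least one byte")

    @classmethod
    def from_hex(cls, text: str) -> "BuildId":
        # Accepts upper- or lower-case hex; the bytes are what identify the build.
        return cls(bytes.fromhex(text))

    def hex(self) -> str:
        # Two hex digits per byte: a 20-byte (160-bit) SHA-1 ID prints as 40 characters.
        return self.raw.hex()
```

Because equality and hashing work on the bytes, `BuildId.from_hex("DEADBEEF")` and `BuildId.from_hex("deadbeef")` name the same 4-byte ID.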

Current conventions and usage

Build-IDs are as useful as the methods we build around them to look things up based on them.

The convention currently used by Fedora (which has also been adopted
by the upstream GNU toolchain, for example by GDB to find files)
is to include links in the debuginfo package that point to the ELF file and the
debuginfo file under /usr/lib/debug/.build-id/XX/YYYY (where XX are the first two
hex digits of the build-id and YYYY are all the others; the link to the
debuginfo file carries an additional .debug suffix).

So for example the bash-debuginfo package has the following files/links:

These files/links are added by the debugedit (/usr/lib/rpm/debugedit) and find-debuginfo.sh (/usr/lib/rpm/find-debuginfo.sh) programs, which make sure that every executable and shared library (and the separate .debug debuginfo files) has a Build-ID embedded and that the links above are added under /usr/lib/debug/.build-id.

This makes it extremely easy to find the executable or shared library
and the corresponding debuginfo given just the build-id, provided they
are installed on your system.
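The lookup path follows mechanically from the Build-ID bytes. A sketch, assuming the /usr/lib/debug/.build-id/XX/YYYY layout described above with a `.debug` suffix on the debuginfo link; `build_id_links` and `installed` are hypothetical helpers:

```python
import os
from typing import Dict


def build_id_links(build_id: bytes, root: str = "/usr/lib/debug") -> Dict[str, str]:
    """Map a Build-ID to its expected links under <root>/.build-id/XX/YYYY."""
    if len(build_id) < 2:
        raise ValueError("need at least two bytes to split into XX/YYYY")
    hex_id = build_id.hex()
    base = os.path.join(root, ".build-id", hex_id[:2], hex_id[2:])
    return {"elf": base, "debuginfo": base + ".debug"}


def installed(build_id: bytes, root: str = "/usr/lib/debug") -> Dict[str, str]:
    """Return only the links present on this system, resolved to their targets.

    os.path.exists follows symlinks, so dangling links count as not installed.
    """
    return {kind: os.path.realpath(path)
            for kind, path in build_id_links(build_id, root).items()
            if os.path.exists(path)}
```

An empty result from `installed` is exactly the case where a tool has to fall back to suggesting a package to install.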

Since these links are files included in the rpm package, it is also
easy to find the package that provided the executable or library
corresponding to a build-id (gdb and systemtap will suggest the right
debuginfo package to install based on the build-id they found for the
program you wanted to introspect). You can then ask yum to install it,
or use repoquery to figure out the details of the package and binary
involved.

But this only works for what is currently installed from the latest,
up-to-date repository. There is no support for historical information,
local builds, cross-distro use, etc. Extending the usefulness of
build-ids beyond that is what this idea is about.

The actual Universal Build-IDs idea: how do we scale this up/down?

The goal is that when you get a build-id for something (anything
really: an old executable, a core file that was once made but never
fully investigated, a currently running process that needs to be
introspected but whose libraries have already been upgraded on disk),
you can map it to the original developer, "creator", package,
distributor, executable, sources, debuginfo files, etc.

Up within Fedora: what about getting "historical" mappings?

Up towards other distributions (PackageKit?).

Up towards a general build-id mapping universe (build-id.org).

Generic registration, querying and mapping of build-ids.

Down towards a local database for the lone developer.

Or a local shop that builds upon an existing distro, but also has (internal) apps in its organization.

Down to totally disorganized "installs" where people move executables around all the time (inotify/updatedb).
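At the bottom of that scale, a lone developer could keep a small local index that remembers every Build-ID seen and where, without ever deleting older entries, so historical builds stay resolvable after upgrades or moves. A minimal sketch using SQLite; the schema, the `LocalBuildIdIndex` class and the `extract` callback (for instance a wrapper around `eu-unstrip -n -e`) are all assumptions, and a real tool might be driven by inotify or an updatedb-style periodic scan instead of an explicit walk:

```python
import os
import sqlite3
import time
from typing import Callable, List, Optional, Tuple


class LocalBuildIdIndex:
    """Hypothetical append-only index: build-id -> every path it was seen at."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " build_id BLOB NOT NULL, path TEXT NOT NULL, first_seen REAL NOT NULL,"
            " PRIMARY KEY (build_id, path))")

    def record(self, build_id: bytes, path: str) -> None:
        # INSERT OR IGNORE keeps the original first_seen for known pairs;
        # rows are never deleted, so old builds remain queryable.
        self.db.execute("INSERT OR IGNORE INTO seen VALUES (?, ?, ?)",
                        (build_id, path, time.time()))
        self.db.commit()

    def scan(self, top: str, extract: Callable[[str], Optional[bytes]]) -> int:
        """Walk a directory tree, recording every file that has a build-id."""
        count = 0
        for dirpath, _dirs, files in os.walk(top):
            for fname in files:
                path = os.path.join(dirpath, fname)
                build_id = extract(path)
                if build_id is not None:
                    self.record(build_id, path)
                    count += 1
        return count

    def lookup(self, build_id: bytes) -> List[Tuple[str, float]]:
        """All (path, first_seen) pairs ever recorded for this build-id."""
        rows = self.db.execute(
            "SELECT path, first_seen FROM seen WHERE build_id = ? ORDER BY first_seen",
            (build_id,))
        return rows.fetchall()
```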

How do we "proxy" this information between the different layers, so that tools can have one query mechanism that works for any build-id they happen to come across?
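One way to picture such proxying is a chain of resolvers queried from the most local layer outwards, with answers from remote layers cached locally. This is only a sketch: `DictResolver`, `ChainedResolver` and the layer names are hypothetical, and each mapping is left as a free-form dictionary because what a mapping should contain is exactly the open question above:

```python
from typing import Dict, List, Optional

Mapping = Dict[str, str]  # free-form, e.g. {"package": ..., "sources": ...}


class DictResolver:
    """Hypothetical stand-in for one layer (local db, organisation, distro, ...)."""

    def __init__(self, name: str, entries: Optional[Dict[bytes, Mapping]] = None) -> None:
        self.name = name
        self.entries: Dict[bytes, Mapping] = dict(entries or {})

    def lookup(self, build_id: bytes) -> Optional[Mapping]:
        return self.entries.get(build_id)

    def store(self, build_id: bytes, mapping: Mapping) -> None:
        self.entries[build_id] = dict(mapping)


class ChainedResolver:
    """Query layers from most local to most global; cache remote hits locally."""

    def __init__(self, layers: List[DictResolver]) -> None:
        if not layers:
            raise ValueError("need at least one layer")
        self.layers = layers

    def lookup(self, build_id: bytes) -> Optional[Mapping]:
        for i, layer in enumerate(self.layers):
            hit = layer.lookup(build_id)
            if hit is None:
                continue
            if i > 0:
                # Remember the answer in the local layer, so later queries
                # stay local even if the remote entry disappears.
                self.layers[0].store(build_id, hit)
            return dict(hit, resolved_by=layer.name)
        return None
```

Tools would only ever talk to the chain, so adding an organisational or distro layer does not change the query interface.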