One of the things I work on is
Tecken, which runs the
Mozilla Symbols Server. It's a server that handles uploading and downloading
Breakpad symbols files as well as stack symbolication.

Bug #1614928 covers adding
line numbers to the symbolicated stack results for the symbolication API. The
current code doesn't parse line records in Breakpad symbols files, so it
doesn't know anything about line numbers. I spent some time looking at how much
effort it'd take to improve the hand-written Breakpad symbols file parsing code
to parse line records. That would require carrying those changes through to the
caching layer and some related parts, and it seemed really tricky.

"The stack" is an array of addresses in memory corresponding to the value of
the instruction pointer for each of those stack frames. You can use the module
information to convert that array of memory addresses to an array of [module,
module_offset] pairs. Something like this:
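
As a rough illustration of that conversion, here's a minimal sketch. The module names, base addresses, sizes, and the `to_module_offsets` helper are all made up for this example; none of it is Tecken code.

```python
from bisect import bisect_right

# Hypothetical module list: (base_address, size, name), sorted by base address.
MODULES = [
    (0x10000000, 0x200000, "firefox"),
    (0x20000000, 0x4000000, "XUL"),
]


def to_module_offsets(stack, modules):
    """Convert absolute addresses to [module_name, module_offset] pairs."""
    bases = [base for base, _size, _name in modules]
    frames = []
    for address in stack:
        # Find the last module whose base address is <= this address.
        index = bisect_right(bases, address) - 1
        if index >= 0:
            base, size, name = modules[index]
            if address < base + size:
                frames.append([name, address - base])
                continue
        # The address is not inside any known module.
        frames.append([None, address])
    return frames


stack = [0x200F5AA0, 0x10001234, 0x5]
for module, offset in to_module_offsets(stack, MODULES):
    print(module, hex(offset))
```

The last address in the sample stack doesn't fall inside any module, so it stays as a bare address with no module name.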

What you really want is a human-readable stack of function names and files and
line numbers. Then you can go look at the code in question and start your
debugging adventure.

Compiling the program produces a bunch of debugging information. We use
dump_syms to extract the symbol information
and put it into the Breakpad symbols file format. Those files get uploaded to
Mozilla Symbols Server where they join all the symbols files for all the builds
for the last 2 years.

Symbolication takes the array of [module, module_offset] pairs, the list of
modules in memory, and the Breakpad symbols files for those modules. It looks
up the symbol for each [module, module_offset] pair, producing symbolicated
frames.
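
To make the lookup step concrete, here's a toy sketch. It assumes the symbols for a module have already been reduced to a sorted table of (start offset, size, function name) entries; the table contents, the function names in it, and the `symbolicate` helper are invented for illustration and are not how Tecken or Symbolic store things.

```python
from bisect import bisect_right

# Hypothetical, hand-made symbol table for one module:
# (start_offset, size, function_name), sorted by start offset.
SYMBOLS_BY_MODULE = {
    "XUL": [
        (0xF5A00, 0x80, "nsFoo::Init()"),
        (0xF5A80, 0x60, "nsFoo::Run()"),
    ],
}


def symbolicate(frames, symbols_by_module):
    """Look up a function name for each [module, module_offset] pair."""
    results = []
    for module, offset in frames:
        table = symbols_by_module.get(module, [])
        starts = [start for start, _size, _name in table]
        # Find the last function that starts at or before the offset.
        index = bisect_right(starts, offset) - 1
        function = None
        if index >= 0:
            start, size, name = table[index]
            if offset < start + size:
                function = name
        results.append(
            {"module": module, "module_offset": hex(offset), "function": function}
        )
    return results


frames = [["XUL", 0xF5AA0], ["XUL", 0x10], ["libfoo", 0x20]]
for frame in symbolicate(frames, SYMBOLS_BY_MODULE):
    print(frame)
```

Frames with no matching symbol (or no symbols file at all) come back with `function` set to `None` rather than failing the whole stack.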

That helped, but I had questions those didn't answer. I have an intrepid
freshman understanding of Rust, so I ended up reading the code, tests, and
examples.

The one big thing that tripped me up was that Symbolic can't parse Breakpad
symbols files from a byte stream--they need to be files on disk. Tecken doesn't
store Breakpad symbols files on disk--they're in AWS S3 buckets. So Tecken
downloads them and parses the byte stream. In order to use Symbolic, we'll have
to adjust that to save the file to disk, then parse it, then delete the file
afterwards.
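
A minimal sketch of that save, parse, delete sequence might look like the following. The `parse_from_temp_file` helper and its `parse` callback are hypothetical names for illustration, not Tecken code. In practice the callback would open the file with Symbolic and would need to finish building whatever it returns before the file is removed.

```python
import os
import tempfile


def parse_from_temp_file(data, parse):
    """Write bytes to a temporary file, call parse(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".sym")
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
        # The file is closed and flushed here, so parse() sees all the bytes.
        return parse(path)
    finally:
        # Clean up even if parsing raises.
        os.remove(path)


def first_line(path):
    """Stand-in parser for the example: return the first line of the file."""
    with open(path, "rb") as fp:
        return fp.readline()


data = b"MODULE Mac x86_64 75A79CFA0E783A35810F8ADF2931659A0 XUL\n"
print(parse_from_temp_file(data, first_line))
```

Passing the parser in as a callback keeps the cleanup in one place, at the cost of the temporary file only living for the duration of that one call.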

Anyhow, here's some sample annotated code using Symbolic to do symbol lookups:

    import symbolic

    # This is a Breakpad symbols file I have on disk.
    archive = symbolic.Archive.open("XUL/75A79CFA0E783A35810F8ADF2931659A0/XUL.sym")

    # We do debug ids as all-uppercase with no hyphens. However, symbolic
    # requires that get normalized into the form it likes.
    ndebug_id = symbolic.normalize_debug_id("75A79CFA0E783A35810F8ADF2931659A0")

    # This parses the Breakpad symbols file and returns a symcache that we can
    # look up addresses in.
    obj = archive.get_object(debug_id=ndebug_id)
    symcache = obj.make_symcache()

    # Symbol lookup returns a list of LineInfo objects.
    lineinfos = symcache.lookup(0xf5aa0)

    print("line: %s symbol: %s" % (lineinfos[0].line, lineinfos[0].symbol))

Further, Symbolic parses files of a variety of other debug binary formats.
This could be handy for skipping the intermediary Breakpad symbol file and
using the debug binaries directly. More on that idea later.

Tecken is maintained by a team of two and we have other projects, so it spends
a lot of time sitting in the corner feeling sad. Meanwhile, Symbolic is
actively worked on by Sentry and a cadre of other contributors including
Mozilla engineers because it's one of the cornerstone crates for the great Rust
rewrite of Breakpad things. That's a big win for me.

So then I built a prototype

Today, I threw together a web app that does symbolication using Symbolic and
called it Sherwin Syms.

Building a separate prototype gives me something to tinker with that's not in
production. I was able to add line number information pretty quickly. I can
experiment with caching on disk. I can compare the symbolication API output for
stacks between the prototype and what the Mozilla Symbols Server produces.
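
For the comparison part, something along these lines would do as a first pass. This is a sketch under assumptions: it treats each symbolicated stack as a flat list of frame dicts, and the key names (`module`, `function`, `line`, `file`) and the `diff_frames` helper are illustrative rather than the actual API response format.

```python
def diff_frames(prototype_frames, production_frames, ignore=("line", "file")):
    """Return (index, key, prototype_value, production_value) per mismatch.

    Keys listed in `ignore` are skipped, since the prototype is expected to
    return extra information (such as line numbers) that production lacks.
    """
    differences = []
    if len(prototype_frames) != len(production_frames):
        differences.append(
            (None, "frame_count", len(prototype_frames), len(production_frames))
        )
    for index, (ours, theirs) in enumerate(zip(prototype_frames, production_frames)):
        for key in sorted(set(ours) | set(theirs)):
            if key in ignore:
                continue
            if ours.get(key) != theirs.get(key):
                differences.append((index, key, ours.get(key), theirs.get(key)))
    return differences


# Made-up sample data for illustration.
prototype = [
    {"module": "XUL", "function": "nsFoo::Run()", "line": 42},
    {"module": "XUL", "function": "nsFoo::Init()", "line": 7},
]
production = [
    {"module": "XUL", "function": "nsFoo::Run()"},
    {"module": "XUL", "function": "nsFoo::Start()"},
]
for difference in diff_frames(prototype, production):
    print(difference)
```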

There's a lot of scaffolding in there. The Symbolic-using bits are in this
file:

Next steps

I need to integrate this into Tecken. I think that means writing a new v6 API
view because the v4 and v5 code is tangled up with downloading and caching.

Markus and Gabriele suggested Tecken skip Breakpad symbols files and instead
use the debug binaries directly. Breakpad symbols files don't have symbols for
inline functions, so they lose that information--using the debug binaries would
be better. I hope to look into that soon.

Summary

That summarizes the week I spent with Symbolic.
