The “VM SIZE” column tells you how much space the binary will take when it is loaded into memory. The “FILE SIZE” column tells you how much space the binary takes on disk. These two can be very different from each other:

Some data lives in the file but isn't loaded into memory, like debug information.

Some data is mapped into memory but doesn't exist in the file. This mainly applies to the .bss section (zero-initialized data).
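The difference between the two columns can be illustrated with a small Python sketch. The section table below is hypothetical (the sizes are made up, not taken from any real binary), and it only models the two cases described above: data that exists only on disk and data that exists only in memory.

```python
# Hypothetical section table: (name, vm_size, file_size) in bytes.
# All numbers are invented for illustration.
SECTIONS = [
    (".text", 4096, 4096),       # code: on disk and in memory
    (".data", 512, 512),         # initialized data: on disk and in memory
    (".bss", 2048, 0),           # zero-initialized data: in memory only
    (".debug_info", 0, 8192),    # debug info: on disk only
]


def totals(sections):
    """Return (total_vm_size, total_file_size) for a section table."""
    vm_total = sum(vm_size for _, vm_size, _ in sections)
    file_total = sum(file_size for _, _, file_size in sections)
    return vm_total, file_total


vm_total, file_total = totals(SECTIONS)
print(f"VM SIZE: {vm_total}  FILE SIZE: {file_total}")
```

Here .bss adds to the VM total without adding to the file total, and .debug_info does the opposite, which is why the two totals diverge.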

The default breakdown in Bloaty is by sections, but many other ways of slicing the binary are supported such as symbols and segments. If you compiled with debug info, you can even break down by compile units and inlines!

Each line shows how much each part changed compared to its previous size. Most sections grew, but one section at the bottom (.debug_str) shrank. The “TOTAL” line shows how much the size changed overall.
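As a rough illustration of what such a diff computes, here is a minimal Python sketch. It is not Bloaty's implementation; the function name size_diff and the sample sizes are hypothetical.

```python
def size_diff(old, new):
    """Compare two {part_name: size_in_bytes} maps.

    Returns {name: (delta_bytes, percent_change)}, plus a "TOTAL" entry.
    percent_change is None when the part did not exist before.
    """
    rows = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, 0)
        after = new.get(name, 0)
        delta = after - before
        percent = 100.0 * delta / before if before else None
        rows[name] = (delta, percent)

    total_before = sum(old.values())
    total_delta = sum(new.values()) - total_before
    total_percent = 100.0 * total_delta / total_before if total_before else None
    rows["TOTAL"] = (total_delta, total_percent)
    return rows


# Hypothetical sizes: .text grew, .debug_str shrank.
old_sizes = {".text": 1000, ".debug_str": 500}
new_sizes = {".text": 1200, ".debug_str": 450}
diff = size_diff(old_sizes, new_sizes)
```

A negative delta corresponds to a section that shrank, and the TOTAL row is computed from the overall sizes rather than by averaging the per-part percentages.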

Hierarchical Profiles

Bloaty supports breaking the binary down in lots of different ways. You can combine multiple data sources into a single hierarchical profile. For example, passing -d segments,sections uses the segments and sections data sources in a single report.
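Conceptually, a hierarchical profile is a nested aggregation: the first data source forms the outer level and the second one subdivides it. The Python sketch below shows the idea on hypothetical data. The segment labels, section names, and sizes are invented, and this is not how Bloaty is implemented internally.

```python
from collections import defaultdict

# Hypothetical (segment, section, size_in_bytes) rows.
ROWS = [
    ("LOAD [RX]", ".text", 4096),
    ("LOAD [RX]", ".rodata", 1024),
    ("LOAD [RW]", ".data", 512),
    ("LOAD [RW]", ".bss", 2048),
]


def hierarchical_profile(rows):
    """Group sizes by the outer label, then by the inner label."""
    profile = defaultdict(dict)
    for outer, inner, size in rows:
        profile[outer][inner] = profile[outer].get(inner, 0) + size
    return dict(profile)


def print_profile(profile):
    """Print each outer bucket (largest first) followed by its children."""
    by_total = sorted(profile.items(), key=lambda kv: -sum(kv[1].values()))
    for outer, children in by_total:
        print(f"{sum(children.values()):>8}  {outer}")
        for inner, size in sorted(children.items(), key=lambda kv: -kv[1]):
            print(f"{size:>8}      {inner}")


profile = hierarchical_profile(ROWS)
print_profile(profile)
```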

Bloaty displays a maximum of 20 lines for each level; other values are grouped into an [Other] bin. Use -n <num> to override this setting. If you pass -n 0, all data will be output without collapsing anything into [Other].
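The row limit can be pictured with the following Python sketch. It is an illustration only: the function name collapse is hypothetical, and whether the [Other] row itself counts toward the limit is not stated above, so this sketch simply adds it after the kept rows.

```python
def collapse(rows, max_rows=20):
    """Keep the max_rows largest (label, size) rows and fold the rest
    into a single [Other] row. max_rows=0 disables collapsing."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    if max_rows == 0 or len(ordered) <= max_rows:
        return ordered
    kept, rest = ordered[:max_rows], ordered[max_rows:]
    other_size = sum(size for _, size in rest)
    return kept + [("[Other]", other_size)]


# Hypothetical rows, collapsed to the two largest entries.
rows = [("a", 5), ("c", 2), ("b", 3), ("d", 1)]
print(collapse(rows, max_rows=2))
print(collapse(rows, max_rows=0))
```

Note that collapsing never changes the total: the sizes folded into [Other] still add up to the same overall figure.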

Debugging Stripped Binaries

Bloaty supports reading debuginfo/symbols from separate binaries. This lets you profile a stripped binary, even for data sources like “compileunits” or “symbols” that require this extra information.

Bloaty uses build IDs to verify that the binary and the debug file match. Otherwise the results would be nonsense (this kind of mismatch might sound unlikely but it's a very easy mistake to make, and one that I made several times even as Bloaty's author!).

If your binary has a build ID, then using separate debug files is as simple as passing the debug file to Bloaty with the --debug-file option.

Bloaty does not currently support the GNU debuglink or looking up debug files by build ID, which are the methods GDB uses to find debug files. If there are use cases where Bloaty's --debug-file option won't work, we can reconsider implementing these.

Mach-O

Mach-O files always have build IDs (as far as I can tell), so no special configuration is needed to make sure you get them.

Mach-O puts debug information in separate files, which you can create with dsymutil.

Configuration Files

Any options that you can specify on the command-line, you can put into a configuration file instead. Then you can use -c FILE to load those options from the config file. Also, a few features are only available with configuration files and cannot be specified on the command-line.

The configuration file is in Protocol Buffers text format. The schema is the Options message in src/bloaty.proto.

The two most useful cases for configuration files are:

You have too many input files to put on the command-line. At Google we sometimes run Bloaty over thousands of input files. This can cause the overall command-line to exceed OS limits. With a config file that lists the input files, we can avoid this.

For custom data sources, it can be very useful to put them in a config file, for greater reusability. For example, see the custom data sources defined in custom_sources.bloaty. Also read more about custom data sources below.
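For the first case, a config file listing many inputs can be generated by a script. The Python sketch below assumes that the Options message has a repeated string field named filename; that field name is an assumption here and should be checked against src/bloaty.proto. The helper make_config and the example paths are hypothetical.

```python
def make_config(paths):
    """Render one `filename` entry per input path in protobuf text format.

    Assumes the Options schema has a repeated string field called
    `filename` (verify against src/bloaty.proto).
    """
    lines = []
    for path in paths:
        # Escape backslashes and double quotes for a text-format string.
        escaped = path.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'filename: "{escaped}"')
    return "\n".join(lines) + "\n"


config_text = make_config(["path/to/a.o", "path/to/b.o"])
print(config_text, end="")
```

The resulting text can be written to a file and then loaded with -c FILE, which keeps the command line short no matter how many inputs are listed.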

Data Sources

Bloaty has many data sources built in. These all provide different ways of looking at the binary. You can also create your own data sources by applying regexes to the built-in data sources (see “Custom Data Sources” below).

While Bloaty works on binaries, shared objects, object files, and static libraries (.a files), some of the data sources don't work on object files. This applies especially to data sources that read debug info.

Segments

Segments are what the run-time loader uses to determine what parts of the binary need to be loaded/mapped into memory. There are usually just a few segments, one for each set of mmap() permissions required.

Sections

Sections give us a bit more granular look into the binary. If we want to find the symbol table, the unwind information, or the debug information, each kind of information lives in its own section. Bloaty's default output is sections.

You can control how symbols are demangled with the -C MODE or --demangle=MODE flag. You can also specify the demangling mode explicitly in the -d switch. We have three different demangling modes:

-C none or -d rawsymbols: no demangling.

-C short or -d shortsymbols: short demangling: return types, template parameters, and function parameter types are omitted. For example: bloaty::dwarf::FormReader<>::GetFunctionForForm<>(). This is the default.

-C full or -d fullsymbols: full demangling.

One very handy thing about -C short (the default) is that it groups all template instantiations together, regardless of their parameters. You can use this to determine how much code size you are paying by doing multiple instantiations of templates. Try bloaty -d shortsymbols,fullsymbols.
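To see why short names group instantiations together, consider this simplified Python sketch. It is not Bloaty's demangler: it only collapses template argument lists to <> and leaves return types and parameter types alone, and it does not handle names such as operator< or operator->. The symbol names and sizes are hypothetical.

```python
from collections import defaultdict


def strip_template_args(name):
    """Replace each top-level <...> in a demangled name with <>.

    Simplified: treats every '<' and '>' as a template bracket.
    """
    out = []
    depth = 0
    for ch in name:
        if ch == "<":
            if depth == 0:
                out.append("<>")
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def group_by_short_name(symbols):
    """Sum (name, size) pairs after collapsing template arguments."""
    totals = defaultdict(int)
    for name, size in symbols:
        totals[strip_template_args(name)] += size
    return dict(totals)


# Hypothetical symbols: two instantiations of the same template method.
symbols = [
    ("Vec<int>::push()", 100),
    ("Vec<std::pair<int, int>>::push()", 120),
    ("main()", 40),
]
grouped = group_by_short_name(symbols)
```

Both Vec instantiations end up in one bucket, so its size is the combined cost of instantiating that template method.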

Input Files

When you pass multiple files to Bloaty, the inputfiles source will let you break the results down by input file.

You are free to use this data source even for non-.a files, but it won't be very useful since it will always just resolve to the input file (the .a file).

Compile Units

Using debug information, we can tell what compile unit (and corresponding source file) each bit of the binary came from. There are a couple of different places in DWARF we can look for this information; currently we mainly use the .debug_aranges section. It's not perfect and sometimes you'll see some of the binary show up as [None] if it's not mentioned in aranges (improving this is a TODO). But it can tell us a lot.

Inlines

The DWARF debugging information also contains “line info” that understands inlining. So within a function, it will know which instructions came from an inlined function from a header file. This is the information the debugger uses to point at a specific source line as you're tracing through a program.

If we want to bucket all of these by which library they came from, we can write a custom data source. It specifies the base data source and a set of regexes to apply to it. The regexes are tried in order, and the first matching regex will cause the entire label to be rewritten to the replacement text. Regexes follow RE2 syntax and the replacement can refer to capture groups.
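The rewriting rule described above (first match wins, and the whole label becomes the replacement) can be sketched in Python. This is an illustration under assumptions: Python's re module stands in for RE2, whose syntax differs in some details; the patterns and paths are hypothetical; and leaving unmatched labels unchanged is an assumption of this sketch.

```python
import re

# Hypothetical rules: (pattern, replacement), tried in order.
RULES = [
    (re.compile(r"^third_party/(\w+)/"), r"\1"),  # bucket by library name
    (re.compile(r"^src/"), "main"),               # everything under src/
]


def rewrite_label(label, rules):
    """Return the replacement for the first rule that matches.

    The entire label is replaced, not just the matched portion.
    Labels that match no rule are returned unchanged (an assumption).
    """
    for pattern, replacement in rules:
        match = pattern.search(label)
        if match:
            # expand() substitutes capture groups such as \1.
            return match.expand(replacement)
    return label


print(rewrite_label("third_party/zlib/inflate.c", RULES))
print(rewrite_label("src/bloaty.cc", RULES))
```

Because rules are tried in order, more specific patterns should come before more general ones.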