CBFS

Introduction

This document describes the coreboot CBFS specification (from here
referred to as CBFS). CBFS is a scheme for managing independent chunks
of data in a system ROM. Though not a true filesystem, the style and
concepts are similar.

The CBFS architecture consists of a binary associated with a physical
ROM disk, referred to hereafter as the ROM. A number of independent
components, each with a header prepended to its data, are located within
the ROM. The components are nominally arranged sequentially, though they
are aligned along a pre-defined boundary.

The bootblock occupies the last 20k of the ROM. Within
the bootblock is a master header containing information about the ROM
including the size, alignment of the components, and the offset of the
start of the first CBFS component within the ROM.

(Note that the master header is currently being removed as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the changes being made and where we are in the process.)

Master Header

The master header contains essential information about the ROM that is
used both by the CBFS implementation within coreboot at runtime and by
host based utilities that create and manage the ROM. The master header
is located somewhere within the bootblock (last 20k of the ROM). A
pointer to the location of the header is stored at offset
-4 from the end of the ROM. This translates to address 0xFFFFFFFC on a
normal x86 system. The pointer is a physical memory address somewhere
between 0xFFFFB000 and 0xFFFFFFF0. This makes it easier for coreboot
to locate the header at run time. Build time utilities will
need to read the pointer and do the appropriate math to locate the header.

magic is a 32 bit number that identifies the ROM as a CBFS type. The magic number is 0x4F524243, which is 'ORBC' in ASCII.

version is the version number of the CBFS header. The layout of the cbfs_header structure may differ if the version does not match.

romsize is the size of the ROM in bytes. coreboot subtracts 'romsize' from the top of the 32-bit address space (one past 0xFFFFFFFF) to locate the beginning of the ROM in memory.

bootblocksize is the size of bootblock reserved in firmware image.

align is the number of bytes that each component is aligned to within the ROM. This is used to make sure that each component is aligned correctly with regards to the erase block sizes on the ROM - allowing one to replace a component at runtime without disturbing the others.

offset is the offset of the first CBFS component (from the start of the ROM). This is to allow for arbitrary space to be left at the beginning of the ROM for things like embedded controller firmware.

architecture describes which architecture (x86, arm, ...) this CBFS is created for.
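
As a rough illustration of the pointer math described above, the following Python sketch shows how a host-side utility might locate and decode the master header in a ROM image file. It assumes an x86-style mapping in which the ROM ends at the 4 GiB boundary, that the pointer is stored little-endian, and that the header consists of the seven fields listed above as consecutive big-endian 32-bit values. The function name and these layout details are illustrative assumptions, not something this document specifies.

```python
import struct

CBFS_MAGIC = 0x4F524243        # 'ORBC' in ASCII, as given above
TOP_OF_ADDRESS_SPACE = 1 << 32  # assumption: ROM is mapped to end at 4 GiB

# Assumption: seven consecutive big-endian 32-bit fields, in the order
# they are described in this section.
HEADER = struct.Struct(">7I")
FIELD_NAMES = ("magic", "version", "romsize", "bootblocksize",
               "align", "offset", "architecture")


def locate_master_header(rom):
    """Return (file_offset, fields) for the master header in a ROM image."""
    # The pointer lives at offset -4 from the end of the ROM
    # (assumed little-endian, as on x86).
    (pointer,) = struct.unpack_from("<I", rom, len(rom) - 4)

    # Convert the physical address into an offset within the image file.
    rom_base = TOP_OF_ADDRESS_SPACE - len(rom)
    file_offset = pointer - rom_base
    if not 0 <= file_offset <= len(rom) - HEADER.size:
        raise ValueError("master header pointer is outside the ROM")

    fields = dict(zip(FIELD_NAMES, HEADER.unpack_from(rom, file_offset)))
    if fields["magic"] != CBFS_MAGIC:
        raise ValueError("master header magic not found")
    return file_offset, fields
```

A build-time tool would call locate_master_header() on the raw image bytes and then use the returned 'align' and 'offset' values when walking the components.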

Bootblock

The bootblock is a mandatory component in the ROM. It is located in the
last 20k of the ROM space, and contains, among other things, the location
of the master header and the entry point for the loader firmware. The
bootblock does not have a component header attached to it.

(Note that the master header location is currently being removed as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the changes being made and where we are in the process.)

Components

CBFS components are placed in the ROM starting at 'offset' specified in
the master header and ending at the bootblock. Thus the total size
available for components in the ROM is (ROM size - 20k - 'offset'). Each
CBFS component is to be aligned according to the 'align' value in the
header. Thus, if a component of size 1052 is located at offset 0 with an
'align' value of 1024, the next component will be located at offset 2048.
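
The alignment arithmetic above can be checked with a few lines of Python. This is only a sketch; the function names are made up for illustration.

```python
def align_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def next_component_offset(offset, size, align):
    """Offset of the component that follows one of the given size."""
    return align_up(offset + size, align)


def space_for_components(rom_size, bootblock_size, first_offset):
    """Total space available for components, per the formula above."""
    return rom_size - bootblock_size - first_offset


# The example from the text: size 1052 at offset 0 with align 1024.
print(next_component_offset(0, 1052, 1024))  # 2048
```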

Each CBFS component will be indexed with a unique ASCII string name of
unlimited size.

(Note that there are some changes to the header format in the pipeline as part of a significant redesign and modernization of the CBFS structures. See the #Redesign section for more details on the updates and where we are in the process.)

Searching Algorithm

To locate a specific component in the ROM, one starts at the 'offset'
specified in the CBFS master header. For this example, the offset will
be 0.

From that offset, the code should search for the magic string on the
component, jumping 'align' bytes each time. So, assuming that 'align' is
16, the code will search for the string 'LARCHIVE' at offset 0, 16, 32, etc.
If the offset ever exceeds the allowable range for CBFS components, then no
component was found.

Upon recognizing a component, the software then has to search for the
specific name of the component. This is accomplished by comparing the
desired name with the string on the component located at
offset + sizeof(struct cbfs_file). If the string matches, then the
component has been located; otherwise the software should add the
'offset' + 'len' values from the component header to the current offset
and resume the search for the magic value.
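
The search steps above can be sketched in Python as follows. The layout of struct cbfs_file is not spelled out in this document, so the sketch assumes an 8-byte magic followed by four big-endian 32-bit fields (len, type, checksum, offset), with the NUL-terminated name immediately after the struct. The helper make_component() is hypothetical and exists only to build test data.

```python
import struct

MAGIC = b"LARCHIVE"
# Assumed layout of struct cbfs_file: magic[8], len, type, checksum, offset.
FILE_HEADER = struct.Struct(">8sIIII")


def align_up(value, align):
    return (value + align - 1) // align * align


def find_component(rom, name, start, end, align):
    """Return (position, type, data) for the named component, or None."""
    wanted = name.encode("ascii")
    pos = start
    while pos + FILE_HEADER.size <= end:
        magic, length, ctype, _checksum, data_off = FILE_HEADER.unpack_from(rom, pos)
        if magic != MAGIC:
            pos += align  # no component here, try the next boundary
            continue
        # The name sits right after the fixed-size header.
        raw_name = rom[pos + FILE_HEADER.size:pos + data_off]
        if raw_name.split(b"\0", 1)[0] == wanted:
            data = rom[pos + data_off:pos + data_off + length]
            return pos, ctype, data
        # Skip this component: header 'offset' + 'len', then realign.
        pos = align_up(pos + data_off + length, align)
    return None


def make_component(name, ctype, data, align):
    """Hypothetical helper: build one aligned component for testing."""
    raw_name = name.encode("ascii") + b"\0"
    data_off = FILE_HEADER.size + len(raw_name)
    blob = FILE_HEADER.pack(MAGIC, len(data), ctype, 0, data_off) + raw_name + data
    return blob.ljust(align_up(len(blob), align), b"\xff")


rom = (make_component("fallback/romstage", 0x10, b"stage", 64)
       + make_component("option_table", 0xFFFFFFFF, b"opts", 64))
print(find_component(rom, "option_table", 0, len(rom), 64))
```

Realigning after each skipped component keeps the scan on 'align' boundaries, which matches the placement rule described in the Components section.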

(Note that the first step is currently being changed such that the location of the first component is read from the image's global FMAP. See the #Redesign section for more details on the updates and where we are in the process.)

Data Types

The 'type' member of struct cbfs_file is used to identify the content
of the component data, and is used by coreboot and other
run-time entities to make decisions about how to handle the data.

There are three component types that are essential to coreboot, and so
are defined here.

Stages

Stages are code loaded by coreboot during the boot process. They are
essential to a successful boot. Each stage consists of a single blob
of binary data that is to be loaded into a particular location in memory
and executed. The uncompressed header contains information about how
large the data is, where it should be placed, and what additional memory
needs to be cleared.

Stages are assigned a component value of 0x10. When coreboot sees this
component type, it knows that it should pass the data to a sub-function
that will process the stage.

compression is an integer defining how the data is compressed. There are three compression types defined by this version of the standard: none (0x0), lzma (0x1), and nrv2b (0x02, deprecated), though additional types may be added assuming that coreboot understands how to handle the scheme.

entry is a 64 bit value indicating the location where the program counter should jump following the loading of the stage. This should be an absolute physical memory address.

load is a 64 bit value indicating where the subsequent data should be loaded. This should be an absolute physical memory address.

len is the length of the compressed data in the component.

memlen is the amount of memory that will be used by the component when it is loaded.

The component data will start immediately following the header.

When coreboot loads a stage, it will first zero 'memlen' bytes of memory
starting at 'load'. It will then decompress the component data according to
the specified scheme and place it in memory starting at 'load'. Following
that, it will jump execution to the address specified by 'entry'.
Some components are designed to execute directly from the ROM; coreboot
knows which components must do that and will act accordingly.
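
The load sequence can be modelled on a simulated block of memory. The sketch below is not coreboot's loader: the function name is hypothetical, memory is a plain bytearray, and decompression is passed in as a callable that defaults to the "none" scheme.

```python
def load_stage(memory, load, entry, memlen, data, decompress=lambda raw: raw):
    """Model the stage load steps; return the address to jump to."""
    # Step 1: zero the whole region the stage will occupy.
    memory[load:load + memlen] = bytes(memlen)

    # Step 2: decompress the component data and place it at 'load'.
    image = decompress(data)
    if len(image) > memlen:
        raise ValueError("stage image is larger than memlen")
    memory[load:load + len(image)] = image

    # Step 3: the caller would now jump to 'entry'.
    return entry


memory = bytearray(b"\xaa" * 32)
entry = load_stage(memory, load=8, entry=10, memlen=16, data=b"CODE")
print(entry, memory[8:24])
```

Because the region is zeroed before the image is copied in, any part of 'memlen' beyond the decompressed data is left cleared.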

Payloads

Payloads are assigned a component value of 0x20. When coreboot sees this
component type, it knows that it should pass the data to a sub-function
that will process the payload. Furthermore, other run time
applications such as 'bayou' may easily index all available payloads
on the system by searching for the payload type.

PAYLOAD_SEGMENT_CODE    0x45444F43  The segment contains executable code
PAYLOAD_SEGMENT_DATA    0x41544144  The segment contains data
PAYLOAD_SEGMENT_BSS     0x20535342  The memory specified by the segment should be zeroed
PAYLOAD_SEGMENT_PARAMS  0x41524150  The segment contains information for the payload
PAYLOAD_SEGMENT_ENTRY   0x52544E45  The segment contains the entry point for the payload

compression is the compression scheme for the segment. Each segment can be independently compressed. There are three compression types defined by this version of the standard: none (0x0), lzma (0x1), and nrv2b (0x02, deprecated), though additional types may be added assuming that coreboot understands how to handle the scheme.

offset is the address of the data within the component, starting from the component header.

load_addr is a 64 bit value indicating where the segment should be placed in memory.

len is a 32 bit value indicating the size of the segment within the component.

mem_len is the size of the data when it is placed into memory.

The data will be located immediately following the last segment.
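
The segment type values listed above are four-character ASCII tags packed into 32-bit numbers. The short Python sketch below decodes them; the helper name tag_text is made up for illustration.

```python
import struct

SEGMENT_TYPES = {
    "PAYLOAD_SEGMENT_CODE": 0x45444F43,
    "PAYLOAD_SEGMENT_DATA": 0x41544144,
    "PAYLOAD_SEGMENT_BSS": 0x20535342,
    "PAYLOAD_SEGMENT_PARAMS": 0x41524150,
    "PAYLOAD_SEGMENT_ENTRY": 0x52544E45,
}


def tag_text(value):
    """Return the ASCII text of a tag, least significant byte first."""
    return struct.pack("<I", value).decode("ascii")


for name, value in SEGMENT_TYPES.items():
    print(f"{name:24s} 0x{value:08X} {tag_text(value)!r}")
```

Read this way, the five values spell "CODE", "DATA", "BSS " (with a trailing space), "PARA" and "ENTR".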

Option ROMs

The third specified component type is Option ROMs. Option ROMs have
component type 0x30. They have no additional header; the
uncompressed binary data is located in the data portion of the
component.

NULL

There is a fourth component type, defined as NULL (0xFFFFFFFF). This is
the "don't care" component type. It can be used when the component
type is not necessary (such as when the name of the component is unique,
e.g. option_table). It is recommended that all components be assigned a
unique type, but NULL can be used when the type does not matter.

Redesign

The CBFS system is currently being modernized with the vision of greater adaptability to a variety of use cases. This work is being pursued as part of the Flashmap integration work; as such, it's being committed to the Chromium OS coreboot fork first, then upstreamed to the main repository. There are several important aspects to the design changes:

FMAP

One of the big goals here is to seamlessly support multiple CBFSes per firmware image. This is something that would be very useful to users such as the Chromium OS project that have complex firmware stacks with numerous components and modular update schemes, and having it in upstream coreboot would make the project more scalable to large projects, and consequently more adoptable.

The flashmap format provides the flash chip analog of disks' partition tables, and is therefore well suited to describing where the individual CBFSes live in flash. The idea is that the FMAP will eventually be a mandatory component of a coreboot firmware image, and will be consulted to find the location of the CBFS(es) before reading their structure. The coreboot binaries themselves will have compiled-in knowledge of where the FMAP begins, eliminating the need for expensive runtime flash searches while ideally reducing the number of compiled-in offsets to one.
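
To make the lookup concrete, here is a minimal Python sketch of finding a named section in an FMAP whose offset is already known. The binary layout is not defined in this document; the sketch assumes the packed little-endian structures used by the flashmap project (a 56-byte header with an "__FMAP__" signature followed by 42-byte area entries). The function name find_region is hypothetical.

```python
import struct

FMAP_SIGNATURE = b"__FMAP__"
# Assumed header: signature[8], ver_major, ver_minor, base (64-bit),
# size (32-bit), name[32], nareas (16-bit).
FMAP_HEADER = struct.Struct("<8sBBQI32sH")
# Assumed area entry: offset (32-bit), size (32-bit), name[32], flags (16-bit).
FMAP_AREA = struct.Struct("<II32sH")


def find_region(image, fmap_offset, wanted):
    """Return (offset, size) of the named FMAP area, or None if absent."""
    header = FMAP_HEADER.unpack_from(image, fmap_offset)
    signature, nareas = header[0], header[6]
    if signature != FMAP_SIGNATURE:
        raise ValueError("no FMAP at the given offset")

    pos = fmap_offset + FMAP_HEADER.size
    for _ in range(nareas):
        offset, size, raw_name, _flags = FMAP_AREA.unpack_from(image, pos)
        if raw_name.split(b"\0", 1)[0].decode("ascii") == wanted:
            return offset, size
        pos += FMAP_AREA.size
    return None
```

With the FMAP offset compiled in, a single call such as find_region(image, fmap_offset, "COREBOOT") replaces the master header pointer lookup, and the same call with another section name would locate a secondary CBFS.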

Master Header

Most of the information contained in the CBFS master header becomes redundant once a flashmap is present:

romsize: All uses can be replaced with reads of the chip size as stored in FMAP.

bootblocksize: The only reason this is needed is that the bootblock is currently jammed into space carved from that otherwise available to the CBFS. However, once we have an FMAP to describe the image's layout, we can place the bootblock in its own fixed-size FMAP section. On x86, it should always fit in 8K, and on ARM, we can let it be 128K; this space can be carved out of that currently allocated to the (primary) CBFS.

align: In practice, this is always 64, so we can just default to that. There's very little reason to change it and a risk of breakage, so users who really want it to be different can manually update the #defines in cbfstool and coreboot proper.

offset: This "points" to the first file header within the CBFS. This has been necessary on non-x86 because the bootblock occupies the lowest addresses of the CBFS section; however, with the bootblock moved to its own fixed-size FMAP section (see the bootblocksize point), the first entry can always be located at the beginning of each CBFS region, allowing one to find it simply by reading the offset from its FMAP section.

architecture: This is only (decreasingly) used by cbfstool, and the special cases that required it are no longer important to cbfstool once the bootblock is located in its own image region.

Version Coding and Hashing: File Entry Headers

Because the version information was previously stored in the master header, it needs a new place to live before the new type of image comes into use. As such, the following modifications to the per-file entry header format are being implemented:

The checksum field has always been present with a fixed width of 32 bits, but it has never been used or set to anything besides 0. It's being converted into two unsigned 16-bit fields, version and flags. The latter is not yet used for anything, but is intended to be used as a bitfield of special properties to be defined later.

A new unsigned 32-bit header_len field is being added so that versions of cbfstool that later become outdated can easily skip (or even copy) headers that they are too old to completely understand.

The concept of a hash is being reintroduced as a variable-width field with extensible support for multiple hash algorithms. Its length is dictated by a hash_type field that can be 0 for no hash or any other value representing a hash recognizable to the CBFS driver. The hash, along with all future variable-width header fields, is to be located directly after the fixed-width fields and immediately followed by the NUL-terminated filename; within this area, their relative positions are dictated by a combination of the header_len field and their individual offset fields (e.g. hash_offset for the hash).

Progress and Future Work

In short: cbfstool support for the new format is landing/on the way, but the build system, coreboot, and libpayload changes still need to be made.

cbfstool

Support for the redesigned image format discussed herein has been implemented in cbfstool with the following user-visible interface changes:

Most actions now accept the -r option for specifying a comma-separated list of which regions to operate on when working with new-format images. If this flag isn't provided, they default to the primary CBFS, which will hereafter always be located in a section called COREBOOT.

The create action now has a second form that accepts a compiled FMAP and creates a new-format image instead of what we're now calling the legacy type. Creation is the only opportunity to initialize designated image regions as CBFSes; by default, only the COREBOOT section is initialized, but the -r switch can be used to add others as well.

There's a new layout action for listing the mutable regions of a new-format image. It also accepts a switch to display the read-only regions as well.

There are new read and write actions for working with sections containing raw data that is not part of a CBFS. Overwriting CBFS-containing regions with raw data is not permitted.

With the exception of top-aligned addresses, any positions specified or requested are now relative to the beginning of their image region (rather than the beginning of the entire flash image).

While support for reading from the new image format has not yet been added to coreboot itself or libpayload, this will be a much easier problem for two reasons: these components shouldn't need write support and there's no reason for them to still support reading legacy images once we make the switch. It should be noted that the changes made to cbfstool preserve its backwards compatibility with legacy images: because invoking it in the same way as before continues to create and manipulate legacy images, it is possible for the coreboot build system to continue using the legacy format exclusively until all the necessary pieces are in place to switch over.

Build System

The following build system changes are needed before we can complete a switch to the new image format:

In order to create the new images, one needs to have a flashmap file. We've designed a textual language called fmd ("flashmap descriptor") for describing these files and a compiler to produce them (see Flashmap), but we still need to add template fmd files for all common architecture/flash chip size pairings. These files should be checked in under src/arch/*/, and should be named such that the build system can use $(CONFIG_COREBOOT_ROMSIZE_KB) to decide which one to use. For x86, we'll probably need two sets of fmd files, one for Intel chips that need to be built with the IFD/ME blob combination.

The build system then needs to actually *call* the compiler to produce the FMAP binary from the fmd file. A side effect of this process is a header file containing a #define to the FMAP's offset from the beginning of the firmware image, and since this information is needed during compile time, the FMAP will have to be compiled before the code. The user should be able to override which fmd file is used via Kconfig to allow custom configurations.

The CONFIG_FLASHMAP_OFFSET option needs to be replaced with the #define in the header generated when the fmd file is compiled down to an FMAP. Other unnecessary Kconfig keys should also be removed while we're at it: for instance, CONFIG_CBFS_SIZE comes to mind as one that should be read from the FMAP at runtime rather than set redundantly in Kconfig.

We'll need Make to actually build the new style of image. It'll need to pass in the generated FMAP, a list of the sections that will contain CBFSes (which can also be obtained when the fmd file is compiled), and explicitly add the bootblock to the appropriate individual region, choosing whether to bottom- or top-align it based on the architecture.

It would be nice to have a pluggable post-packaging step where the user can optionally use Kconfig to specify a script that should be run on the image to make any necessary alterations or customizations as soon as the normal build process has completed. As an example, such scripts' responsibilities might include adding binary data to raw image regions or copying stages and other files into secondary CBFSes within the same image.

When all the plumbing is in place (including a relevant postprocessing script), all that will need to be done to produce an image with multiple CBFSes is to modify the fmd file to annotate more section(s) as "(CBFS)."

coreboot/libpayload

The main coreboot code and libpayload need to be updated to be able to read from the new images at all:

In order to find the CBFS, they need to know to search the FMAP for a COREBOOT section instead of expecting a CBFS master header pointer. This should be relatively easy to implement in adurbin's new region-based FMAP/CBFS API, but getting it into libpayload (ideally without code duplication) will take additional thought. Also be warned that on some platforms (or just x86?), the first CBFS scan happens in assembly; the code for this is at src/arch/x86/lib/walkcbfs.S, and nothing apparent will happen until it's at least limping along.

They need to be updated to cope with the length-determination mechanism of the revised headers once those have been upstreamed.

The above should be (close to?) sufficient to get things *running*, but to get multiple CBFSes to work, there'll need to be additional changes to the CBFS API. When designing that new interface, it'll be important to consider use cases that involve copying stages between CBFSes: for instance, Chromium OS copies the ramstage (and romstage, on arm) from the main, permanently read-only CBFS to two updateable sections. If we want to continue allowing identical relocatable code to run from multiple parts of the image, the API will need to be stateful (i.e. remember which CBFS it's currently reading from). If this isn't something we want to continue "supporting," it might be okay to require the caller to specify the desired CBFS at every read call.

Remaining Design Work

The major remaining shortcoming is support for taking a structural hash of an entire CBFS. If such a hash were hierarchical and hashed the headers (including hashes and filenames) of all contained files, it would provide a guarantee that that entire portion of the image had been read correctly and hadn't been tampered with. Here are some design considerations, all of which are still open questions:

How should the hashes be concatenated? Hash extension is the easiest and most space efficient; however, we should keep in mind that a faulty (or malicious?) memory controller might not feed us the same data each time we request it, so if we proceed in that way, we'll need to rehash at least all preceding headers each time we read one back in from storage.

Where should the resulting hierarchical hash be stored? This has proven to be somewhat of a contentious issue, with some people believing it should be placed in a new (but completely different) "master header" for the CBFS, and others holding it should just go in a file with a specific name or its own raw section. A related question is which component should be responsible for checking the structural hash: if it's stored in a special header, the CBFS driver would be able to automatically check the whole image just as it will verify individual files based on their headers' stored hashes. Otherwise, vboot or user-specific custom code might have to do the verification, which could result in less code reuse, worse adoptability, and a messier and more poorly encapsulated CBFS read API (that, for instance, might require passing in a separately cached comparison hash every time a read is requested).

How do we skip some files, if that's something we want to support? Maybe null-type and deleted headers shouldn't be included in the hash? Maybe there should also be a flag that can be set to exclude certain files? (The latter would be necessary if we were to store the hierarchical hash in a file within the checksummed CBFS itself.)

How do we guard against malicious headers (e.g. those with really long filenames intended to make the hashing hang, overflow, or otherwise barf)?
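
Since these questions are still open, the following Python sketch is only one possible reading of "hash extension", not a proposed design. It assumes SHA-256 and a chaining rule in which each step hashes the running value together with the digest of the next file header; the function names and the all-zero starting value are likewise assumptions.

```python
import hashlib


def extend(running, header):
    """Fold one file header into the running structural hash."""
    header_digest = hashlib.sha256(header).digest()
    return hashlib.sha256(running + header_digest).digest()


def structural_hash(headers):
    """Chain all headers, in image order, into a single digest."""
    running = bytes(hashlib.sha256().digest_size)  # assumed start value
    for header in headers:
        running = extend(running, header)
    return running


headers = [b"header-of-file-1", b"header-of-file-2"]
print(structural_hash(headers).hex())
```

Only one digest needs to be stored, which is the space advantage mentioned above. The cost is that the result depends on every preceding header, so verifying a header that is read back later means redoing the chain up to that point.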