CFI with Abbrevs

The idea here is to provide a more general description for CFI information based
on DIEs and attributes. Any future need for more information in the header for
a CFI entry can be encoded as a new attribute without breaking older consumers.
Similarly, a producer can extend the information in the CFI without fear of
breaking consumers that may not understand the extension. Of course, this
benefit only starts after the release of a standard using this change, as it is
totally incompatible with the DWARF 3 standard.

6.4.1: Structure of Call Frame Information

...This table would be extremely large... [all preceding text and this
paragraph as before]

The virtual unwind information is encoded in two self-contained sections called
.debug_frame_info and .debug_frame_abbrev. Entries in the .debug_frame_info
section are Frame Description Entries (FDEs), which are specialized Debugging
Information Entries with tag DW_TAG_frame_info.

If the range of code addresses for a subprogram is not contiguous, there
may be multiple FDEs corresponding to the parts of that subprogram.

An FDE may contain any of the following attributes:

* DW_AT_frame_version whose value is a constant version number specific to the
call frame information and independent of the DWARF version number (see
Appendix F).
* DW_AT_code_alignment_factor whose value is a constant that is factored out of
all advance location instructions (see below). If the attribute is not
present, the code alignment factor is 1.
* DW_AT_data_alignment_factor whose value is a constant that is factored out of
all offset instructions (see below). If the attribute is not present, the
data alignment factor is 1.
* DW_AT_return_address_register whose value is a constant that indicates which
column in the rule table represents the return address of the function. Note
that this column might not correspond to an actual machine register. If this
attribute is not present, the return address column is 0.
* DW_AT_low_pc and DW_AT_high_pc whose values encode the contiguous address
range described by the FDE (see Section 2.17).
* DW_AT_initial_instructions whose value is a block containing a sequence of
rules that are interpreted to create the initial setting of each column in
the table.
The default rule for all columns before interpretation of the initial
instructions is the undefined rule. However, an ABI authoring body or a
compilation system authoring body may specify an alternate default value for
any or all columns.
This attribute is distinct from the DW_AT_instructions attribute so that
it can appear in a separate DW_TAG_frame_info entry referenced by a
DW_AT_frame_info attribute, and so that the initial state that it defines can
be used by the DW_CFA_restore instruction.
* DW_AT_instructions whose value is a block containing a sequence of table
defining instructions that are described below.
* DW_AT_frame_info whose value is a reference to another FDE which contains
additional frame information not present in this FDE. A common use for this
attribute is to reference an FDE which contains attributes that apply to a
number of FDEs such as DW_AT_frame_version, DW_AT_code_alignment_factor,
DW_AT_data_alignment_factor, and DW_AT_return_address_register. Such an
FDE would have been a CIE in DWARF 3.

Additional FDE attributes may be defined by an ABI authoring body or a
compilation system authoring body.
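
As an illustration of the defaults described in the attribute list above, here
is a minimal sketch of how a consumer might model them. The FrameInfo class and
the helper names are hypothetical and are not part of the proposal, and the
sketch deliberately ignores DW_AT_frame_info references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    """Hypothetical in-memory view of one DW_TAG_frame_info entry.

    A field set to None means the attribute is absent from the entry.
    """
    code_alignment_factor: Optional[int] = None
    data_alignment_factor: Optional[int] = None
    return_address_register: Optional[int] = None

    def effective_code_alignment(self) -> int:
        # Absent DW_AT_code_alignment_factor means a factor of 1.
        if self.code_alignment_factor is None:
            return 1
        return self.code_alignment_factor

    def effective_data_alignment(self) -> int:
        # Absent DW_AT_data_alignment_factor means a factor of 1.
        if self.data_alignment_factor is None:
            return 1
        return self.data_alignment_factor

    def effective_return_address_column(self) -> int:
        # Absent DW_AT_return_address_register means column 0.
        if self.return_address_register is None:
            return 0
        return self.return_address_register


def unfactored_advance(entry: FrameInfo, factored_delta: int) -> int:
    """Undo the factoring applied to an advance location operand."""
    return factored_delta * entry.effective_code_alignment()


def unfactored_offset(entry: FrameInfo, factored_offset: int) -> int:
    """Undo the factoring applied to an offset operand."""
    return factored_offset * entry.effective_data_alignment()
```

Using None for an absent attribute keeps "not present" distinct from an
explicit value that happens to equal the default, which matters once entries
can defer to a referenced entry.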

7.23: Call Frame Information

The Call Frame Information is encoded in the same fashion as Debugging
Information, as described in section 7.5. However, the Call Frame Information
is located in the .debug_frame_info and .debug_frame_abbrev sections, which are
analogous to the .debug_info and .debug_abbrev sections, respectively.
Normally, they will contain only DW_TAG_frame_info and null entries. (Also,
normally, the .debug_info and .debug_abbrev sections will not contain
DW_TAG_frame_info entries.)

The value of the DW_AT_frame_version version number is 4 (see Appendix F).

Call frame instructions are encoded... [as before]

Figure 18: Tag Encodings:

[Add:]
DW_TAG_frame_info [The value thereof should be the current last tag + 1]

The .debug_frame section is gone, replaced by .debug_frame_info and
.debug_frame_abbrev. I considered calling .debug_frame_info .debug_frame, but I
was concerned that the totally different format might confuse DWARF 2/3
consumers.

The length field was omitted because the size of an FDE can be determined from
the abbreviation associated with it and from the sizes of its instruction
blocks. This does require a bit more reading to skip an FDE. Alternatively, a
producer could use a DW_AT_sibling attribute to allow a consumer to walk the
FDEs faster. This issue could be revisited if this is deemed a problem.

The cie_id is not necessary because there is no distinction between a CIE and an
FDE any more.

The DW_AT_frame_info attribute can use the DW_FORM_refn forms to eliminate
relocations.

The FDE uses the DW_AT_low_pc and DW_AT_high_pc attributes. This is predicated
on the acceptance of the "DW_AT_high_pc encoded as a constant offset from the
DW_AT_low_pc" proposal which allows the elimination of the relocation entry for
the DW_AT_high_pc. This is important because the old FDE encoding required a
relocation entry for only the "initial location" (DW_AT_low_pc counterpart) and
not for the "address range" (DW_AT_high_pc counterpart).

The augmentation string is gone. Producers now can define their own attributes
instead of relying on odd augmentation strings. This provides greater
compatibility. Currently, if a producer uses an augmentation string, it may
imply that a CIE that contains the string or an FDE that references such a CIE
may have additional header fields. As a result, consumers cannot interpret
those CIEs or FDEs at all. With an attribute-based approach, a consumer can
ignore only those attributes that it doesn't understand and need not worry about
becoming desynchronized with the stream of bytes because every attribute has a
form which implies its size. (This is just like ordinary DIEs.)
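
A rough sketch of how a consumer might walk one entry this way follows. The
form codes and DW_AT_low_pc are the standard DWARF values. The codes given to
DW_AT_instructions and DW_AT_vendor_extension are made-up placeholders, the
4-byte little-endian address size is an assumption, and only a handful of forms
are handled.

```python
# Standard DWARF form codes (only a small subset is handled here).
DW_FORM_addr = 0x01
DW_FORM_data4 = 0x06
DW_FORM_block1 = 0x0a
DW_FORM_data1 = 0x0b
DW_FORM_udata = 0x0f

DW_AT_low_pc = 0x11                # standard value
DW_AT_instructions = 0x2001        # hypothetical placeholder value
DW_AT_vendor_extension = 0x2fff    # hypothetical, unknown to this consumer

KNOWN_ATTRIBUTES = {DW_AT_low_pc, DW_AT_instructions}
ADDRESS_SIZE = 4  # assumption; a real consumer takes this from the unit header


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 number, returning (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_form(data, pos, form):
    """Read one attribute value of the given form: (value, next position)."""
    if form == DW_FORM_addr:
        end = pos + ADDRESS_SIZE
        return int.from_bytes(data[pos:end], "little"), end
    if form == DW_FORM_data1:
        return data[pos], pos + 1
    if form == DW_FORM_data4:
        return int.from_bytes(data[pos:pos + 4], "little"), pos + 4
    if form == DW_FORM_udata:
        return read_uleb128(data, pos)
    if form == DW_FORM_block1:
        length = data[pos]
        start = pos + 1
        return bytes(data[start:start + length]), start + length
    raise ValueError(f"form {form:#x} is not handled by this sketch")


def read_entry(data, pos, abbrev):
    """Walk one entry using its abbreviation, a list of (attribute, form).

    Unknown attributes are stepped over using only their form, so the
    reader stays synchronized with the byte stream.
    """
    attributes = {}
    for attribute, form in abbrev:
        value, pos = read_form(data, pos, form)
        if attribute in KNOWN_ATTRIBUTES:
            attributes[attribute] = value
    return attributes, pos
```

The returned position is also the size information that the omitted length
field would have provided: after read_entry returns, pos points at the next
entry.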

The .debug_frame_info section would have compilation unit headers, just like
.debug_info.

The DW_AT_initial_instructions exists primarily so that DW_CFA_restore remains
meaningful. Were it not for that, DW_AT_instructions could have been used for
both with some rules about interpreting instructions from a
DW_AT_frame_info-referenced FDE before interpreting instructions from the
current FDE.
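
To make the role of the initial state concrete, here is a small symbolic model.
It is only a sketch: instructions are represented as tuples rather than as
encoded bytes, only three operations are modelled, and the function names are
invented for illustration.

```python
UNDEFINED = ("undefined",)  # default rule for a column with no rule set


def interpret(instructions, table, initial_table):
    """Apply symbolic instructions to `table` in place and return it.

    Each instruction is a tuple: ("offset", register, offset),
    ("restore", register) or ("nop",).
    """
    for op, *operands in instructions:
        if op == "offset":
            register, offset = operands
            table[register] = ("offset", offset)
        elif op == "restore":
            (register,) = operands
            # Restore the rule established by the initial instructions,
            # or the default rule if they never mentioned this column.
            table[register] = initial_table.get(register, UNDEFINED)
        elif op != "nop":
            raise ValueError(f"operation {op!r} is not modelled here")
    return table


def final_rules(initial_instructions, instructions):
    """Run the initial instructions, then the main instructions."""
    initial_table = interpret(initial_instructions, {}, {})
    # Work on a copy so the initial state stays available to "restore".
    return interpret(instructions, dict(initial_table), initial_table)
```

If both blocks were merged into a single instruction stream, the interpreter
would have no recorded point at which to snapshot the initial table, which is
the situation the separate attribute avoids.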

Any padding required can be done with DW_CFA_nop instructions in the
DW_AT_initial_instructions and DW_AT_instructions blocks.

I'll take care of these after the plan starts to settle down and it's time to
produce a final proposal.

====================================================================

Further discussion:

The issue of nesting for DIEs in these sections was brought up. It was suggested
that nesting could be used for "fallback" information instead of following an
attribute. Also, with either approach, there is the issue of how contradictions
between a DIE and a referenced (or parent) DIE are resolved. This is
particularly an issue for DW_AT_instructions. My comments on this were:

I envisioned no nesting in the .debug_frame_info section. The proposal defines
no meaning for nesting, and so I figure any nesting would be semantically
neutral.

The general problem you bring up about attributes that exist in the "leaf"
DW_TAG_frame_info and any referenced DW_TAG_frame_info seems analogous to our
handling of abstract & concrete DIEs. In that case, there is no direct
statement about the meaning of the DWARF if a concrete DIE has an attribute that
contradicts one in its abstract counterpart. I would imagine that the one from
the concrete DIE would win, but I can find no statement to that effect. Rather,
it is stated that the concrete DIE merely omits attributes because they are
present in the abstract DIE. I figure the implication is that a contradiction
between the abstract & concrete DIEs would be considered bad Dwarf.

I can see a number of solutions:

1) Reformulate this in terms similar to that for abstract/concrete DIEs, so
that attributes are omitted from frame_info's if they are specified in a
frame_info referenced by a DW_AT_frame_info (transitively). This leaves a
contradiction undefined, but with the implication of bad Dwarf again.
2) Indicate that any attributes in a frame_info override any same-named
attributes in another frame_info referenced by DW_AT_frame_info.
3) Indicate that any attributes in a frame_info override any same-named
attributes in another frame_info referenced by DW_AT_frame_info, except
for DW_AT_initial_instructions and/or DW_AT_instructions, in which case
the instructions are appended to those from the referenced frame_info
(transitively).

Personally, I favor approach #2. I don't see contradictions happening often,
and this description is simple and flexible.
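
Approach #2 amounts to a simple lookup rule, sketched below. Entries are
modelled as plain dictionaries keyed by attribute name and held in a table
keyed by an arbitrary label. That representation, the lookup function, and the
cycle check are all assumptions made for illustration.

```python
def lookup(entries, key, attribute, default=None):
    """Find `attribute` for the entry `key`, following DW_AT_frame_info.

    The nearest entry that carries the attribute wins, so a value in the
    leaf overrides the same attribute in any referenced entry.
    """
    visited = set()
    while key is not None:
        if key in visited:
            raise ValueError("cycle in DW_AT_frame_info references")
        visited.add(key)
        entry = entries[key]
        if attribute in entry:
            return entry[attribute]
        # Not here; fall back to the referenced entry, if there is one.
        key = entry.get("DW_AT_frame_info")
    return default


entries = {
    "shared": {
        "DW_AT_code_alignment_factor": 4,
        "DW_AT_return_address_register": 30,
    },
    "leaf": {
        "DW_AT_frame_info": "shared",
        "DW_AT_return_address_register": 14,
    },
}
```

With these entries, looking up DW_AT_return_address_register for "leaf" finds
the leaf's own value, while DW_AT_code_alignment_factor is taken from "shared".
Approach #3 would need a different rule for the two instruction attributes
only, since their values would be concatenated rather than overridden.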

====================================================================

These are comments from the meeting, transcribed directly from the minutes:

Should Todd proceed with this, turning it into a full-fledged
proposal, working out all the parts of the document which need
to be changed to accommodate?

Someone would like to see some discussion of what we'll do for
old consumers who aren't prepared to read this new format.
He doesn't think we need a comprehensive list of all the little
places that change in the std to have a discussion about it.

Does anyone know offhand what the gcc augmentation strings
do? Most of the augmentations exist to specify the size of
different objects in the frame info (how big a pointer is, whether
a given pointer value is relative to the frame or is absolute, etc).

David Anderson observes that this mostly appears in the eh_frame
information. Todd agrees that most of what they've added has been
added to the eh_frame information.

Should we incorporate exception handling & unwinding in DWARF?
Is there any value in trying to incorporate description of the
EH section into DWARF? General feeling was that this was outside
the scope of DWARF as a debugging format.

Todd would be inclined to see whether the gcc developers would be willing
to move towards an abbrev/info format like the one we're looking at.

DWARF has the property that old consumers can skip information produced
by new producers - and that's what we want to address in this proposed
change.

Michael asks if a new consumer would really be able to understand
frame info if it doesn't understand all the attributes? Jim argues
yes - that it will behave like the debug_info section does today.

John DelSignore points out that consumers will have to consume both
old debug_frame and new frame+abbrev information (programs will
undoubtedly have a mixture of frame infos within the program).

Michael asks what the impact of doing something like this will be.
Is it big work? Little work? Small work for big benefit?
Jim Blandy says that it's a question of how much value you
place on making the format flexible.

Andrew says that the alternative is to add an augmentation which
specifies the address size. But consumers that don't know that
augmentation will ignore the debug_frame info altogether.

Would it be less incompatible if we defined the augmentation strings
so they were somehow self describing, or their size was self
describing? Jim says that this is what the abbrev format is.

Andrew suggests that the augmentation is for compression.

In terms of the spec, Jim suggests that making the debug frame
info/abbrev-based will reduce the size of the spec, since the frame
info would no longer use a different format.

Michael is less worried about the size of the spec than the ease of
understanding the frame info - he thinks the info/abbrev style format
would be easier to understand.

Todd notes that a lot of the CIE would remain as it is.

One person suggests that the format change seems like a lot of
trouble for an intangible gain.

Jim says that the original impetus for this was the address size
issue. Then there's the segmentation discussion. Maybe what
we should do is say OK, here's an idea, we know how it would work;
we wait until we have something exciting that we want to change in
the frame info and when someone says Gee, I wish I could do this
with CFI, then we can roll out our info+abbrev style CFI.

Isn't it a possibility here to pick up & document the augmentations
used by the eh_frame and allow them to be used in the DWARF frame?
Those have the capabilities for describing the different pointer
sizes and addressing modes. This might be added to the website
or wiki, rather than part of the standard.

Jim has been uninterested in documenting the augmentation strings
because in practice the augmentation strings are not so different
from version numbers; in practice, when you see a character,
either you recognize it and can consume the information or you
don't and you have to stop reading. In practice there isn't that
much difference between adding a new character in the augmentation
and simply bumping the version number.

So in the specific case of describing the target address size, instead
of adding it to the augmentations, he'd be just as happy to bump
the CFI version number and describe it in a header.

John suggests (Bill White suggested in the past) that we make the
CFI format the same but we make the augmentation field extensible
- maybe with a DWARF1 style abbrev+info+value. This format could
be indicated via a new augmentation character in the existing format.
There is a lot of agreement that this could be the best approach
impacting consumers the least amount while allowing for future
extensibility.

Where do we go: A proposal to add address size; a proposal to move
to abbrev+info for the whole section; a proposal to use a structured
augmentation format? Or leave it alone altogether?

Jim argues for some way to, at least, specify an address size in
the CIE.

Michael thinks that adding address size information would be a good
thing.

Given that we're talking about DWARF 4, John Bishop would give a weak
Yes vote given that we'll be doing a major revision.

Michael summarizes by saying that adding address information is
good - he's not hearing a lot of support for adding a new abbrev+info
format for augmentation, or revising the format entirely with an
abbrev+info format.

====================================================================

Although many people like this idea and said that, if we were designing this functionality from scratch, this would be a good design, it doesn't seem that there's enough bang for the buck to redesign it in this fashion at this late date.