5.4 Hierarchies

A hierarchy is an organization in which there is a directional
one-to-many relationship between parent records and their associated
children. combine works with hierarchies within reference files
when the file has a record for each node and each record points to its
parent in the hierarchy.

Because the hierarchy is assumed to be stored in a reference file, it
is accessed by matching to a data record. Once an individual reference
record has been matched to the data record, its relationship to other
records is followed through the hierarchy until no further links
remain.

In the standard process, combine assumes that the reference key that
matched the data file key is at the top of the hierarchy. When
traversing the hierarchy, combine looks for the key of the current
record in the hierarchy key of other reference records. This repeats
until there are no further linkages from one record to the next. Each
record linked into the hierarchy in this way is treated as a
reference record that matched the data record.
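
The traversal described above can be sketched in a few lines of Python.
This is only an illustration of the idea, not combine's implementation:
the record layout (a key paired with a hierarchy key naming the parent),
the sample keys, the function name ‘traverse’, and the depth-first visiting
order are all assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical reference records: (key, hierarchy key).
# The hierarchy key holds the key of the parent record; "" means no parent.
RECORDS = [
    ("root", ""),
    ("a", "root"),
    ("b", "root"),
    ("a1", "a"),
    ("b1", "b"),
]


def traverse(records, start_key):
    """Return every key reachable downward from start_key, start included.

    start_key stands for the reference key that matched the data record.
    The visiting order (depth first) is an assumption of this sketch.
    """
    # Index records by hierarchy key so that the children of a key are easy to find.
    children = defaultdict(list)
    for key, hierarchy_key in records:
        children[hierarchy_key].append(key)

    matched = []
    seen = {start_key}          # guards against cycles in malformed files
    stack = [start_key]
    while stack:
        current = stack.pop()
        matched.append(current)  # treated as a reference record that matched
        # Look for the current key in the hierarchy key of other records.
        for child in reversed(children.get(current, [])):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return matched


if __name__ == "__main__":
    print(traverse(RECORDS, "root"))
```

Starting from an interior key such as "a" visits only that record and the
records below it, which mirrors the assumption that the matched key is the
top of the part of the hierarchy being followed.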

In this section, we’ll use the following hierarchy file. It is a simple
hierarchy tree with ‘Grandfather’ as the top node and two levels of
entries below.

If the data file consisted only of a record with the key ‘Grandfather’,
then the following command would result in the records listed after it.
Each record written includes the entry itself and its parent.

We can arrive at the same number of records, each containing the entire hierarchy
traversed to get to the leaf nodes, by using the option ‘--flatten-hierarchy’
(‘-F’). This option takes a number as an argument, and then includes
information from that many records found in traversing the hierarchy, starting from
the record that matched the data record. This example tells combine to report
three levels from the matching ‘Grandfather’ record.

As with other areas within combine, the hierarchy manipulation is extensible
through Guile. The key fields can be modified as with any other fields; see
section Field-specific extensions for details. The matches within the hierarchy
can be further filtered using the ‘h’ suboption of the option ‘-x’
(see section Extending combine). As with matches between reference records and
data records, this filtering can allow you to perform fuzzy comparisons, to do
more complex calculations to filter the match, or to decide when you have gone
far enough and would like to stop traversing the hierarchy.
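
To make the role of such a filter concrete, here is a small Python sketch
in which a caller-supplied predicate decides whether a link in the hierarchy
is followed. It is not the Guile interface that combine exposes: the function
‘traverse_with_filter’, the predicate signature (current key, candidate
hierarchy key, depth), and the sample records are all hypothetical and chosen
only to illustrate fuzzy comparison and early stopping.

```python
# Hypothetical reference records: (key, hierarchy key). Note the upper-case
# "ROOT" on the second record, which an exact comparison will not match.
RECORDS = [
    ("root", ""),
    ("a", "ROOT"),
    ("b", "root"),
    ("a1", "a"),
    ("b1", "b"),
]


def traverse_with_filter(records, start_key, accept):
    """Follow the hierarchy from start_key, linking a record only if
    accept(current_key, hierarchy_key, depth) returns True."""
    matched = [start_key]
    seen = {start_key}
    queue = [(start_key, 0)]
    while queue:
        current, depth = queue.pop(0)
        for key, hierarchy_key in records:
            if key in seen:
                continue
            # The predicate replaces the plain equality test on the keys.
            if accept(current, hierarchy_key, depth + 1):
                seen.add(key)
                matched.append(key)
                queue.append((key, depth + 1))
    return matched


def exact(current, hierarchy_key, depth):
    """Plain comparison, standing in for unfiltered matching."""
    return current == hierarchy_key


def fuzzy(current, hierarchy_key, depth):
    """A simple fuzzy comparison: ignore letter case."""
    return current.lower() == hierarchy_key.lower()


def shallow(current, hierarchy_key, depth):
    """Stop traversing after one level below the starting record."""
    return current == hierarchy_key and depth <= 1


if __name__ == "__main__":
    for accept in (exact, fuzzy, shallow):
        print(accept.__name__, traverse_with_filter(RECORDS, "root", accept))
```

Passing the depth to the predicate is one way to let a filter say "far
enough"; a real extension might instead base that decision on field values
in the records being compared.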