Creating Hierarchical Data Structure Mashups : Page 2

You can use SQL to connect or "mash-up" hierarchical structures, joining them at or below the root to create larger queryable hierarchical structures.

by Michael M. David

Jan 15, 2009

Dynamic Structure Joining
Structures do not have to be combined statically ahead of time; you can specify the combination dynamically. For example, the SQL statement in Figure 3 uses the relational and XML views directly to combine their structures at query time. It is a good example of how splitting hierarchical data modeling between stored views and users adds flexibility: the more complex relational and XML views define the basic structures, and users join them in an ad hoc or interactive fashion with a single LEFT OUTER JOIN operation. You can join the views interactively in any number of ways, for example by placing the XML structure over the relational structure instead of under it, or by changing the connection points to use other data relationships, which can form other structures.

Figure 3. Dynamic Hierarchical Structure Join: This example shows how a user can create a single join point to combine two complex hierarchical structures.
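As a rough illustration of the idea behind Figure 3 (not a reproduction of it), the following sketch uses Python's sqlite3 module to join two stored views with a single LEFT OUTER JOIN. The tables R, D, X, and Y, the view names rdb_view and xml_view, and all column names are hypothetical. Because SQLite has no XML views, an ordinary view stands in for the XML structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical relational structure: R is the root, D is its child.
    CREATE TABLE R (r_id INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE D (d_id INTEGER PRIMARY KEY, r_id INTEGER, d_name TEXT);
    -- Hypothetical second structure standing in for the XML view:
    -- X is its root, Y is X's child. X.d_id is the link to node D.
    CREATE TABLE X (x_id INTEGER PRIMARY KEY, d_id INTEGER, x_name TEXT);
    CREATE TABLE Y (y_id INTEGER PRIMARY KEY, x_id INTEGER, y_name TEXT);

    INSERT INTO R VALUES (1, 'r1'), (2, 'r2');
    INSERT INTO D VALUES (10, 1, 'd1');
    INSERT INTO X VALUES (100, 10, 'x1');
    INSERT INTO Y VALUES (1000, 100, 'y1');

    -- Each stored view models one structure with LEFT OUTER JOINs.
    CREATE VIEW rdb_view AS
        SELECT R.r_id, R.r_name, D.d_id, D.d_name
        FROM R LEFT OUTER JOIN D ON D.r_id = R.r_id;
    CREATE VIEW xml_view AS
        SELECT X.x_id, X.d_id AS x_d_id, X.x_name, Y.y_id, Y.y_name
        FROM X LEFT OUTER JOIN Y ON Y.x_id = X.x_id;
""")

# The ad hoc part: one LEFT OUTER JOIN hangs the second structure under node D.
rows = conn.execute("""
    SELECT r_name, d_name, x_name, y_name
    FROM rdb_view
    LEFT OUTER JOIN xml_view ON xml_view.x_d_id = rdb_view.d_id
    ORDER BY r_name
""").fetchall()

for row in rows:
    print(row)
```

Changing only the ON clause of the final query (for example, linking on a different column) would attach the second structure at a different point without touching either stored view.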

Hierarchical Optimization

Figure 4. Omitting Unneeded Nodes: The query specifies a list of required nodes, so the optimized SQL can omit the LEFT JOIN on the M node, because it's not needed to reach any of the listed nodes.

You aren't limited to querying complete hierarchical structures. In Figure 4, the SELECT list specifies that only specific nodes should be returned; unselected nodes (such as D and M) are not output. A simple join of the XML and RDB views, however, would still bring in unnecessary nodes (such as node M). Fortunately, hierarchical processing can remove unnecessary pathways without affecting the results. This also means that it can automatically support large global views, because nodes in the view that a query does not need are optimized out and cause no overhead.

Compared to Figure 3, the query in Figure 4 removes the LEFT OUTER JOIN for the M node, because it's not selected for output, and therefore is not necessary to the query. However, the D node is still required for the query even though it is not selected for output because you need it to navigate to the B node (which is selected for output).
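The effect of dropping the M join can be approximated in plain SQL. The sketch below uses hypothetical tables R, D, B, and M (R is the root, D and M are its children, B is D's child) and is not the schema behind Figure 4. Plain SQLite does no hierarchical processing, so the unneeded M leg replicates rows in the flat result. The assumption here is that comparing the distinct rows is a fair stand-in for comparing the hierarchical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical structure: R is the root; D and M are children of R;
    -- B is a child of D.
    CREATE TABLE R (r_id INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE D (d_id INTEGER PRIMARY KEY, r_id INTEGER, d_name TEXT);
    CREATE TABLE B (b_id INTEGER PRIMARY KEY, d_id INTEGER, b_name TEXT);
    CREATE TABLE M (m_id INTEGER PRIMARY KEY, r_id INTEGER, m_name TEXT);
    INSERT INTO R VALUES (1, 'r1'), (2, 'r2');
    INSERT INTO D VALUES (10, 1, 'd1');
    INSERT INTO B VALUES (100, 10, 'b1'), (101, 10, 'b2');
    INSERT INTO M VALUES (1000, 1, 'm1'), (1001, 1, 'm2');
""")

# Full structure: the M leg is joined even though nothing from M is selected.
full_sql = """
    SELECT R.r_name, B.b_name
    FROM R
    LEFT OUTER JOIN D ON D.r_id = R.r_id
    LEFT OUTER JOIN B ON B.d_id = D.d_id
    LEFT OUTER JOIN M ON M.r_id = R.r_id
"""

# Pruned structure: M is omitted, but D stays because it is the path to B.
pruned_sql = """
    SELECT R.r_name, B.b_name
    FROM R
    LEFT OUTER JOIN D ON D.r_id = R.r_id
    LEFT OUTER JOIN B ON B.d_id = D.d_id
"""

full_rows = conn.execute(full_sql).fetchall()
pruned_rows = conn.execute(pruned_sql).fetchall()

print(len(full_rows), len(pruned_rows))    # row counts differ
print(set(full_rows) == set(pruned_rows))  # distinct data is identical
```

In this sketch the pruned query returns the same distinct data from fewer rows, which is the relational side of the saving the article describes.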
Figure 5 shows the three structure stages that hierarchical data processing can go through using the query from Figure 4. The first structure in Figure 5 defines and joins the basic input structure(s). If semantic structure optimization cannot be applied to the joined structure based on its runtime query, then this is the structure that the query processes. Otherwise, the joined structure is optimized and then processed as indicated in Figure 5. After processing, the result structure can be further modified semantically to remove nodes that were not selected for output, but were required for processing (such as the D node).

The rightmost output structure in Figure 5 shows the D node sliced out. It is also removed from the internal relational result set, which causes its child B node to be attached to D's parent (node R). This process is known as node promotion in hierarchical processing and projection in relational processing, and it preserves the semantics of the desired output data structure. Because the SQL query in Figure 4 maps naturally onto hierarchical operation, the SQL-to-XML mapping is seamless. This is possible because SQL and XML are both operating hierarchically, and the internal SQL rowset result and the external output result shown in Figure 5 are both hierarchical. That creates a one-to-one operational mapping, which can also be used to generate XML output automatically. Both this optimization and the automatic XML output have been achieved in the middleware XML extension mentioned previously.
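Node promotion itself is a small tree operation. The following sketch assumes a hypothetical representation of the processed structure as a dictionary mapping each node to its list of children. The function name slice_out is illustrative and is not taken from the article or from any product.

```python
def slice_out(children, node):
    """Return a new tree with `node` removed and its children
    attached to its parent, in the position `node` occupied."""
    result = {}
    for parent, kids in children.items():
        if parent == node:
            continue  # the sliced-out node gets no entry of its own
        new_kids = []
        for kid in kids:
            if kid == node:
                # Promote the removed node's children to this parent.
                new_kids.extend(children[node])
            else:
                new_kids.append(kid)
        result[parent] = new_kids
    return result


# Processed structure after optimization: R -> D -> B.
processed = {"R": ["D"], "D": ["B"], "B": []}

# D was needed for navigation but not selected, so it is sliced out.
output = slice_out(processed, "D")
print(output)
```

The input dictionary is left unchanged, so the processed structure and the output structure can both be inspected afterward.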
The automatic XML output processing described above is not currently supported by XML processors on the market, which mostly support a preformatted, static XML output format. With SQL's dynamic data modeling of hierarchical structures and its structure-aware processing, dynamic XML output offers many advantages that are missing from the XML industry today. Besides hierarchical optimization, these include hierarchical processing accuracy and dynamic control of the output, such as the node promotion shown in the Output Structure of Figure 5, which is controlled by the SQL SELECT clause used in Figure 4.

Structure-Aware Processing
The processing phases shown in Figure 5 enable a number of advanced SQL features and capabilities that are not present in other XML query languages. This is possible because the expanded LEFT OUTER JOIN sequence contains the hierarchical data modeling instructions that define the full structure accessed in Figure 4, which enables structure-aware processing. When separate structures are joined into a single hierarchical structure, their separate LEFT OUTER JOIN sequences are joined as well, so the complex real-time analysis required for structure-aware processing is performed automatically.
Once the processor determines which pathways of the query are not necessary, it can remove their processing easily and precisely by eliminating the join operations for the unneeded nodes. This global optimization is not performed elsewhere today because XML processing relies solely on user navigation, and its processing logic contains procedural instructions, such as XQuery's user-specified looping logic, that make global optimization difficult. In addition, XML query languages such as XQuery allow non-hierarchical operations, such as the inner join, which invalidate hierarchical structures.
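One way to read the point about inner joins is that an inner join lets a missing child eliminate its parent, which a hierarchy should never allow. The sketch below contrasts the two join types on hypothetical tables R (parent) and D (child). It illustrates that reading only and does not demonstrate XQuery behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical parent/child pair: r2 has no D child.
    CREATE TABLE R (r_id INTEGER PRIMARY KEY, r_name TEXT);
    CREATE TABLE D (d_id INTEGER PRIMARY KEY, r_id INTEGER, d_name TEXT);
    INSERT INTO R VALUES (1, 'r1'), (2, 'r2');
    INSERT INTO D VALUES (10, 1, 'd1');
""")

# LEFT OUTER JOIN keeps every parent, with NULLs where a child is absent.
outer_rows = conn.execute("""
    SELECT R.r_name, D.d_name
    FROM R LEFT OUTER JOIN D ON D.r_id = R.r_id
    ORDER BY R.r_name
""").fetchall()

# INNER JOIN drops parent r2 because it has no child.
inner_rows = conn.execute("""
    SELECT R.r_name, D.d_name
    FROM R INNER JOIN D ON D.r_id = R.r_id
    ORDER BY R.r_name
""").fetchall()

print(outer_rows)
print(inner_rows)
```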