The RDBMS datamodel

Relational database management systems (RDBMS) are probably the best-known and most standardised form of data persistence, with a standardised access mechanism (Structured Query Language, or SQL), and are likely to cover a large fraction of the existing data sources of interest.

Relational databases are constructed using tables.

A table stores information about entities of the same kind.

Each row of a table contains information describing a single entity.

Each column of a table has a datatype and label.

Tables are an efficient and natural representation of (perhaps large) homogeneous datasets. The complete information in a database may be stored in several tables in order to separate out ("normalise") information related to different entities, so that they can be combined in different ways in a report. Relationships between entities in different tables are asserted using keys, which are stored in specially designated columns within the tables. The rules for the organisation of information in tables are known as the relational database schema, or SQL schema.

RDBMS support "ad-hoc" queries, in which arbitrary combinations of data drawn from the tables can be combined, provided that

the tables carry the necessary keys, and

an SQL expression can be written that describes the query.
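As a minimal sketch of an ad-hoc query, the following uses Python's built-in sqlite3 module with two small tables joined through a key column. The table names, column names and rows are illustrative assumptions, not part of any schema described on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE surfaces (id TEXT PRIMARY KEY, treatment TEXT);
    CREATE TABLE roads (
        id TEXT PRIMARY KEY,
        name TEXT,
        surface_id TEXT REFERENCES surfaces(id)  -- key column linking the tables
    );
    INSERT INTO surfaces VALUES ('S1', 'bitumen'), ('S2', 'dirt');
    INSERT INTO roads VALUES ('R1', 'Highway', 'S1'), ('R2', 'Track', 'S2');
""")

# An ad-hoc query: any combination of columns can be requested, as long as
# the keys exist and the question can be phrased in SQL.
adhoc_rows = conn.execute("""
    SELECT roads.name, surfaces.treatment
    FROM roads JOIN surfaces ON roads.surface_id = surfaces.id
    ORDER BY roads.id
""").fetchall()
print(adhoc_rows)
```

Without the surface_id column there would be no way to write the join, which is the first of the two conditions above.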

Of course, most database users do not write raw SQL and therefore do not avail themselves of the ad-hoc capability.
Rather, access to an RDBMS is usually via a form with a limited combination of fields being queried.
This "parameterised template" models the most common operations for the particular database, using stored queries ("hard-coded" SQL) behind the scenes.
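A "parameterised template" of this kind might look like the sketch below, again using sqlite3. The stored query text is fixed and only the placeholder value comes from the form. The roads table, its columns and the function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (id TEXT PRIMARY KEY, name TEXT, n_lanes INTEGER)")
conn.executemany(
    "INSERT INTO roads VALUES (?, ?, ?)",
    [("R1", "Highway", 4), ("R2", "Track", 1)],
)

# The "hard-coded" SQL: users never see or change this text.
ROADS_BY_MIN_LANES = "SELECT id, name FROM roads WHERE n_lanes >= ? ORDER BY id"

def roads_with_at_least(conn, min_lanes):
    """Run the stored query with one user-supplied parameter."""
    return conn.execute(ROADS_BY_MIN_LANES, (min_lanes,)).fetchall()

print(roads_with_at_least(conn, 2))
```

The form can only ask the questions that the stored queries anticipate, which is the trade-off against the ad-hoc capability.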

The XML datamodel

In contrast, the fundamental structure of an XML document is a "tree", with the atoms of information held on its "leaves". XML documents are well suited for "semi-structured" information such as textual data, and for transport of self-describing and (potentially) inhomogeneous extracts. The tree structure maps more naturally (though not exactly) onto an object-oriented view of information than the relational view.

In hierarchically structured data the role of the query is replaced by the notion of a path. The user typically examines information in a tree-oriented data structure by navigation. In XML the standard syntax for this is XPath, which maps a path from a point within the document (the "context node") to some related position. For example,

/FeatureCollection/featureMembers/Fault[@id="D345"]/length

selects the length element of the Fault whose id attribute has the value of "D345", where the Fault is nested within a featureMembers child of a FeatureCollection element.
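The path can be tried out with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The document below is a made-up, namespace-free fragment; the length values are invented for illustration, and real GML would carry namespaces.

```python
import xml.etree.ElementTree as ET

doc = """
<FeatureCollection>
  <featureMembers>
    <Fault id="D344"><length>12.5</length></Fault>
    <Fault id="D345"><length>48.0</length></Fault>
  </featureMembers>
</FeatureCollection>
"""
root = ET.fromstring(doc)  # root is the FeatureCollection element

# ElementTree paths are relative to the element they are called on, so the
# leading /FeatureCollection step is implied by starting from the root.
length = root.find("./featureMembers/Fault[@id='D345']/length")
print(length.text)
```

Here the root element plays the part of the context node.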

A variety of operators are available to build XPath expressions. The main features to note are:

information that is on the same branch of the tree as the context node may be addressed more efficiently than information on a different branch. Arbitrary searching within a hierarchical structure involves multiple traversals up and down the tree, and may be inefficient in the absence of an overlaid index.

several different XPath expressions may resolve to the same "node" (element or attribute) within an XML document.
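The second point can be illustrated with a short sketch in which three different path expressions resolve to the same element. It uses xml.etree.ElementTree and an invented one-Fault document; the expressions are examples only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<FeatureCollection><featureMembers>"
    "<Fault id='D345'><length>48.0</length></Fault>"
    "</featureMembers></FeatureCollection>"
)

# Three routes to one node: explicit steps, a descendant search, and a
# positional predicate.
by_full_path = root.find("./featureMembers/Fault[@id='D345']/length")
by_descendant = root.find(".//Fault[@id='D345']/length")
by_position = root.find("./featureMembers/Fault[1]/length")

# All three calls return the very same Element object.
same_node = by_full_path is by_descendant is by_position
print(same_node)
```

The descendant search (".//") is the kind of expression that may have to visit many branches, which is where an overlaid index would help.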

Object modelling

[Figure: UML diagram of an example concrete feature type, Road]

The notation used here to illustrate the feature model is UML, the Unified Modelling Language, which is widely used in object-oriented data modelling. The Feature Model concepts are closely related to object-oriented data modelling, involving inheritance, polymorphism, labelled associations, etc., and GML documents may be modelled using UML. In fact, in modelling application domains ISO 19103 requires the use of a conceptual schema language that supports object-oriented principles, and strongly recommends UML. Of course, since XML is a static data encoding, "operations" or "methods" are not available in GML descriptions of features. But apart from this, the Feature Model can be understood to be just an application of object-modelling.
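In code, the same idea might be sketched as a pair of Python dataclasses. This is an illustration only: the Feature base class is a hypothetical stand-in for an abstract feature, and the Road property names are borrowed from the Roads table used later on this page, so the actual UML model may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Feature:
    """Hypothetical base class standing in for an abstract feature."""
    id: str
    description: Optional[str] = None
    name: Optional[str] = None

@dataclass
class Road(Feature):
    """Concrete feature type: properties only, no operations."""
    nLanes: Optional[int] = None
    number: Optional[str] = None
    surfaceTreatment: Optional[str] = None
    # A labelled association that may occur more than once.
    destination: List[str] = field(default_factory=list)
    pavement: Optional[str] = None
    centreLine: Optional[str] = None

road = Road(id="R456", name="Hume Highway", nLanes=4,
            destination=["G6421", "G6423"])
print(isinstance(road, Feature), road.destination)
```

Road inherits id, description and name from Feature, and carries no methods, in line with the remark that operations are not available in GML.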

Mapping between styles

Objects and Markup

Because of their "self-describing" nature, markup languages capture information from both instance and model meta-levels in a single document.

A model-driven approach (i.e. develop the UML model first, then let everything follow from that) is required for strict conformance with the ISO 19100 series.
Regular rules are available for mapping from the UML meta-model for static classes to GML - see GmlImplementation.

Tables and objects

The entity-relationship (E-R) method commonly used in the design of relational databases has many similarities with UML.
However, E-R lacks an inheritance model, and the semantics of E-R relationships are not differentiated into the range of association types available in UML.
Furthermore, a very significant omission in E-R is the absence of meaningful labels on relationships.
Explicit property names, including properties representing relationships between complex objects, are a key aspect of GML.
Thus, mapping from E-R to GML is less direct when properties are represented by relationships between tables as well as by attributes within tables.

A simple mapping involves equating rows from relational tables ("entities") with objects. The table columns correspond to the attributes within a class definition, and relational joins to associations between classes. For example, Road features might be recorded in a table like this:

Mapping 1

Roads

| id | description | name | nLanes | number | surfaceTreatment | destination | destination | pavement | centreLine |
| R456 | The road to Gundagai | Hume Highway | 4 | H31 | bitumen | G6421 | G6423 | S789 | C123 |
| R457 | The road to Ettamogah | ... | 1 | X96 | dirt |  |  |  |  |

Note that in this example we have chosen to record the values of destination, pavement and centreLine in other tables, and the values shown here are foreign keys to rows in those tables.
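The simple mapping can be sketched with sqlite3 as follows: each row becomes an object, each column an attribute, and the foreign key in the pavement column is resolved into an association. The Pavements table, its widthMetres column and its value are hypothetical, since the related tables are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets cells be read by column name
conn.executescript("""
    CREATE TABLE Pavements (id TEXT PRIMARY KEY, widthMetres REAL);  -- hypothetical
    CREATE TABLE Roads (id TEXT PRIMARY KEY, name TEXT, nLanes INTEGER,
                        pavement TEXT REFERENCES Pavements(id));
    INSERT INTO Pavements VALUES ('S789', 14.0);
    INSERT INTO Roads VALUES ('R456', 'Hume Highway', 4, 'S789');
""")

class Pavement:
    def __init__(self, row):
        self.id = row["id"]
        self.widthMetres = row["widthMetres"]

class Road:
    def __init__(self, row, pavement):
        self.id, self.name, self.nLanes = row["id"], row["name"], row["nLanes"]
        self.pavement = pavement  # association, resolved from the foreign key

def load_road(conn, road_id):
    """Build one Road object, following the pavement key to a second table."""
    row = conn.execute("SELECT * FROM Roads WHERE id = ?", (road_id,)).fetchone()
    pav = conn.execute("SELECT * FROM Pavements WHERE id = ?",
                       (row["pavement"],)).fetchone()
    return Road(row, Pavement(pav) if pav else None)

road = load_road(conn, "R456")
print(road.name, road.pavement.id)
```

Each association followed this way costs an extra query or join, which is where a naive recipe starts to become expensive for models with many classes.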

The details of how the object-table mapping is done can have a huge impact on performance. In particular, object models typically involve many classes with many associations between them, and following a naive recipe will probably result in an explosion of tables and joins. A fully automated mapping, though theoretically possible, is not generally practical.

Note, however, that the scope of WFS means that building a WFS on top of an RDBMS does not require a solution supporting a fully general query model. WFS limits interest to information quanta corresponding to the Feature Types identified in its capabilities. These Feature Types implicitly define the "parameterised template" for queries.

Tables and markup

Comparing the example XML view with the table view, we find that a table within a relational system could be mapped directly to a "complexType" element as defined by XML Schema, i.e. a row in the table is serialised as an XML element whose content is a sequence of elements, each corresponding to one cell. Where the cell supports a join with another table, within the XML document this can lead to the corresponding sub-element either

containing further sub-elements in turn, reflecting the columns in that table, etc., or

carrying an explicit link to an object identified by a URI.
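Both options can be sketched with xml.etree.ElementTree. The function below is a hypothetical helper, not a standard mapping: it serialises a row (a dict) as an element with one child per cell, and for a cell that joins to another table it either nests the related row or writes an xlink:href. The "#S789" form of the link is an assumption; any URI would do.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

def row_to_element(type_name, row, related=None, inline=True):
    """Serialise one table row as an element with one child per cell.

    `related` maps a column name to the row it joins to in another table.
    """
    related = related or {}
    elem = ET.Element(type_name)
    for column, value in row.items():
        child = ET.SubElement(elem, column)
        if column in related and inline:
            # Option 1: nest the joined row's columns as sub-elements.
            child.append(row_to_element(column.capitalize(), related[column]))
        elif column in related:
            # Option 2: carry a link to the related object instead.
            child.set(f"{{{XLINK}}}href", f"#{value}")
        else:
            child.text = str(value)
    return elem

road = {"id": "R456", "name": "Hume Highway", "pavement": "S789"}
joined = {"pavement": {"id": "S789"}}

nested = row_to_element("Road", road, joined, inline=True)
linked = row_to_element("Road", road, joined, inline=False)
print(ET.tostring(linked, encoding="unicode"))
```

The nested form gives a self-contained document, while the linked form keeps the document small and leaves the client to resolve the reference.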

By following the pattern shown, XML could be generated automatically from a relational database. The result would be documents whose structure directly reflects the table structure or RDBMS schema. The corollary is that, in order to generate XML documents that have a predefined structure in a simple way, the table structure must be reverse-engineered from the XML schema.

Since the mapping from Objects to XML is almost direct, and the table-object mapping has been studied rigorously, there may be some benefit in using an intermediate object layer in the table to XML conversion.

Rob's 2c worth

Relational and Object Views

It is common to find people arguing over which of these is the "right way". As with many "wars" of this kind, the answer is both: it depends on the purpose. One useful comparison I picked up a few years ago is the following:

A relational view is like owning a set of cars and when you park the cars in the garage you take them apart and put all the different parts into carefully marked jars. All the rims for all the cars go in the Rims jar, all the seatbelts for all the cars go in the SeatBelts jar, and so forth.

An object view is like owning a set of cars and when you park the cars in the garage you leave them in one piece.

The difference comes when you use them:

How many Rims do my cars use?

Relational view requires you to look in the Rims jar and count them

Object view requires you to take apart the cars, one at a time, and count rims to get the total.

How many cars do I own?

Relational view requires you to go to every jar and pull out the parts belonging to the first car, then the same for the second, then the third, and so forth until you run out of parts.

Object view simply means counting the cars

Yes, this is something of a simplification but it does illustrate the difference in query styles and the effect on performance.

For the purposes of the M1.5 DataBases project (in particular the use cases associated with obtaining material properties, model data, etc) the Object View seems to be a more natural fit and hence the use of Features for interoperability purposes.
-- RobertWoodcock - 23 Apr 2003

Which schema?

In most cases of interest, the RDBMS schema for a particular data set has been designed with internal requirements of the hosting organisation in mind. For example, the information from the previous example can also be found in the following table:

Mapping 2

Roads

| ID | updated | responsible | class | begin | end | surface | track | comment | label | laneCount | key |
| H31 | 2003-10-31 | cox075 | bitumen | G6421 | G6423 | S789 | C123 | The road to Gundagai | Hume Highway | 4 | R456 |
| X96 | never | no-one | dirt |  |  |  |  | The road to Ettamogah | ... | 1 | R457 |

However, this time the fields carry different labels, and there are a few extra columns carrying information that is not in the GML. Nevertheless, this datastore can still support a WFS with the same schema. Some translation of tags is required, and information from some fields is not published, usually because it is only of private interest to the custodian, but possibly because it is deliberately restricted for some reason; this is a legitimate decision for the data provider to make.
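The tag translation might be sketched as a simple lookup table. The column-to-property correspondences below are inferred by matching the values in the two Roads tables, so treat them as an assumption; the publish function is a hypothetical helper.

```python
# Column in the Mapping 2 table -> property name in the published Road feature.
# Columns absent from this dict (updated, responsible) are never published.
COLUMN_TO_PROPERTY = {
    "key": "id",
    "comment": "description",
    "label": "name",
    "laneCount": "nLanes",
    "ID": "number",
    "class": "surfaceTreatment",
    "begin": "destination",
    "end": "destination",
    "surface": "pavement",
    "track": "centreLine",
}

def publish(row):
    """Translate one private row into a list of (property, value) pairs."""
    return [
        (COLUMN_TO_PROPERTY[column], value)
        for column, value in row.items()
        if column in COLUMN_TO_PROPERTY and value is not None
    ]

private_row = {
    "ID": "H31", "updated": "2003-10-31", "responsible": "cox075",
    "class": "bitumen", "begin": "G6421", "end": "G6423",
    "surface": "S789", "track": "C123",
    "comment": "The road to Gundagai", "label": "Hume Highway",
    "laneCount": 4, "key": "R456",
}
published = publish(private_row)
print(published)
```

A list of pairs is used rather than a dict because two columns (begin and end) both map to the repeated destination property.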

But of course, the information corresponding to a "feature" will often not be in a single table.
For example a database schema may have short entries for the primary object, and then move most of the "properties" into a single table:

Mapping 3

Roads

| id | description | name |
| R456 | The road to Gundagai | Hume Highway |
| R457 | The road to Ettamogah | ... |

Properties

| featureRef | property | value |
| R456 | nLanes | 4 |
| R457 | nLanes | 1 |
| R456 | number | H31 |
| R457 | number | X96 |
| R456 | surfaceTreatment | bitumen |
| R457 | surfaceTreatment | dirt |
| R456 | destination | G6421 |
| R456 | destination | G6423 |
| R456 | pavement | S789 |
| R456 | centreLine | C123 |

Again, all the information required to populate the standard Road feature descriptions is still clearly present, but the mapping follows a different set of rules.
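For this layout, the mapping rule amounts to grouping the property rows by feature. A minimal sketch, using the R456 rows from the tables above and a hypothetical assemble function:

```python
from collections import defaultdict

roads = [("R456", "The road to Gundagai", "Hume Highway")]
properties = [
    ("R456", "nLanes", "4"),
    ("R456", "number", "H31"),
    ("R456", "surfaceTreatment", "bitumen"),
    ("R456", "destination", "G6421"),
    ("R456", "destination", "G6423"),
    ("R456", "pavement", "S789"),
    ("R456", "centreLine", "C123"),
]

def assemble(roads, properties):
    """Merge each short Roads row with its rows from the Properties table."""
    by_feature = defaultdict(lambda: defaultdict(list))
    for feature_ref, prop, value in properties:
        by_feature[feature_ref][prop].append(value)

    features = {}
    for road_id, description, name in roads:
        feature = {"id": road_id, "description": description, "name": name}
        # Property values are kept as lists, since a property may repeat.
        feature.update(by_feature.get(road_id, {}))
        features[road_id] = feature
    return features

features = assemble(roads, properties)
print(features["R456"]["destination"])
```

Note that the value column holds everything as text, so any typing (for example, nLanes as an integer) has to be restored by the mapping layer.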

Different use-cases => either different infosets or different views

Primary maintenance of a database will be carried out within the custodian organisation using "forms" in desktop or "thick-client" applications. These applications have a close relationship with the "native" interface to the data source, and are capable of supporting fully general operations including ad-hoc queries. The maintainers may need access to all of the information, both private and publishable components, and may need to access the information in arbitrary ways, using a fully general query language, like SQL.

On the other hand, external users may only be interested in a subset of the information, exposed according to a public schema designed by a domain-specific information community. Use of a public schema allows the client application to be configured in advance, and then bind to any conformant service (i.e. that advertises that it uses this schema) at run-time without having to negotiate a special agreement or configuration.

The WebFeatureService interface provides access to Features and Feature Collections. In a GetFeature operation the requestor can specify which subset of properties is reported for the selected features. Aside from that, any additional re-formatting or processing is the responsibility of the client, though of course this may use standard stylesheets, etc.
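As a rough sketch, a GetFeature request asking for only some properties might be built as a key-value-pair URL like this. The endpoint, the version number, and the Road type and property names are assumptions for illustration; a real service advertises its own feature types in its capabilities.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/wfs"  # hypothetical service endpoint

def get_feature_url(type_name, property_names):
    """Build a GetFeature URL that requests only the named properties."""
    params = {
        "service": "WFS",
        "version": "1.1.0",  # assumed; use the version the service supports
        "request": "GetFeature",
        "typeName": type_name,
        "propertyName": ",".join(property_names),
    }
    return BASE_URL + "?" + urlencode(params)

url = get_feature_url("Road", ["name", "nLanes"])
print(url)
```

Anything beyond this selection of properties, such as re-labelling or re-ordering, would be done by the client after the response arrives.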

The specific feature type views that are provided are the prerogative of the service provider. Multiple views of the same datastore may be provided either as multiple feature types within a single WFS, or through multiple WFS instances.

Rob's 2c worth

The important thing in the preceding paragraphs relates to the "use cases" of the application (not whether it is thick-client or thin-client, though there is often a correlation). A web-based "forms" interface running off a web server can be just as performance-demanding as any desktop "thick-client" from the database perspective. In general, the greater the flexibility and/or performance demands of the application, the more specific the database's "internal requirements" become and the less likely it is to be interoperable. -- RobertWoodcock - 23 Apr 2003

Relationship between GML and other meta-models

General Feature Model

GML properties correspond with feature properties, not including operations.

GML generalises the GFM pattern and uses it for Objects that do not represent objects in the real world and thus are not strictly Features, such as Geometries, Definitions, etc.
This allows the Feature-property encoding pattern to be applied for a wider variety of types.

Table schema

The GML meta-model relates to a "conventional" tabular representation of a set of entities as follows:

GML Objects (including Features) typically correspond with an Entity, or row in a table

a GML property name corresponds with an attribute name, shown as a column name

a simple GML property value corresponds with an attribute value, the contents of a table cell

Complex property values may correspond with a row from a related table, but other mappings are also possible.