Visualizing XML Schemas in Data Modeling Tools

In 2004 I was repeatedly told by developers that we don’t need data models anymore – extensible markup language (XML) is the “silver bullet” that solves all data integration issues. In reality, XML is only a partial solution to data integration issues; perhaps it’s just the shiny silver plating on the bullet, not the bullet itself. The “bullet” is whatever mechanism we use to transport the XML silver to its destination – service-oriented architecture (SOA), or whatever architecture the XML fits into. Without that architecture, all we have is some shiny metal.

In order to use XML to successfully integrate data from disparate sources, we must have semantic consistency between schemas. The structure of a parcel of XML (a “document” or a “message”) must adhere to the rules defined in an agreed set of semantics, or data model.

Back in 2004, some of the developers I spoke to accepted the need to base their schemas on existing data models, but weren’t willing to do this manually. If we could generate the XML schemas from our models, they would use them; otherwise they’d be forced to handcraft them in their favourite XML editor, using their own naming and definition standards. At the time, there was nothing we could do to help them, unless we wanted to build our own XML generator.

There was a disconnect between the Entity-Relationship and UML tools on the one hand, and XML schemas on the other, with the exception of a few integrated toolsets, such as Mega.

Data modelling tools have moved on a lot since those days. Many have the ability to generate XML schemas from relational models, and some have the ability to model XML directly. If the modelling capabilities they provide are good enough, then we stand a chance of persuading developers and designers to use them for schema design, instead of tools like Altova XML Spy. This situation is analogous to using data modelling tools to create and edit database schemas, and then creating and updating actual database instances. An XML Schema is, after all, another form of physical data model. XML Spy would of course be used by XML developers to understand and work with the schema, but wouldn’t be used to design it.

There are some key differences between the database and XML design paradigms that pose problems for data modelling tool vendors, not because they’re difficult to deal with, but because the tool vendors need to take a new approach to manage them.

Inheritance

An XML schema can inherit structures from other XML schemas via Import and Include statements, and the XML designer needs to be able to visualise these structures as if they were part of the content of the schema they’re currently working on.

It’s not uncommon for there to be multiple levels of import / include. For example, the following diagram contains two partial views of the schema inheritance hierarchy for the OAGi standard. Each box represents a schema; the boxes marked ‘B’ in the two views are the same schema, so between them the views show all of its parent and child schemas. The schema marked ‘A’ is a message schema, and it includes or imports the full content of all the schemas marked with a ‘*’. That’s a total of 12 schemas whose content is available to schema A but is not actually contained in the file ‘A.xsd’; that content must all be made available to the schema designer.
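To make the multi-level import / include point concrete, here is a minimal Python sketch of how a tool might collect every schema whose content is available to a given schema. The file names, the in-memory SCHEMAS dictionary and the function name referenced_schemas are all invented for illustration; they are not taken from OAGi or from any modelling tool, and a real implementation would also have to resolve relative paths and URLs.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical in-memory "files"; real schemas would be read from disk or a URL.
SCHEMAS = {
    "A.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="B.xsd"/>
  <xs:import namespace="urn:example:codes" schemaLocation="C.xsd"/>
</xs:schema>""",
    "B.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="D.xsd"/>
  <xs:import namespace="urn:example:codes" schemaLocation="C.xsd"/>
</xs:schema>""",
    "C.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:codes"/>""",
    "D.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>""",
}


def referenced_schemas(name, schemas):
    """Return every schema reachable from `name` via include or import."""
    seen = []
    stack = [name]
    while stack:
        current = stack.pop()
        root = ET.fromstring(schemas[current])
        for child in root:
            if child.tag in (XS + "include", XS + "import"):
                location = child.get("schemaLocation")
                # Skip schemas already visited, so shared ancestors are
                # listed once and circular references cannot loop forever.
                if location and location != name and location not in seen:
                    seen.append(location)
                    stack.append(location)
    return sorted(seen)


print(referenced_schemas("A.xsd", SCHEMAS))
```

In this toy example A.xsd names only two other schemas directly, yet the content of three is available to it; the same walk over the OAGi files is what would produce the list of 12 described above.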


Multiple Namespaces

A schema can include structures from multiple namespaces (or semantic models), using a prefix to qualify tags (names) where necessary. In the above diagrams, there are three namespaces, and there may be some names duplicated across the namespaces.
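The following Python sketch illustrates why duplicated names are manageable but must be handled explicitly: a prefixed name only becomes unambiguous once the prefix is resolved to its namespace. The namespaces, prefixes and element names are made up for the example, and prefix_map and resolve are hypothetical helper names rather than part of any tool’s API.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical schema: two different "Address" types, one per namespace.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ord="urn:example:orders"
           xmlns:cust="urn:example:customers"
           targetNamespace="urn:example:orders">
  <xs:element name="ShipTo" type="cust:Address"/>
  <xs:element name="BillTo" type="ord:Address"/>
</xs:schema>"""


def prefix_map(xsd_text):
    """Collect the prefix -> namespace URI declarations in a schema document."""
    prefixes = {}
    source = io.BytesIO(xsd_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        prefixes[prefix] = uri
    return prefixes


def resolve(qname, prefixes):
    """Turn 'prefix:local' into (namespace URI, local name)."""
    prefix, _, local = qname.rpartition(":")
    # An unprefixed name falls back to the default namespace, if one is declared.
    return prefixes.get(prefix), local


prefixes = prefix_map(SCHEMA)
for element in ET.fromstring(SCHEMA):
    print(element.get("name"), "->", resolve(element.get("type"), prefixes))
```

Both elements refer to a type whose local name is Address, but the resolved pairs differ, so a visualisation has to carry the namespace along with the name.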

Strict Hierarchical Structure

The structure of an XML schema is hierarchical, not a network or a web of relations. Easily finding points in a hierarchy and navigating up and down the hierarchy are crucial.

Re-use of Types

A schema can contain or inherit re-usable type and element definitions, which can be extended and restricted in different ways, multiple times within the schema.

Visualising the Schema

The key challenge these factors pose for tool vendors is visualising the schema structure. Altova XML Spy provides a great visual editing experience, resolving the complexities posed by inheritance, re-use and namespaces, to present an expandable hierarchy derived from the information in the schema and the included and imported schemas. The example from the Altova web site (below) demonstrates multiple namespaces and re-use: ‘Address_EY’ is defined by the type ‘ipo:EU_address’ that is contained within a separate namespace. The type definition might be in a separate schema file, perhaps even three or four levels up a hierarchy of schemas.

It’s vital that the detailed content of the type is made available to the schema designer. I have seen one tool where the XML model would contain a reference to the type, and that’s where the hierarchy stops; the designer would have to find the type in the browser or another diagram to see its structure. If that type contains an element that references an element or type defined elsewhere, then the designer has to go searching for that one as well.
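As a rough illustration of what “not stopping at the reference” means, the Python sketch below expands an element into its full hierarchy, following type references into another schema file and through a type extension, and records which file each component came from. Everything in it is an assumption made for the example: the two schema files, the type names and the helper functions are invented, namespaces are ignored for brevity, and only sequences inside complex types are handled. A recursive type would also need a guard that this sketch omits.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schemas: Order.xsd re-uses types defined in Common.xsd.
SCHEMAS = {
    "Order.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="Common.xsd"/>
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ShipTo" type="EUAddress"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>""",
    "Common.xsd": """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="EUAddress">
    <xs:complexContent>
      <xs:extension base="Address">
        <xs:sequence>
          <xs:element name="PostCode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>""",
}


def index_types(schemas):
    """Map each global complex type name to (definition, file it came from)."""
    types = {}
    for filename, text in schemas.items():
        for ctype in ET.fromstring(text).findall(XS + "complexType"):
            types[ctype.get("name")] = (ctype, filename)
    return types


def members(ctype, origin, types):
    """Yield (element, origin) pairs for a complex type, base type first."""
    extension = ctype.find(f"{XS}complexContent/{XS}extension")
    if extension is not None:
        base, base_origin = types[extension.get("base")]
        yield from members(base, base_origin, types)
        ctype = extension  # the extension's own additions follow the base
    for element in ctype.findall(f"{XS}sequence/{XS}element"):
        yield element, origin


def outline(element, origin, types, depth=0):
    """Return indented lines showing the fully expanded structure."""
    type_name = element.get("type", "(anonymous)")
    lines = [f"{'  ' * depth}{element.get('name')} : {type_name} [{origin}]"]
    anonymous = element.find(XS + "complexType")
    if anonymous is not None:
        ctype, type_origin = anonymous, origin
    elif type_name in types:
        ctype, type_origin = types[type_name]
    else:
        return lines  # built-in or unresolved type: nothing to expand
    for child, child_origin in members(ctype, type_origin, types):
        lines.extend(outline(child, child_origin, types, depth + 1))
    return lines


types = index_types(SCHEMAS)
root = ET.fromstring(SCHEMAS["Order.xsd"]).find(XS + "element")
print("\n".join(outline(root, "Order.xsd", types)))
```

The outline for Order shows Street, City and PostCode nested under ShipTo, each tagged with Common.xsd, even though none of them appears in Order.xsd. That is the behaviour the designer needs: the structure in one place, with its origin still visible.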

Here’s the OAGi “OrderManagementComponents” schema shown as a Mindjet Mind Manager mind map, with the first level shown horizontally to force it into landscape orientation. Mind Manager doesn’t include inherited structures within the visualisation, so the “full” schema will be much larger than this.

In order to wean people away from using XML Spy for ‘modelling’ XML, data modelling tools must provide the ability to visualise the underlying hierarchy of an XML element or type in one place, either via a browser view or using symbols on a diagram, preferably both. For XML designers, being able to see the structure of an element or schema without having to open multiple edit boxes or nested composite diagrams is essential. This is true whether we’re using a dedicated XML model, a UML model, or a relational model.

We also need the ability to expand and contract multiple levels in the schema, whatever works for us at the time with the schema we’re working on. Anything less just slows us down and makes life more difficult.

What We All Need To Do

If you’re evaluating data modelling tools, or checking the state of play for tools you already use, make sure you ask every vendor how they plan to help you visualise your schemas.

Tell the vendor that you must be able to show all the content of a schema, no matter where it is inherited from

Tell the vendor that you need to be able to hide inherited components, either all of them or a selection (perhaps those inherited from a given schema or namespace)

Tell the vendor that you need the schema visualisation to highlight inherited components, and to tell you where they came from (perhaps in a pop-up box when you hover the mouse over them)

Tell the vendor that you need control over the style and content of symbols on schema diagrams, that you need to be able to use different styles on different diagrams, and that you need to be able to re-use the styles you developed on one diagram elsewhere

Tell the vendor that you need complete control over how much of the schema you can see at any one time. You might even ask them for the ability to see multiple views of the same schema at the same time (like having two windows open on the same spreadsheet)

Tell the vendor that you want the ‘browser’ view to be able to show the full inherited structure, not just stop when a reference to a type or an element in another schema is reached

Tell the vendor that they need to provide a better visual design and editing experience than any of the XML editing tools on the market