Doing More With XML Schemas (part 3)

This article introduces you to the concept of uniqueness in the XML Schema world,
showing you how to use built-in schema constructs to enforce uniqueness within
your XML document instances.

In the first two parts of this article, I spent lots of time and space blathering
on about the advanced aspects of XML schema design, including such arcane concepts
as complex datatypes, derivation by extension and restriction, and type redefinition.
You were probably bored out of your wits, but you nodded your head wisely throughout
out of politeness, and quietly hoped that all that jargon was a prelude to
something more interesting.

I’m sorry to tell you that it isn’t. In fact, this third part is filled with
even more technical gobbledygook, including such beauties as “primary key reference”
and “selector”. None of these terms are likely to make your day any sunnier –
but hey, they’ll sure teach you a thing or two about designing good schemas. If
that sounds like something you’d like to learn more about, keep reading – it’s
time to take a little detour through the supermarket!

A Day At The Supermarket

What’s a supermarket got to do with an XML schema, you ask wonderingly?
Quite a lot, actually. You see, all supermarkets consist of aisles, with products
placed neatly in each aisle for customers and employees alike. In an XML document,
this design would be represented as follows:
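
A minimal sketch of such a document might look like this. The element names follow the description in the text; the sample items and figures, the text content of each <item>, and the choice to store “quantity” and “price” as attributes are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Illustrative data only: two aisles, each holding a few items -->
<supermarket>
  <aisle name="Fruits" number="1">
    <item quantity="50" price="1.25">Apples</item>
    <item quantity="30" price="0.80">Bananas</item>
  </aisle>
  <aisle name="Vegetables" number="2">
    <item quantity="40" price="0.60">Carrots</item>
    <item quantity="25" price="1.10">Tomatoes</item>
  </aisle>
</supermarket>
```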

As you can see, I have a list of <aisle> elements, which in turn enclose multiple
<item> elements. Each <aisle> is associated with a “name” that represents
the category of items in the aisle, and a “number”, which is used for easy reference.
Each <item> is associated with a “quantity” and a “price”.

Writing an XML schema to validate the XML document instance above is child’s
play, especially considering the amount of practice I’ve had over the last couple
of weeks.

Now, let’s suppose that, one fine day, the store manager decides to add a few
items to aisle 1. In the XML universe, he has two options available to him: he
could add them to the existing <item> list for the appropriate aisle, or he could
add another <aisle> element to the bottom of the document instance, reference
it with the same aisle number, and attach the new items there.

With option two, the XML document instance doesn’t look as clean as it did initially.
Proceeding along this path, it would soon have a number of different entries for
the same <aisle> at different locations in the document tree. Obviously, this
is a maintenance nightmare.

You can prevent this from happening via the very cool <xsd:unique> element
– as in the revised schema below:
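
Here is a sketch of what such a schema could look like. It assumes a <supermarket> root containing <aisle> elements with “name” and “number” attributes, and <item> elements with text content plus “quantity” and “price” attributes; the type names “aisleType” and “itemType” are hypothetical labels of my own choosing.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="supermarket">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="aisle" type="aisleType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>

    <!-- Identity constraints sit inside the element declaration, after its type -->
    <xsd:unique name="NoRepeatAisle">
      <!-- Scope: every <aisle> that is a direct child of <supermarket> -->
      <xsd:selector xpath="aisle"/>
      <!-- Value that may not repeat within that scope -->
      <xsd:field xpath="@number"/>
    </xsd:unique>
  </xsd:element>

  <!-- Hypothetical type name: an aisle holds one or more items -->
  <xsd:complexType name="aisleType">
    <xsd:sequence>
      <xsd:element name="item" type="itemType" maxOccurs="unbounded"/>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" use="required"/>
    <xsd:attribute name="number" type="xsd:positiveInteger" use="required"/>
  </xsd:complexType>

  <!-- Hypothetical type name: text content plus two attributes -->
  <xsd:complexType name="itemType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="quantity" type="xsd:nonNegativeInteger" use="required"/>
        <xsd:attribute name="price" type="xsd:decimal" use="required"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>

</xsd:schema>
```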

The <xsd:unique> element is what gets the ball rolling – it is used to impose
uniqueness constraints on an XML document instance. You can assign it a name –
I’ve called mine “NoRepeatAisle” – to make its function clearer.

The <xsd:unique> element encloses <xsd:selector> and <xsd:field> elements,
which help to identify the unique components of the document. The “xpath” attribute
of the <xsd:selector> element contains an XPath expression that helps to limit
the scope within which the uniqueness constraint will be applied. In my case,
this is restricted to all the <aisle> elements that are the children of the
<supermarket> element only; any <aisle> elements that appear elsewhere in
the hierarchy are not covered by this constraint.

The second element component of the uniqueness constraint is the <xsd:field>
element, which specifies which attribute values should be unique – in the example
above, this is the value of the “number” attribute.

Of Fruits And Vegetables

The schema design in the previous section ensures that a truant store
manager cannot damage the integrity of my XML document instance by throwing up
new aisles wherever (s)he likes. Now, let’s take it one step further and add another
integrity check, this one to ensure that the <item>s in each aisle actually
exist in the store’s inventory system.

Here’s the updated document instance – note that, this time around, I’ve added
an extra <items> block that serves as the inventory, matching item codes with
a human-readable description of each item.
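
A document along these lines might look like the sketch below. The placement of the <items> block before the aisles, the sample codes, and the use of attributes for “quantity” and “price” are assumptions for illustration; only the “code” attribute and the <items> inventory block come from the text.

```xml
<?xml version="1.0"?>
<supermarket>
  <!-- Inventory master list: item code plus human-readable description -->
  <items>
    <item code="F101">Apples</item>
    <item code="F102">Bananas</item>
    <item code="V201">Carrots</item>
  </items>

  <!-- Aisles refer to inventory entries by code -->
  <aisle name="Fruits" number="1">
    <item code="F101" quantity="50" price="1.25"/>
    <item code="F102" quantity="30" price="0.80"/>
  </aisle>
  <aisle name="Vegetables" number="2">
    <item code="V201" quantity="40" price="0.60"/>
  </aisle>
</supermarket>
```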

In order to enforce the “only-add-those-items-to-aisles-that-exist-in-inventory”
rule, I need to import two new concepts into my schema, concepts that may already
be familiar to you from your work on relational databases: keys and relationships.

In the database world, a primary key uniquely identifies every record in a table
– it might be a single field, or a combination of fields, but it serves as a unique
fingerprint to identify any record in a table. It also serves as an important
component of the relational database model – relations between different tables
are created on the basis of primary keys.

Based on this knowledge, it’s pretty obvious what the primary key is in the scenario
above – it’s the “code” attribute of each <item>. Or, to put it in schema lingo,
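
A sketch of the key definition follows. It is a fragment, meant to sit inside the declaration of the <supermarket> element (after its type definition), and it assumes the inventory <item> elements live under an <items> child of <supermarket> and that the “xsd” prefix is bound to the XML Schema namespace.

```xml
<!-- Fragment: belongs inside <xsd:element name="supermarket">, after its complexType -->
<xsd:key name="itemKey">
  <!-- Every <item> in the inventory master list... -->
  <xsd:selector xpath="items/item"/>
  <!-- ...must carry a "code" attribute, and no two codes may match -->
  <xsd:field xpath="@code"/>
</xsd:key>
```

Unlike <xsd:unique>, a key also requires the field to be present on every selected element, which is what makes it usable as a reference target.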

Key definition takes place via the <xsd:key> element, which identifies the
key by a unique name. The <xsd:selector> and <xsd:field> elements are then
used, in conjunction with regular XPath expressions, to drill down to the element/attribute
combination representing the primary key.

Once the key is defined, the next step is to define a relationship around which
it pivots. In the scenario above, it is fairly clear that the key reference has
to be maintained between the <item> under the <items> element (the inventory
master list) and the <item> under the <aisle> element (the inventory itself).

With this in mind, let’s add a condition to the schema definition with the <xsd:keyref>
element.
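
The complete schema could then be sketched as follows. The document layout (an <items> inventory block followed by the aisles, with “quantity” and “price” as attributes) and the type names “inventoryType” and “aisleType” are assumptions; the constraint names are the ones used in this article.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="supermarket">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="items" type="inventoryType"/>
        <xsd:element name="aisle" type="aisleType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>

    <!-- No two aisles may share a number -->
    <xsd:unique name="NoRepeatAisle">
      <xsd:selector xpath="aisle"/>
      <xsd:field xpath="@number"/>
    </xsd:unique>

    <!-- Primary key: the code of each inventory item -->
    <xsd:key name="itemKey">
      <xsd:selector xpath="items/item"/>
      <xsd:field xpath="@code"/>
    </xsd:key>

    <!-- Foreign key: every aisle item must use a code listed in the inventory -->
    <xsd:keyref name="NoIllegalEntries" refer="itemKey">
      <xsd:selector xpath="aisle/item"/>
      <xsd:field xpath="@code"/>
    </xsd:keyref>
  </xsd:element>

  <!-- Hypothetical type name: the inventory master list -->
  <xsd:complexType name="inventoryType">
    <xsd:sequence>
      <xsd:element name="item" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:simpleContent>
            <xsd:extension base="xsd:string">
              <xsd:attribute name="code" type="xsd:string" use="required"/>
            </xsd:extension>
          </xsd:simpleContent>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Hypothetical type name: an aisle and the items shelved in it -->
  <xsd:complexType name="aisleType">
    <xsd:sequence>
      <xsd:element name="item" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="code" type="xsd:string" use="required"/>
          <xsd:attribute name="quantity" type="xsd:nonNegativeInteger" use="required"/>
          <xsd:attribute name="price" type="xsd:decimal" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" use="required"/>
    <xsd:attribute name="number" type="xsd:positiveInteger" use="required"/>
  </xsd:complexType>

</xsd:schema>
```

If you have libxml2 installed, one way to try this out is `xmllint --noout --schema supermarket.xsd supermarket.xml` (both file names are placeholders).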

The <xsd:keyref> element is used to indicate a reference to a key defined
elsewhere in the schema. I have given the reference an appropriate name – “NoIllegalEntries”
– which most validators include in their error output in case a violation
of the reference takes place. The “refer” attribute of the <xsd:keyref> element
links this reference to the primary key defined previously, via the unique key
name “itemKey”.

At this point, an integrity check has been added to the schema to ensure that
only valid <item>s from the inventory appear in the <aisle>s. You can verify
this by adding an item to the aisles with a product code not listed in the inventory
– your XML validator should barf and throw up lots of ugly errors.

Taking On The Fleet

Let’s look at another example to better understand
how keys and references work. This time, I’ll leave the all-too-human world of
supermarkets and travel back to that galaxy far, far away, to see exactly what’s
sitting in the cargo hold of two of the better-known starships in the Star Wars
fleet.
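
A sketch of such a document is shown below. The <fleet> root element, the second ship, and the droid entries are placeholders of my own; the text itself only establishes <ship> elements with a “name” and <droid> elements nested inside them.

```xml
<?xml version="1.0"?>
<!-- Illustrative data only -->
<fleet>
  <ship name="Millennium Falcon">
    <droid name="astromech"/>
    <droid name="protocol"/>
  </ship>
  <ship name="Tantive IV">
    <droid name="astromech"/>
  </ship>
</fleet>
```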

Now, let’s suppose I wanted to add a couple more droids to the Falcon. Sure,
I could add another <ship> element with the same name…or I could do the smart
thing, and add another <droid> element to the existing definition. As discussed
in the previous example, the latter option is much cleaner, and also fairly easy
to implement via the <xsd:unique> element. Here’s the relevant snippet of the
updated schema definition:
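
A snippet along these lines would do the job. It assumes a <fleet> root element, a “name” attribute on each <ship>, and a type called “shipType” defined elsewhere in the schema; both “shipType” and the constraint name “NoRepeatShip” are hypothetical.

```xml
<!-- Fragment: assumes the "xsd" prefix is bound to the XML Schema namespace -->
<xsd:element name="fleet">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="ship" type="shipType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- No two <ship> children of <fleet> may carry the same name -->
  <xsd:unique name="NoRepeatShip">
    <xsd:selector xpath="ship"/>
    <xsd:field xpath="@name"/>
  </xsd:unique>
</xsd:element>
```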

In order to verify this, you can try creating two <ship> elements with the
same “name”, and seeing your XML validator throw up all over the screen. It’s
always fun to watch, and it doesn’t hurt anything!

Breaking The Mold

Next, how about introducing a referential integrity constraint similar
to the one in the example above? Let’s say we have a master list of available
droid types, and only those droid types may be requisitioned for the various ships
in the fleet. Here’s my new XML document:
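
One way such a document could be laid out is sketched here. The <fleet> and <droids> element names, the “name” attribute on each <droid>, and all sample values are assumptions; the three section names come from the description in the text.

```xml
<?xml version="1.0"?>
<fleet>
  <!-- Master list of droid types that may be requisitioned -->
  <droids>
    <droid name="astromech"/>
    <droid name="protocol"/>
    <droid name="medical"/>
  </droids>

  <!-- Each ship groups its droids by function -->
  <ship name="Millennium Falcon">
    <administration>
      <droid name="protocol"/>
    </administration>
    <communication>
      <droid name="protocol"/>
    </communication>
    <repairs>
      <droid name="astromech"/>
    </repairs>
  </ship>
  <ship name="Tantive IV">
    <administration>
      <droid name="protocol"/>
    </administration>
    <communication>
      <droid name="astromech"/>
    </communication>
    <repairs>
      <droid name="astromech"/>
      <droid name="medical"/>
    </repairs>
  </ship>
</fleet>
```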

There are two significant changes in this version of the XML document. First,
I have introduced a listing of droids which will act as a master list for all
the droids on the ships of the fleet. Second, to make things easier, I have further classified
the droids on each ship into sections like administration, communication and repairs,
based on their advertised functionality.

Now, all I need to do is update the schema to reflect this referential integrity
constraint, in a manner similar to that in the previous example:
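
A sketch of such a schema follows. It assumes a <fleet> root holding a <droids> master list and a series of <ship> elements, each with <administration>, <communication> and <repairs> sections whose <droid> children carry a “name” attribute. Apart from “droidNameKey”, every constraint and type name here is hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="fleet">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="droids" type="droidListType"/>
        <xsd:element name="ship" type="shipType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>

    <!-- Ship names may not repeat -->
    <xsd:unique name="NoRepeatShip">
      <xsd:selector xpath="ship"/>
      <xsd:field xpath="@name"/>
    </xsd:unique>

    <!-- Primary key: droid names in the master list -->
    <xsd:key name="droidNameKey">
      <xsd:selector xpath="droids/droid"/>
      <xsd:field xpath="@name"/>
    </xsd:key>

    <!-- One key reference per functional group -->
    <xsd:keyref name="adminDroidRef" refer="droidNameKey">
      <xsd:selector xpath="ship/administration/droid"/>
      <xsd:field xpath="@name"/>
    </xsd:keyref>
    <xsd:keyref name="commDroidRef" refer="droidNameKey">
      <xsd:selector xpath="ship/communication/droid"/>
      <xsd:field xpath="@name"/>
    </xsd:keyref>
    <xsd:keyref name="repairDroidRef" refer="droidNameKey">
      <xsd:selector xpath="ship/repairs/droid"/>
      <xsd:field xpath="@name"/>
    </xsd:keyref>
  </xsd:element>

  <xsd:complexType name="droidType">
    <xsd:attribute name="name" type="xsd:string" use="required"/>
  </xsd:complexType>

  <!-- Used for the master list and for each section: one or more droids -->
  <xsd:complexType name="droidListType">
    <xsd:sequence>
      <xsd:element name="droid" type="droidType" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="shipType">
    <xsd:sequence>
      <xsd:element name="administration" type="droidListType"/>
      <xsd:element name="communication" type="droidListType"/>
      <xsd:element name="repairs" type="droidListType"/>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" use="required"/>
  </xsd:complexType>

</xsd:schema>
```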

As I have three functional groups for droids on each ship, the key reference
needs to be repeated three times, once for each group (you could probably do this in
a more efficient manner if you have a large number of groups – I leave that to
you as an exercise, preferring this slightly clunkier option for illustrative
purposes). The “refer” attribute of each <xsd:keyref> element links this constraint
to the droid master list via the unique label “droidNameKey”.

And that’s it! You now have two constraints in a single schema definition, one
ensuring that ship names are unique, and the other ensuring that only valid droids
appear in each ship. Try it out and see for yourself!

Two For One

Now, how about letting me twist your mind a little further? In the example
in the previous section, I stated that there could only be one entry for each ship
in the fleet. Let’s now modify that statement a little and permit repetitions,
so long as the combination of ship name and droid code is unique.
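
A document of this kind might look like the sketch below. The layout (one <droid> per <ship> entry, identified by a “code” attribute) and the sample codes are assumptions for illustration.

```xml
<?xml version="1.0"?>
<fleet>
  <!-- The same ship name appears twice, but each time with a different droid -->
  <ship name="Millennium Falcon">
    <droid code="D-01"/>
  </ship>
  <ship name="Millennium Falcon">
    <droid code="D-02"/>
  </ship>
  <!-- The same droid code on a different ship is also fine -->
  <ship name="Tantive IV">
    <droid code="D-01"/>
  </ship>
</fleet>
```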

As you can see above, my document includes more than one entry for a particular
ship name. However, the combination of ship and droid is unique. In order to ensure
that this rule is followed consistently, I need to update my schema definition
to use what is known as a “unique composed value”. A unique composed value defines
uniqueness over a combination of two (or more) fields, rather than a single one.
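
A sketch of a schema using such a composed value is shown here. It assumes each <ship> entry under a <fleet> root has a “name” attribute and exactly one <droid> child with a “code” attribute; the constraint name “NoRepeatShipDroid” is hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="fleet">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ship" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- Exactly one droid per ship entry -->
              <xsd:element name="droid">
                <xsd:complexType>
                  <xsd:attribute name="code" type="xsd:string" use="required"/>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
            <xsd:attribute name="name" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>

    <!-- Two fields: only the pair (ship name, droid code) has to be unique -->
    <xsd:unique name="NoRepeatShipDroid">
      <xsd:selector xpath="ship"/>
      <xsd:field xpath="@name"/>
      <xsd:field xpath="droid/@code"/>
    </xsd:unique>
  </xsd:element>

</xsd:schema>
```

Each <xsd:field> may match at most one node per selected element, which is why this sketch limits every <ship> entry to a single <droid>.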

It’s pretty simple – all I’ve done is add an additional field to the <xsd:unique>
element, in order to ensure that only the combined set of ship name and droid
type is unique. You don’t have to stop there either – a unique composed value
may be made up of as many components as you like.

How do I know that this actually works? Create a duplicate entry in the XML document
above, validate the file, and see what happens. Your XML validator should start
complaining bitterly about uniqueness constraint violations.

And that’s about it for the moment. In this article, I introduced you to the
concept of uniqueness in the XML Schema world, showing you how to use built-in
schema constructs to enforce uniqueness within your XML document instances. I
also showed you how you could replicate RDBMS referential integrity constraints
within the context of an XML document, using schema equivalents of primary and
foreign keys, and creating relationships between the different nodes of an XML
document. All these techniques come in handy to reduce errors when you’re building
schema definitions which contain internal inter-relationships that need to be
rigidly enforced across document instances.

In the next (and final) article in this series, I’ll be wrapping up this discussion
of advanced schema theory with an overview of XML namespaces, and how they fit
into the XML Schema picture. Make sure you come back for that one!

Note: All examples in this article have been tested on Linux/i586. Examples are
illustrative only, and are not meant for a production environment. Melonfire provides
no warranties or support for the source code described in this article. YMMV!