XML Data can be Stored as Generated XML or Composed XML

Generated XML, where the XML data is stored across object-relational tables or as views in the database. This data can then be generated back into XML format, dynamically, when necessary

Composed (Authored/Native) XML, where the XML document is stored as is in CLOBs

Generated XML

XML can be generated from object-relational tables and views. The benefits of using object-relational tables and views as opposed to pure relational structures are discussed below.

Generated XML is used when the XML is an interchange format and existing business data is wrapped in XML structures (tags). This is the most common way of using XML in the database. Here, XML is used only for the interchange process itself and is transient.

Generated XML Examples

Examples of this kind of document include sales orders and invoices, airline flight schedules, and so on.

Oracle, with its object-relational extensions has the ability to capture the structure of the data in the database using object types, object references, and collections. There are two options for storing and preserving the structure of the XML data in an object-relational form:

Store the attributes of the elements in a relational table and define object views to capture the structure of the XML elements

Store the structured XML elements in an object table

Once stored generated, in the object-relational form, the data can be easily updated, queried, rearranged, and reformatted as needed using SQL.

Object-Relational Storage for Generated XML Documents

Complex XML documents can be stored as object-relational instances and indexed efficiently. Such instances fully capture and express the nesting and list semantics of XML. With Oracle's extensibility infrastructure, new types of indices, such as path indices, can be created for faster searching through XML documents.

XML SQL Utility (XSU) provides the means to store an XML document by mapping it to the underlying object-relational storage, and conversely, provides the ability retrieve the object-relational data as an XML document.

XSU converts the result of an SQL query into XML by mapping the query alias or column names into the element tag names and preserving the nesting of object types. The result can be in text or a DOM (Document Object Model) tree. The generation of the latter avoids the overhead of parsing the text and directly realizes the DOM tree.

Composed (Authored/Native) XML

Oracle8i and higher support the storage of large objects or LOBs as character LOBs (CLOB), binary LOBs (BLOB), or externally stored binary files (BFILE). LOBs are used to store composed (Authored/Native) XML documents.

Storing Composed XML Data in CLOBs or BFILEs

If the incoming XML documents do not conform to one particular structure, then it might be better to store such documents in CLOBs. For instance, in an XML messaging environment, each XML message in a queue might be of a different structure.

CLOBs store large character data and are useful for storing composed XML documents.

BFILEs are external file references and can also be used, although they are more useful for multimedia data that is not accessed often. In this case the XML is stored and managed outside Oracle, but can be used in queries on the server. The metadata for the document can be stored in object-relational tables in the server for fast indexing and access.

Storing an intact XML document in a CLOB or BLOB is a good strategy if the XML document contains static content that will only be updated by replacing the entire document.

Composed XML examples include written text such as articles, advertisements, books, legal contracts, and so on. Documents of this nature are known as document-centric and are delivered from the database as a whole. Storing this kind of document intact within Oracle gives you the advantages of an industry-proven database and its reliability over file system storage.

Storage Outside the database. If you choose to store an XML document outside the database, you can still use Oracle features to index, query, and efficiently retrieve the document through the use of BFILES, URLs, and text-based indexing.

Oracle allows the creation of Oracle Text (interMedia Text) indexes on LOB columns, in addition to URLs that point to external documents. This indexing mechanism works for XML data as well.

Oracle8i and Oracle9i recognize XML tags, and section and sub-section text searching within XML elements' content. The result is that queries can be posed on unstructured data and restricted to certain sections or elements within a document.

Oracle Text Example: Searching Text and XML Data Using CONTAINS

This Oracle Text (interMedia Text) example presume you have already created the appropriate index.

SELECT *
FROM purchaseXMLTab
WHERE CONTAINS(po_xml,"street WITHIN addr") >= 1;

CLOB storage is ideal if the structure of the XML document is unknown or dynamic.

Disadvantages of Using Composed XML Storage

Much of the SQL functionality on object-relational columns cannot be exploited. Concurrency of certain operations such as updates may be reduced. However, the exact copy of the document is retained.

Using a Hybrid XML Storage Approach for Better Mapping Granularity

The previous section described the following:

How structured XML documents (Generated) can be mapped to object-relational instances

How composed XML documents (Authored) can be stored in LOBs

However, in many cases, you need better control of the mapping granularity.

For example, when mapping a text document, such as a book, in XML, you may not want every single element to be expanded and stored as object-relational. Storing the font and paragraph information for such documents in an object-relational format may not be useful with respect to querying.

On the other hand, storing the whole text document in a CLOB reduces the effective SQL queriability on the entire document.

A Hybrid Approach Allows for User-Defined Storage Granularity

The alternative is to have user-defined granularity for such storage. In the book example, you may want the following:

To query on top-level elements such as chapter, section, title, and so on. These elements can be stored in object relational tables.

To query the book's contents in each section. These sections can be stored in a CLOB.

You can specify the granularity of mapping at table definition time. The server can automatically construct the XML from the various sources and generate queries appropriately.

Figure 2-1 Hybrid XML Storage Approach: Querying Top Level Elements in Tables While Contents are in a CLOB

Hybrid Storage Advantages

The advantages of the hybrid storage approach for storing XML documents are the following:

It gives the flexibility of storing useful and queryable information in object-relational format while not decomposing the entire document.

Saves time in reconstructing the document, since the entire document is not broken down.

Enables text searching on those parts of the document stored in LOBs

Transforming Generated XML

XML generated from the database is in a canonical format that maps columns to elements and object types to nested elements. However, applications might require different representations of the XML document in different circumstances.

When the XML Document Structure Needs Transforming

If an XML document is structured, but the structure of the XML document is not compatible with the structure of the underlying database schema, you must transform the data into the correct format before writing it to the database. You can achieve this in one of the following ways:

Use XSL stylesheets or other programming approaches

Store the data-centric XML document as an intact single object

Define object views corresponding to the various XML document structure and define instead-of triggers to perform the appropriate transformation and update the base data.

Combining XML Documents and Data Using Views

Finally, if you have a combination of structured and unstructured XML data, but still want to view and operate on it as a whole, you can use Oracle views.

Views enable you to construct an object on the fly by combining XML data stored in a variety of ways. You can do the following:

Store structured data, such as employee data, customer data, and so on, in one location within object-relational tables.

Store related unstructured data, such as descriptions and comments, within a CLOB.

When you need to retrieve the data as a whole, simply construct the structure from the various pieces of data with the use of type constructors in the view's select statement. XML SQL Utility then enables retrieving the constructed data from the view as a single XML document.

Using XSLT to Transform Query Results

This involves querying on the original document and transforming the result into a form required by the user or application. For instance, if an application is talking to a cellular phone using WML, it might need to transform the XML generated into WML or other similar standard suitable for communicating with the cellular phone.

This can be accomplished by applying XSLT transformations on the result XML document. The XSLT transformations can be pushed into the generation phase itself as an optimization. A scalable, high performance XSLT transformation engine within the database server would be able to handle large amounts of data.

Indexing and Querying Transformations

You may need to create indexes and query on transformed views of an XML document. For example, in an XML messaging environment, there could be purchase order messages in different formats. You may want to query them canonically, so that a particular query can work across all purchase order messages.

In this case, the query is posed against the transformed view of the documents. You can create functional indexes or use regular views to achieve this.

Indexing Approaches

Native implementation for the extract() and existsNode()member functions is to parse the XML document, perform path traversal, and extract the fragment. However, this is not a performance-enhancing or scalable solution.

XML Schemas and Mapping of Documents

W3C has chartered a schema working group to provide a new, XML based notation for structural schema and datatypes as an evolution of the current Document Type Definition (DTD) based mechanism. XML schemas can be used for the following:

Datatypes themselves can either be primitive (such as bytes, dates, integers, sequences, intervals) or user-defined (including ones that are derived from existing datatypes and which may constrain certain properties -- range, precision, length, mask -- of the basetype.) Application-specific constraints and descriptions are allowed.

XML Schema provides inheritance for element, attribute, and datatype definitions. Mechanisms are provided for URI references to facilitate a standard, unambiguous semantic understanding of constructs. The schema language also provides for embedded documentation or comments.

For example, you can define a simple data type as shown in the following example.

XMLSchema Example 1: Defining a Simple Data Type

It is clear even from the simple example above that XMLSchema provides a number of important new constructs over DTDs, such as a basetype, and a minimum value constraint.

When dynamic data is generated from a database, it is typically expressed in terms of a database type system. In Oracle, this is the object-relational type system described above, which provides for much richness in data types, such as NULL-ness, variable precision, NUMBER(7,2), check constraints, user-defined types, inheritance, references between types, collections of types and so on. XML Schema can capture a wide spectrum of schema constraints that go towards better matching generated documents to the underlying type-system of the data.

These XML Schemas have been deliberately constructed to match closely the Object-Relational purchase order example described above in"XML Example 2: XML Document Produced from Generic Mapping". The point is to underscore the closeness of match between the proposed constructs of XML Schema with SQL:1999-based type systems. Given such a close match, it is relatively easy to map an XML Schema to a database Object-Relational schema, and map documents that arevalid according to the above schema to row objects in the database schema. In fact, the greater expressiveness of XML Schema over DTDs greatly facilitates the mapping.

The applicability of the schema constraints provided by XML Schema is not limited to data-driven applications. There are more and more document-driven applications that exhibit dynamic behavior.

A simple example might be a memo, which is routed differently based on markup tags.

A more sophisticated example is a technical service manual for an intercontinental aircraft. Based on complex constraints provided by XML Schema, one can ensure that the author of such a manual always enters a valid part-number, and one might even ensure that part-number validity depends on dynamic considerations such as inventory levels, fluctuating demand and supply metrics, or changing regulatory mandates.

General XML: Design Issues for Data Exchange Applications

This section describes the following XML design issues for applications that exchange data.

Generating a Web Form from XML Data Stored in the Database

Use XML SQL Utility to generate a DTD based on the schema of the underlying table being queried.

Use the generated DTD as input to the XML Java Class Generator, which will generate a set of classes based on the DTD elements.

Write Java code that use these classes to generate the infrastructure behind a Web-based form. Based on this infrastructure, the Web form can capture user data and create an XML document compatible with the database schema.This data can then be written directly to the corresponding database table or object view without further processing.

Sending XML Data from a Web Form to the Database

One way to ensure that data obtained via a Web form will map to an underlying database schema is to design the Web form and its underlying structure so that it generates XML data based on a schema-compatible DTD. This section describes how to use the XML SQL Utility and the XML Parser for Java to achieve this. This scenario has the following flow:

A Java application uses the XML SQL Utility to generate a DTD that matches the expected format of the target object view or table.

The application feeds this DTD into the XML Class Generator for Java, which builds classes that can be used to set up the Web form presented to the user.

Using the generated classes, the web form is built dynamically by a JavaServer Page, Java servlet, or other component.

When a user fills out the form and submits it, the servlet maps the data to the proper XML data structure and the XML SQL Utility writes the data to the database.

You can use the DTD-generation capability of the XML SQL Utility to determine what XML format is expected by a target object view or table. To do this, you can perform a SELECT * FROM an object view or table to generate an XML result.

This result contains the DTD information as a separate file or embedded within the DOCTYPE tag at the top of the XML file.

Use this DTD as input to the XML Class Generator to generate a set of classes based on the DTD elements. You can then write Java code that use these classes to generate the infrastructure behind a Web-based form. The result is that data submitted via the Web form will be converted to an XML document that can be written to the database.

Sending XML Documents Applications-to-Application

There are numerous ways to transmit XML documents among applications. This section presents some of the more common approaches.

Here you can assume the following:

The sending application transmits the XML document

The receiving application receives the XML document

File Transfer. The receiving application requests the XML document from the sending application via FTP, NFS, SMB, or other file transfer protocol. The document is copied to the receiving application's file system. The application reads the file and processes it.

HTTP. The receiving application makes an HTTP request to a servlet. The servlet returns the XML document to the receiving application, which reads and processes it.

Web Form. The sending application renders a Web form. A user fills out the form and submits the information via a Java applet or Javascript running in the browser. The applet or Javascript transmits the user's form in XML format to the receiving application, which reads and processes it. If the receiving application will ultimately write data to the database, the sending application should create the XML in a database compatible format. One way to do this using Oracle XML products is described in the section Sending XML Data from a Web Form to a Database.

Advanced Queuing. An Oracle database sends an XML document via Net Services, HTTP or SMTP, and JDBC to the one or more receiving applications as a message through Oracle Advanced Queueing (AQ). The receiving applications dequeue the XML message and process it.

Loading XML into a Database

You can use the following options to load XML data or DTD files into Oracle9i:

Use PL/SQL stored procedures for LOB, such as DBMS_LOB

Write Java (Pro*C, C++) custom code

Use SQL*Loader

Use Oracle interMedia

XML SQL Utility (XSU)

You can also use Oracle9i Internet File System (9iFS) to put an XML document into the database. However, it does not support DTDs. It does however support XML Schema, the standard that will replace DTDs.

Figure 2-2 The LOBFILE Syntax

LOB data can be lengthy enough that it makes sense to load it from a LOBFILE. In LOBFILEs, LOB data instances are still considered to be in fields (predetermined size, delimited, length-value), but these fields are not organized into records (the concept of a record does not exist within LOBFILEs). Therefore, the processing overhead of dealing with records is avoided. This type of organization of data is ideal for LOB loading.

There is no requirement that a LOB from a LOBFILE fit in memory. SQL*Loader reads LOBFILEs in 64K chunks. To load physical records larger than 64K, you can use the READSIZE parameter to specify a larger size.

It is best to load XMLType columns or columns containing XML data in CLOBs, using LOBFILEs.

When the XML is valid. If the XML data in the LOBFILE is large and you know that the data is valid XML, then use direct-path load since it bypasses all the XML validation processing.

When the XML needs validating. If it is imperative that the validity of the XML data be checked, then use conventional path load, keeping in mind that it is not as efficient as a direct-path load.

A conventional path load executes SQL INSERT statements to populate tables in an Oracle database. A direct path load eliminates much of the Oracle database overhead by formatting Oracle data blocks and writing the data blocks directly to the database files.

A direct-path load does not compete with other users for database resources, so it can usually load data at near disk speed. Considerations inherent to direct path loads, such as restrictions, security, and backup implications, are discussed in Chapter 9 of Oracle9i Utilities.

Business applications scenarios for each of these two areas are described later in this chapter.

Content and Document Management with Oracle XML-Enabled Technology

Customizing Presentation of Data

XML is increasingly used to enable customized presentation of data for different browsers, devices, and users. By using XML documents along with XSL stylesheets on either the client, middle-tier, or server, you can transform, organize, and present XML data tailored to individual users for a variety of client devices, including the following:

Graphical and non-graphical Web browsers

Personal digital assistants (PDAs), such as the Palm Pilot

Digital cell phones and pagers

In doing so, you can focus your business applications on business operations, knowing you can accommodate differing output devices easily.

Using XML and XSL also makes it easier to create and manage dynamic Web sites. You can change the look and feel simply by changing the XSL stylesheet, without having to modify the underlying business logic or database code. As you target new users and devices, you can simply design new XSL stylesheets as needed. This is illustrated in Figure 2-4.

Problem

Company X has numerous document repositories of SGML and XML marked up text fragments. Composite documents must be published dynamically.

Solution

The bottom line is that the database application design must begin with a good database design. In other words, Company X must first use good data modeling and design guidelines. Then object views can more readily be created against the data.

Use XMLType to store the documents in XML format, where the relational data is updatable. Use Oracle9i's Internet File System (9iFS) as the data repository interface. 9iFS helps implement XML data repository management and administration tasks.

Company X can use XSL stylesheets to assemble the document sections or fragments and deliver the composite documents electronically to users. One suggested solution is to use Arbortext and EPIC for single sourcing and authoring or multichannel publishing. Multichannel publishing facilitates producing the same document in many different formats, such as HTML, PDF, WORD, ASCII text, SGML, and Framemaker.

Problem

A large news distributor receives data from various news sources. This data must be stored in a database and sent to all the distributors and users on demand so that they can view specific and customized news at any time, according to their contract with the news distributor. The distributor uses XSL to normalize and store the data in a database. The stored data is used to back several Websites and portals. These Websites and portals receive HTTP requests from various wired and unwired clients.

Problem

Company X needs data interactively delivered to a thin client.

Solution

Queries are sent from the client to databases whose output is rendered dynamically through one or more XSL stylesheets, for sending to the client application. The data is stored in a relational database in LOBs and materialized in XML.

Main Tasks Involved

Oracle XML Components Used

XML Parser for Java and XSLT Processor

Business-to-Business and Business-to-Consumer Messaging

A challenge for business application developers is to tie together data generated by applications from different vendors and different application domains. Oracle XML-enabled technology makes this kind of data exchange among applications easier to do by focusing on the data and its context without tying it to specific network or communication protocols.

Using XML and XSL transformations, applications can exchange data without having to manage and interpret proprietary or incompatible data formats.

Consider the following business-to-business and business-to-consumer (B2B/B2C) messaging scenarios that use Oracle XML components:

Problem

Company X needs to build an online shopping cart, for products coming from various vendors. Company X wants to receive orders online and then based upon which product is ordered, transfer the order to the correct vendor.

Solution

Use XML to deliver an integrated online purchasing application. While a user is completing a new purchase requisition for new hardware, they can go directly to the computer manufacturer's Web site to browse the latest models, configuration options, and prices. The user's site sends a purchase requisition reference number and authentication information to the vendor's Web site.

At the vendor site, the user adds items to their shopping cart, then clicks on a button to indicate that they are done shopping. The vendor sends back the contents of the shopping cart to the Company X's application as an XML file containing the part numbers, quantities, and prices that the user has chosen.

Items from the shopping cart are automatically added to the new purchase requisition as line items.

Customer orders (in XML) are delivered to the appropriate vendor databases for processing. XSL is used to transform and divide the shopping cart for compliant transfers. Data is stored in a relational database and materialized using XML. See Figure 2-7.

Problem

A client/server and server/server application stores a data resource and inventory in a database repository. This repository is shared across enterprises. Company X needs to know every time the data resource is accessed, and all the users and customers on the system need to know when and where data is accessed.

Solution

When a resource is accessed or released this triggers an availability XML message. This in turn transforms the resource, using XSL, into multiple client formats according to need. Conversely, a resource acquisition by one client sends an XML message to other clients, signalling its removal. Messages are stored in LOBs. Data is stored in a relational database and materialized in XML. See Figure 2-8.

Problem

Company X needs several applications to communicate and share data to integrate the business work flow and processes.

Solution

XML is used as the message payload. It is transformed via the XSLT Processor, enveloped and routed accordingly. The XML messages are stored in an AQ Broker Database in LOBs. Oracle Workflow is used to facilitate management of message and data routing and transformation. This solution also utilizes content management, here presentation customization using XSL stylesheets. See Figure 2-9.

Main Tasks Involved

The user or application places a request. The resulting data is pulled from the corporate database using XSU.

Data is transformed by XSLT Processor and sent to the AQ Broker.

AQ Broker reads this message and determines accordingly what action is needed. It issues the appropriate response to Application 1, 2, and 3, for further processing.