The Extensible Markup Language, or XML, is a technique of
using a document, such as a text file, to describe information and make that
information available to whatever and whoever can take advantage of it. The
description is done so the document can be created by one person or company and
used by another person or another company without having to know who first
created the document. This is because the document thus created
is not a program, it is not an application: it is just a text-based document.

Because XML is very flexible, it can be used in regular
computer applications, in databases, in web-based systems (Internet), in
communication applications, in computer networks, in scientific applications,
etc. XML is standardized by the W3C (http://www.w3c.org) organization. XML is released through
an XML Recommendation document with a version.

Creating an XML File

To create an XML file, you can use a text
editor (such as Notepad) and type units of
code using normal
characters. The XML document is made of units called
entities. These entities are spread over the lines of the document as you
judge necessary and as we will learn. XML has strict rules as to how the
contents of the document must be structured.

An XML file is first of all a normal text-based
document that has a .xml extension. Therefore, however you create
it, you must give it that extension. When saving
the file in Notepad, you can type the name of the file, including the .xml
extension, in double-quotes so that the editor does not append .txt to it.

You can also first set the Save As Type combo box to
All Files and then enter the name of the file with the .xml extension.

Many other applications allow creating an XML file or
generating one from an existing file. There are also commercial editors
you can get or purchase to create an XML file.

After an XML document has been created and is available, in
order to use it, you need a program that can read, analyze, and interpret it.
This program is called a parser. A parser commonly used in Microsoft
Windows applications is MSXML, published by Microsoft.
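
To make the role of a parser concrete, here is a minimal sketch written in Python. Python is not used elsewhere in this lesson; it is chosen here only because its standard library includes an XML parser (xml.etree.ElementTree). The students document and the names in it are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical XML document held in a string
document = "<students><student>Paul</student><student>Hermine</student></students>"

# The parser reads the text, analyzes it, and returns the root element
root = ET.fromstring(document)

# Interpret the result: collect the text of each <student> element
names = [student.text for student in root.findall("student")]
print(root.tag, names)
```

Any other parser, such as MSXML, plays the same role: it turns the text of the document into objects that a program can examine.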

Of course, after creating and saving an XML file, you can
change (edit) it as you judge necessary.

Opening an XML File

Whether you created an XML file or someone else did, you can
open it to view its contents. The easiest way to open an XML file is to
use a text editor, such as Notepad. If you use a different type of application,
it should provide a File -> Open option for you to open the file.

Another way you can display an XML file is in a browser. To do this, if you see the file
in Windows Explorer or in My Documents, you can double-click it.

Introduction to Writing XML Code

Markup

When you create an XML file, there are standard rules you
must follow in order to have a well-formed document. These rules
are defined by the W3C XML Recommendation. A separate W3C standard, the
Document Object Model (DOM), describes how programs can access the contents of
an XML document.

A markup is an instruction that describes part of an XML document. The fundamental
formula of a markup is:

<tag>

The left angle bracket "<" and the right angle
bracket ">" are required.
Inside of these symbols, you type a word or a group of words of your choice, using regular
characters of the English alphabet and sometimes special characters such as
?, !, or [. The combination of a left angle bracket "<", the right
angle bracket ">", and what is inside of these symbols is called a
markup. There are various types of markups we will learn.

The XML Declaration

As mentioned above, XML is released as a version. Because
there can be various versions, the first line that can be processed in an XML
file should specify the version of XML you are using. At the time of this writing,
the widely supported version is 1.0. When creating an XML file, you
should (should in 1.0 but must in 1.1) specify the version your file
conforms to, especially if you are using a version higher than 1.0. For this
reason, an XML file should start (again, must, in 1.1) with a line known as an XML
declaration. When it is present, nothing, not even an empty space, can come
before it in the file. It starts with <?xml version=, followed by the version you are
using, assigned as a string, and followed by ?>. An example of such a line
is:

<?xml version="1.0"?>

Encoding Declaration

An XML document is stored as a series of bytes, and the parser needs to know
which characters those bytes represent. The scheme used to convert characters
to bytes is called the encoding, and the part of the XML declaration that names
it is known as the encoding declaration.
For example, most of
the characters used in US English belong to a set known as ASCII. Each of these
characters is identified by a 7-bit code and is commonly stored in one 8-bit
byte. The UTF-8 encoding stores the ASCII characters the same way, in one byte
each, and uses two to four bytes for the other characters. There are other encodings such as
UTF-16, which uses two or four bytes for each character.

To specify the encoding you are using, type encoding
followed by the encoding scheme you are using, which must be assigned as a
string. The encoding is specified in the XML declaration, after the version. Here is an
example:

<?xml version="1.0" encoding="utf-8"?>
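
As an illustration of what the encoding declaration tells the parser, the following sketch stores the same document in two encodings and reads both back. It is written in Python with the standard xml.etree.ElementTree parser, which is an assumption of this example and not a tool used by the lesson; the name tag and its value are hypothetical.

```python
import xml.etree.ElementTree as ET

text = "José"  # hypothetical value that contains a non-ASCII character

def build(encoding):
    """Return the document as bytes stored in the given encoding."""
    xml = f'<?xml version="1.0" encoding="{encoding}"?><name>{text}</name>'
    return xml.encode(encoding)

utf8_bytes = build("utf-8")
utf16_bytes = build("utf-16")  # Python adds a byte order mark for utf-16

# The stored bytes differ, but the parser recovers the same characters
same_text = ET.fromstring(utf8_bytes).text == ET.fromstring(utf16_bytes).text
print(len(utf8_bytes), len(utf16_bytes), same_text)
```

The declared encoding has to match the way the file is actually saved; otherwise the parser may reject the file or misread some characters.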

Practical Learning: Introducing XML

Start Notepad (or a text editor)

In the empty document, type the following:

<?xml version="1.0" encoding="utf-8"?>

To save the document, on the main menu, click File -> Save

Locate and display the C:\ drive in the top combo box (you can select
another drive if you want)

Click Create Folder

Type exercises as the name of the new folder and press
Enter (you can give another name if you want)

Display the exercises folder (or the folder you will use) in the top
combo box

Set the Save As Type combo box to All Files (*.*)

Set the file name to students.xml

Click Save
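
The same file can also be produced without a text editor. The steps above can be sketched in Python as follows; this is only an illustration, and a temporary folder stands in for the C:\ drive so the sketch can run on any computer.

```python
import tempfile
from pathlib import Path

# Hypothetical location: a temporary folder stands in for C:\exercises
folder = Path(tempfile.mkdtemp()) / "exercises"
folder.mkdir()

# Write the XML declaration as UTF-8 text, matching the declared encoding
path = folder / "students.xml"
path.write_text('<?xml version="1.0" encoding="utf-8"?>\n', encoding="utf-8")

print(path.name, path.read_text(encoding="utf-8").strip())
```

Passing encoding="utf-8" when writing keeps the bytes on disk consistent with the encoding named in the declaration.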

XML Stylesheet

Introduction

Throughout our lessons, we will learn how to create XML
files and populate their content. We saw that, after creating the file, you can
display it in a browser. That display is fine for an XML developer, but is not
realistic for a normal reader. To make the document easier to read, you can
give the browser instructions as to how to display the various parts of the
document. One solution is
to use a style sheet.

Creating an XML Stylesheet

The style sheet of an XML document can be written as a cascading style sheet (CSS),
using the rules of that language. In it, the selectors are the names of the tags
of the XML document. The file should be saved with the .css
extension.

Using an XML Stylesheet

After creating the style sheet, you can use it by
referencing it in the XML document. To do this, in the XML document, under the XML declaration,
add a processing instruction: create the <? ?> delimiters and, inside these delimiters,
start with xml-stylesheet as in:

<?xml-stylesheet?>

You must specify the name of the file that contains the
style sheet(s) to use. To do this, type href followed by the name of the file
and its extension in double-quotes. Here is an example:

<?xml-stylesheet href="example.css"?>

You must also specify the type of file it is. This is done
by adding type="text/css". Here is an example:

<?xml-stylesheet href="example.css" type="text/css"?>
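
To see how a program views this line, the following sketch parses a document that contains it and inspects the result. It uses Python's standard xml.dom.minidom module, which is an assumption of this example (the lesson itself relies on a browser), and a hypothetical students root tag.

```python
from xml.dom import minidom

document = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<?xml-stylesheet href="example.css" type="text/css"?>'
    "<students></students>"
)

dom = minidom.parseString(document.encode("utf-8"))

# The style sheet reference is the first node of the document, before the root
instruction = dom.firstChild
is_instruction = instruction.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE
print(is_instruction, instruction.target, instruction.data)
```

The parser only reports the name of the style sheet and its type; it is up to the application, here the browser, to load example.css and apply it.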

XML Tag Creation

Introduction

Earlier, we mentioned that XML worked through markups. A
simple markup is made of a tag created between the left angle bracket "<" and
the right angle bracket ">". Just creating a
markup is not particularly significant. You must give it meaning. To do this,
you can type a number, a date, or a string on the right side of the right angle bracket
">" symbol.
The text on the right side of ">" is referred to
as the item's text. It is also called a value.

After specifying the value of
the markup, you must close it: this rule is not always enforced in HTML but must be
respected in XML to make the document "well-formed". To close a tag, use the same formula as for creating a tag, with
the left angle bracket "<", the tag, and the right angle bracket ">", except that, between < and the tag, you
must type a forward slash. The formula to use is:

<tag>some value</tag>

The item on the left side of the "some value"
string, in this case
<tag>, is called the opening or start-tag. The item on the right side of
the "some value" string, in this case </tag>, is called the closing or
end-tag.
Just as <tag> is a markup, </tag> is also called a markup.
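
The effect of a missing end-tag can be checked with a parser. The sketch below is written in Python with the standard xml.etree.ElementTree module, which is an assumption of this example; is_well_formed is a helper name invented for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

closed = is_well_formed("<tag>some value</tag>")  # start-tag and end-tag
unclosed = is_well_formed("<tag>some value")      # the end-tag is missing
print(closed, unclosed)
```

The parser accepts the first string and rejects the second, which is what is meant by a document that is not well-formed.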

With XML, you create your own tags
with custom names. This means that a typical XML file is made of various items whose names are chosen by its author, such as <Country> or <OperatingSystem>.

When creating your tags, there are various rules you
must observe with regard to their names. Unlike HTML, XML is very restrictive
with its rules. For example, XML is case-sensitive. This means that
CASE, Case, and case are three different words. Therefore,
you must pay close attention to what you write inside of the < and the >
delimiters.
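
Case sensitivity also applies between a start-tag and its end-tag. As a small sketch, assuming Python and its standard xml.etree.ElementTree parser (neither is part of this lesson) and a helper named parses that is invented here:

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

matching = parses("<Case>some value</Case>")    # same spelling and same case
mismatched = parses("<Case>some value</case>")  # the end-tag uses another case
print(matching, mismatched)
```

Because Case and case are different names, the parser treats the second string as a start-tag that was never closed.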

Besides case sensitivity, there are
some rules you must observe when naming the tags of your markups:

The name of a tag must be a single word, with no space in the name

The name must start with an alphabetic letter or an underscore -
Examples are <Country> or <_salary>

The first letter or underscore that starts a name can be followed by:

Letters - Example: <OperatingSystem>

Digits - Example: <L153>

Hyphens - Example: <TV-Rating>

Underscores - Example: <Chief_Accountant>

The name of a tag should not start with xml, XML or any combination of X
(uppercase or lowercase), followed by M (uppercase or lowercase), and
followed by L (uppercase or lowercase): such names are reserved by the XML standard
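
The naming rules listed above can be sketched as a small checking function. This is only an illustration written in Python: it covers the rules of this list and nothing more, it accepts only letters of the English alphabet, and the name is_acceptable_tag_name is invented for the example. The XML standard itself allows more characters than this sketch does.

```python
import re

# First character: a letter or an underscore.
# Next characters: letters, digits, hyphens, or underscores.
# Simplification: only ASCII letters are accepted here.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")

def is_acceptable_tag_name(name):
    """Check a tag name against the rules listed in this lesson."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names that start with xml, in any combination of cases, are set aside
    return not name.lower().startswith("xml")

for name in ["Country", "_salary", "TV-Rating", "first name", "2nd", "XmlData"]:
    print(name, is_acceptable_tag_name(name))
```

A check like this is only a convenience while writing a file; the parser remains the final judge of whether a name is accepted.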

In future sections, we will learn that, with some markups,
you can include non-readable characters between the angle brackets. In fact, you
will need to pay close attention to the symbols you type in a
markup. We will also see how some characters have special meaning.

Practical Learning: Creating XML

Click the end of the first line in the document and press Enter

Type <students> and press Enter

Type </students> and press Enter:

<?xml version="1.0" encoding="utf-8"?>
<students>
</students>

Save the file

The Root

Every XML document must have one particular tag that either
is the only tag in the file or acts as the parent of all the other tags of the same
document. This tag is called the root. Here is an example of a file that has
only one tag:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>

If there is more than one tag in the XML file, one of them
must serve as the parent or root. Otherwise, you would receive an error. Based
on this rule, the following XML code is not well-formed:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>
<square>A square is a rectangle whose 4 sides are equal</square>

To correct this type of error, you can change one of
the existing tags to act as the root. In the following example, the
<rectangle> tag acts as the parent:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles
<square>A square is a rectangle whose 4 sides are equal</square></rectangle>

Alternatively, you can create a tag that acts as the parent
for the other tags. In the following example, the <geometry> tag acts as
the parent of the <rectangle> and of the <square> tags:

<geometry><rectangle>A rectangle is a shape with 4 sides and 4 straight angles
</rectangle><square>A square is a rectangle whose 4 sides are equal</square></geometry>

<?xml version="1.0" encoding="utf-8"?><geometry><rectangle>A rectangle
is a shape with 4 sides and 4 straight angles</rectangle><square>A
square is a rectangle whose 4 sides are equal</square></geometry>
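
The rule of the single root can be verified with a parser. The following sketch assumes Python and its standard xml.etree.ElementTree module, which are not part of this lesson; try_parse is a helper name invented for the example.

```python
import xml.etree.ElementTree as ET

rectangle = "<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>"
square = "<square>A square is a rectangle whose 4 sides are equal</square>"

def try_parse(text):
    """Return the name of the root tag, or None if the parser rejects the text."""
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None

# Two tags side by side: no single root
no_root = try_parse(rectangle + square)
# The same two tags nested in a <geometry> parent
with_root = try_parse("<geometry>" + rectangle + square + "</geometry>")
print(no_root, with_root)
```

The first attempt is rejected because the parser finds a second tag after the root has been closed; the second succeeds and reports geometry as the root.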

The Structure of an XML Tag

Empty Tags

We mentioned that, unlike HTML, every XML tag must be
closed. We also saw that the value of a tag was specified on the right side of
the right angle bracket of the start tag. In some cases, you will create a tag
that doesn't have a value or, maybe for some reason, you don't provide a value for
it. Here is an example:

<dinner></dinner>

This type of tag is called an empty tag. Even though there is no
value in it, the tag still must be closed.
Although this writing is allowed, an alternative is to close the start tag
itself. To do this, between the tag name and the right angle bracket, type a
forward slash, optionally preceded by an empty space. Here is an example:

<dinner />

Both produce the same result or accomplish the same role.
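
As a quick check that the two forms are equivalent, the sketch below parses both and compares the results. It assumes Python and its standard xml.etree.ElementTree parser, which are not part of this lesson.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<dinner></dinner>")
short_form = ET.fromstring("<dinner />")

# Both forms give an element named dinner with no text and no nested tags
same = (
    long_form.tag == short_form.tag
    and long_form.text == short_form.text
    and len(long_form) == len(short_form) == 0
)
print(long_form.tag, long_form.text, same)
```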

White Spaces

Consider the following example:

<?xml version="1.0" encoding="utf-8"?><geometry><rectangle>A rectangle
is a shape with 4 sides and 4 straight angles</rectangle><square>A
square is a rectangle whose 4 sides are equal</square></geometry>

We typed various items on the same
line. Although this is acceptable, in a long XML document this technique can
make the code (very) difficult to read. One way you can solve this problem is to
separate the tags with empty spaces, for example by typing each tag on its own
line and indenting the tags that belong to another tag.

All these are possible and acceptable because empty spaces or ends of
lines placed between tags do not change the structure of the document that the
XML parser builds. Therefore, to make your code
easier to read, you can use empty spaces, carriage-return-line-feed
combinations, or tabs inserted in various sections. All these are referred to as
white spaces.
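
To illustrate, the sketch below writes the same document twice, once on a single line and once with white spaces, and compares what the parser finds. It assumes Python and its standard xml.etree.ElementTree parser, which are not part of this lesson; the text of the tags is shortened, and outline is a helper name invented for the example.

```python
import xml.etree.ElementTree as ET

compact = "<geometry><rectangle>A rectangle</rectangle><square>A square</square></geometry>"

spaced = """
<geometry>
    <rectangle>A rectangle</rectangle>
    <square>A square</square>
</geometry>
"""

def outline(text):
    """Return the name and the text of each tag nested in the root."""
    root = ET.fromstring(text)
    return [(child.tag, child.text) for child in root]

same_structure = outline(compact) == outline(spaced)
print(outline(spaced), same_structure)
```

Both versions contain the same tags with the same values. The white space typed between the tags is still reported by this parser as surrounding text, so an application that needs it can read it, but it adds no new tag.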

Nesting Tags

Most XML files contain more than one tag. We saw that a tag
must have a starting point and must be closed. One tag can be included in another tag: this is referred to as
nesting. A tag that is created inside of another tag is said to be nested. A tag
that contains another tag is said to be nesting. Consider two tags, <Smile> and <English>, created one after the other.

In this example, you may want the English tag to be nested
in the Smile tag. To nest one tag inside of another, you must type the nested
tag before the end-tag of the nesting tag. For example, if you want to nest the
English tag in the Smile tag, you must type the whole English tag before the
</Smile> end tag. Here is an example:

<Smile>Please smile to the camera<English>Welcome to our XML Class</English></Smile>

To make this code easier to read, you can use white spaces,
for example by typing the nested tag on its own line and indenting it.

An XML file appears as an upside-down tree: it has a root
(for example a <Videos> tag), it can have branches (such as <Video> tags nested in the root), and
it can have leaves (such as a <Title> tag nested in a <Video> tag). As we have seen so far, all of
these objects are created using the same technique: a tag with a name (such as
<Title>) and an
optional value. Based on their similarities, each of these objects is called a
node.
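
The tree can be made visible by walking through the nodes of a document. The following sketch assumes Python and its standard xml.etree.ElementTree parser, which are not part of this lesson. The document is hypothetical: only the names Videos, Video, and Title come from the text above, and the titles are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the titles are placeholders
document = """
<Videos>
    <Video>
        <Title>Sample Title One</Title>
    </Video>
    <Video>
        <Title>Sample Title Two</Title>
    </Video>
</Videos>
"""

def walk(node, depth=0):
    """Yield each node with its depth, starting from the root at depth 0."""
    yield depth, node.tag
    for child in node:
        yield from walk(child, depth + 1)

nodes = list(walk(ET.fromstring(document)))
for depth, tag in nodes:
    print("    " * depth + tag)
```

The root is found at depth 0, the branches at depth 1, and the leaves at depth 2, which is the upside-down tree described above.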

Introduction to Node Types

To make XML as complete and as efficient as possible, it can
contain various types of nodes. These are also referred
to as node types. XML has lots of them, as we will see in the next lessons.