XML and the .NET Framework

Topics at a glance

• Navigating, reading and writing XML documents by using the System.Xml namespace and the classes XmlDocument, XmlReader, XmlWriter and XPathNavigator
• Validating XML by applying schemas
• Transforming XML by applying XSLT transformations
• XML and ADO.NET
• XML serialization

Topics not included in this course:

• Using XML with LINQ
• XML and WCF

2- Introducing XML and the .NET Framework

a. What is XML

XML stands for eXtensible Markup Language. It is a markup language used to describe data, and it offers a standardized way to represent textual data.

XML data does nothing on its own; to process it, you need a piece of software called a parser. Unlike Hypertext Markup Language (HTML), which focuses on how to present data, XML focuses on how to represent data. XML consists of user-defined tags, which means you are free to define and use your own tags in an XML document.

The following illustrates an example:

    <?xml version="1.0"?>
    <customers>
      <customer id="id1">
        <name>n1</name>
        <phone>09876534</phone>
      </customer>
      <customer id="id2">
        <name>n2</name>
        <phone>08876534</phone>
      </customer>
    </customers>

b. Benefits of XML

• XML is a standard: integrating cross-platform applications is now easy. For example, a VB6 application can share data with a Java application running on UNIX.
• XML is self-describing.
• XML is extensible.
• XML can be processed more easily than CSV files.
• XML can be used to create specialized vocabularies such as XML Schemas, WML, WAP and SOAP.

c. XML Parsers

The software that processes XML documents is called a parser. Parsers let you check XML documents for syntax errors and read, write and manipulate them. They fall into two categories, depending on how they process XML data:

• DOM (Document Object Model) parsers (XmlDocument class)
• SAX (Simple API for XML) parsers (e.g. XmlTextReader, XmlTextWriter)

DOM parsers process XML as a tree structure. They are read/write and allow random access to any node, but they need to load the entire XML document into memory.

SAX parsers do not read the entire XML document into memory at once. They scan the document sequentially and raise events, which we can handle to read the document. SAX parsers are read-only and can be used to keep memory allocation low. The .NET reader classes follow the same forward-only approach, but your code pulls the nodes one by one instead of handling raised events.
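The difference between the two approaches can be sketched in a few lines of C#. This is a minimal illustration, not code from the course: the class name ParserSketch, its method names and the SampleXml string (modeled on the customers example) are assumptions made for the sketch.

```csharp
using System.IO;
using System.Xml;

public static class ParserSketch
{
    // Hypothetical sample data, modeled on the customers example.
    public const string SampleXml =
        "<customers>" +
        "<customer id=\"id1\"><name>n1</name><phone>09876534</phone></customer>" +
        "<customer id=\"id2\"><name>n2</name><phone>08876534</phone></customer>" +
        "</customers>";

    // DOM: the whole document is loaded into memory, then any node can be reached.
    public static string SecondNameWithDom(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // Random access: jump straight to the second <customer>.
        XmlNode second = doc.DocumentElement.ChildNodes[1];
        return second["name"].InnerText;
    }

    // Forward-only reader: nodes are visited once, in document order.
    public static int CountCustomersWithReader(string xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "customer")
                    count++;
            }
        }
        return count;
    }
}
```

The DOM version can index directly into the tree, while the reader version has to visit every node on the way but never holds more than the current node.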

Note: .NET Framework 3.5 introduced LINQ (Language Integrated Query), which offers a new way to read and write XML documents.

d. XSLT (Extensible Stylesheet Language Transformations)

XML solves the problem of data representation and exchange. However, when we need to convert XML data into another XML structure or into a different format (e.g. HTML) understood by another application, we need to use XSLT.
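As a rough illustration of such a conversion, the sketch below uses the XslCompiledTransform class from System.Xml.Xsl to turn a customers document into an HTML list. The stylesheet, the class name XsltSketch and the method name are hypothetical and chosen only for this example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Hypothetical stylesheet: turns each <customer> into an HTML list item.
    public const string Stylesheet =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"html\" indent=\"no\"/>" +
        "<xsl:template match=\"/customers\">" +
        "<ul><xsl:for-each select=\"customer\">" +
        "<li><xsl:value-of select=\"name\"/></li>" +
        "</xsl:for-each></ul>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);              // compile the stylesheet
        }

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        {
            xslt.Transform(input, null, output); // no XSLT arguments are passed
        }
        return output.ToString();
    }
}
```

Calling XsltSketch.Transform with a customers document yields an HTML fragment with one list item per customer name.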

e. XPath

XPath consists of expressions and functions that we can use to look for and select elements and attributes matching certain criteria.

f. The .NET Framework

The .NET Framework is a platform for building Windows and Web applications, components and services by using a variety of languages. Figure A.1 depicts the .NET stack.

Fig. 1

The CLR is the heart of the .NET Framework. It provides the execution environment for all .NET applications, so the CLR must be installed in order to run any .NET application. The CLR does many things for your application:

(W3C) and it is a standard to represent the content and model of documents (HTML, XML, CSS, XML validation handlers, ...) across all programming languages and tools.

The DOM interfaces supply a complete in-memory tree representation of the XML document. So in a typical XML document we might get a structure that looks like:

Because of this model, all DOM structures can be treated either as their generic type, Node, or as their specific type (Element, Attribute, Text, ...). Many of the navigation methods that we will explore are defined on the basic Node interface, so we can walk up and down the tree without worrying about the specific structure type.

DOM defines its own list structures, such as node lists, rather than .NET collections. The following figure shows a simple UML-style model of the DOM core interfaces and classes.

3- Manipulating XML Documents by Using DOM

In this section we will cover the following topics:

• System.Xml namespace classes related to DOM
• Knowing when to use DOM
• Reading an XML document by using DOM
• Writing XML documents by using DOM

a. Using the DOM Parser

At the heart of DOM manipulation is the XmlDocument class. It loads the XML document and builds its tree representation. DOM is best suited to the following scenarios:

• You want read/write access to the XML document
• You need random access to nodes
• The document is small

b. Opening an existing XML document for parsing

The XmlDocument class allows you to open XML documents in four common ways:

1. Specify the path or the URI of the XML file:

    string filepath = "C:\\...\\...\\catalog.xml";
    XmlDocument doc = new XmlDocument();
    doc.Load(filepath);

2. Use a stream, such as a FileStream, that contains the XML data:

    string filepath = "C:\\...\\...\\catalog.xml";
    FileStream fs = new FileStream(filepath, FileMode.Open);
    XmlDocument doc = new XmlDocument();
    doc.Load(fs);
    fs.Close();

3. Use an XmlTextReader, as we will see later.

4. Use a string in memory that contains the XML data:

    string str = "<book genre='g1' isbn='11'>" +
                 "<title>tl1</title>" +
                 "</book>";
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(str);

Practice 1

1- Create a Windows application.

2- Create an XML file that contains the following:

    <?xml version="1.0" encoding="utf-8" ?>
    <catalog>
      <supplier name="n1" uid="id1" pw="pw1" />
      <batch batch_type="cat_new">
        <product prod_code="c1" prod_type="t1">
          <desc>desc1</desc>
          <saleinfo saleprice="10.3"></saleinfo>
          <supplier id="s1">s1</supplier>
          <supplier id="s2">s2</supplier>
        </product>
        <product prod_code="c11" prod_type="t11">
          <desc>desc10</desc>
          <saleinfo saleprice="10.3"></saleinfo>
          <supplier id="s4">s4</supplier>
          <supplier id="s5">s5</supplier>
        </product>
      </batch>
      <batch batch_type="cat_upd">
        <product prod_code="c2" prod_type="t1">
          <desc>desc11</desc>
          <saleinfo saleprice="10.3"></saleinfo>
          <supplier id="s11">s11</supplier>
          <supplier id="s2">s2</supplier>
        </product>
      </batch>
      <batch batch_type="cat_del">
        <product prod_code="c3" prod_type="t1"/>
      </batch>
    </catalog>

3- Write the application that loads the XmlDocument as described in the following image.

c. Navigating Through an XML Document

An XML document consists of one or more nodes, and nodes can be nested inside other nodes. Such nested nodes are called child nodes.

The XmlNode class has a collection called ChildNodes that contains the child nodes of the current node. XmlNode is the base class of most of the DOM classes. It also has further properties such as ParentNode, FirstChild, LastChild, NextSibling, PreviousSibling, HasChildNodes and Attributes.
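A recursive walk over ChildNodes is one common way to use these properties. The sketch below is an illustration under assumptions: the class name NavigationSketch and the text outline it builds are hypothetical stand-ins for whatever the application displays (for example a TreeView).

```csharp
using System.Text;
using System.Xml;

public static class NavigationSketch
{
    // Builds an indented outline of all element nodes and their attributes.
    public static string Outline(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        StringBuilder sb = new StringBuilder();
        Walk(doc.DocumentElement, 0, sb);
        return sb.ToString();
    }

    private static void Walk(XmlNode node, int depth, StringBuilder sb)
    {
        // Text, comment and whitespace nodes are ignored in this sketch.
        if (node.NodeType != XmlNodeType.Element) return;

        sb.Append(new string(' ', depth * 2)).Append(node.Name);
        foreach (XmlAttribute attr in node.Attributes)
            sb.Append(" @").Append(attr.Name).Append('=').Append(attr.Value);
        sb.Append('\n');

        if (node.HasChildNodes)
        {
            foreach (XmlNode child in node.ChildNodes)
                Walk(child, depth + 1, sb);
        }
    }
}
```

The same recursion works for filling a tree control: instead of appending a line, each call would add a node under its parent.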

the employees.xml file into a TreeView as depicted in the following figure:

We note that ID is an attribute.

Solution

d. Looking for Specific Elements and Nodes

Often we are not interested in the entire XML document loaded in memory but only in part of it. This requires us to search for a specific element or node for further processing. There are several methods for searching the XML document:

• Retrieving specific elements by using the GetElementsByTagName() method (returns an XmlNodeList)
• Retrieving a specific element by using the GetElementById() method (returns an XmlElement)
• Selecting specific nodes by using the SelectNodes() method (returns an XmlNodeList)
• Selecting a single specific node by using the SelectSingleNode() method (returns an XmlNode)

a- GetElementsByTagName

This method accepts the name of the tag (excluding < and >) and returns an XmlNodeList containing all the nodes that match that tag name.

By using the XML document from Practice 1 we can write:

    XmlNodeList list;
    list = doc.GetElementsByTagName("product");
    foreach (XmlNode x in list)
    {
        // Append the value of the first attribute (prod_code) of each product
        Label1.Text += x.Attributes[0].Value;
    }

Practice 3

Solution

b- GetElementById

Often we want to search for a specific element by its unique ID. XmlDocument can do this only if the XML document is bound to a DTD or schema in which the unique ID is defined.

Note: The DOM implementation must have information that defines which attributes are of type ID. Although attributes of type ID can be defined in either XSD schemas or DTDs, this version of the product only supports those defined in DTDs. Attributes with the name "ID" are not of type ID unless so defined in the DTD.

Example

Given the following file.xml with its DTD:

    <!DOCTYPE root [
      <!ELEMENT root ANY>
      <!ELEMENT Person ANY>
      <!ELEMENT Customer ANY>
      <!ELEMENT Team EMPTY>
      <!ATTLIST Person SSN ID #REQUIRED>]>
    <root>
      <Person SSN='A111'>Fred</Person>
      <Person SSN='A222'>Tom</Person>
      <Customer>C1</Customer>
      <Customer>C2</Customer>
      <Team>TM1</Team>
    </root>

The code can be as follows:

    using System;
    using System.Xml;

    public class Sample
    {
        public static void Main()
        {
            XmlDocument doc = new XmlDocument();
            doc.Load("file.xml");

            // Get the first element with an attribute of type ID
            // and value of A111.
            // This displays the node: <Person SSN="A111">Fred</Person>.
            XmlElement elem = doc.GetElementById("A111");
            Console.WriteLine(elem.OuterXml);

            // Get the first element with an attribute of type ID
            // and value of A222.
            // This displays Tom.
            elem = doc.GetElementById("A222");
            Console.WriteLine(elem.ChildNodes[0].InnerText);
        }
    }

Note: The XmlElement class inherits from the XmlNode class.

c- SelectNodes

XmlDocument has a method called SelectNodes that accepts an XPath pattern to perform complex searches within the DOM tree. It returns an XmlNodeList containing the matching nodes.
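A minimal sketch of SelectNodes, assuming a document shaped like the catalog from Practice 1. The class name SelectNodesSketch, the method name and the particular XPath filter are hypothetical choices made for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class SelectNodesSketch
{
    // Returns the prod_code of every <product> whose prod_type matches.
    public static List<string> ProductCodes(string xml, string prodType)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // "//product" searches the whole tree; the predicate filters on an attribute.
        XmlNodeList list = doc.SelectNodes("//product[@prod_type='" + prodType + "']");

        List<string> codes = new List<string>();
        foreach (XmlNode node in list)
            codes.Add(node.Attributes["prod_code"].Value);
        return codes;
    }
}
```

Building the XPath string by concatenation is acceptable for a demonstration with trusted input; values containing quotes would need extra care.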

Example

To describe how the SelectNodes method works, we will develop a Windows application that uses XPath to filter the result. The following figure shows the application's form.

Solution

Where:

    XmlNodeList list = null;

As we can see, the XPath filter restricts the employee elements to those with the required first name or last name.

The code of the Button will be:

d- SelectSingleNode

This method is very similar to SelectNodes, but it returns only the first matching XmlNode.

As an example, we can repeat the previous code and replace "XmlNodeList list;" with "XmlNode node;".
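A short sketch of SelectSingleNode, again assuming a catalog-shaped document. The class and method names are hypothetical. Note that SelectSingleNode returns null when nothing matches, so the result should be checked before use.

```csharp
using System.Xml;

public static class SelectSingleNodeSketch
{
    // Returns the description of the first product with the given code,
    // or null when no such product exists.
    public static string FirstDescription(string xml, string prodCode)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode node = doc.SelectSingleNode(
            "//product[@prod_code='" + prodCode + "']/desc");

        return node == null ? null : node.InnerText;
    }
}
```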

Practice 4

Considering the following XML file:

    <?xml version="1.0" encoding="utf-8" ?>
    <products>
      <product>
        <pid>1</pid>
        <productname>fff</productname>
        <price>45</price>
      </product>
      <product>
        <pid>2</pid>
        <productname>ggg</productname>
        <price>56</price>
      </product>
      <product>
        <pid>3</pid>
        <productname>hhh</productname>
        <price>12</price>
      </product>
    </products>

By using for and foreach loops, write the code that fills a ComboBox with the product names.

e. Modifying XML Documents by using XmlDocument, XmlNode and/or XmlElement

So far, we have seen how to read XML documents, how to navigate through them and how to search them on the basis of the tag name.

XML namespaces allow you to identify elements as part of a single group (a namespace) by uniquely qualifying the element and attribute names used in an XML document. Each namespace is identified by a URI (Uniform Resource Identifier). This allows developers to combine information from different data structures in a single XML document without causing ambiguity and confusion among element names.

For example, assume that you have two XML fragments, one related to employees and the other related to customers. Further assume that both fragments contain a tag called name. The problem is that when you mix them together, the <name> tag becomes ambiguous. XML namespaces come in handy in such situations.

The following example shows the employees_with_ns.xml file with its namespace:

    <?xml version="1.0" encoding="utf-8" ?>
    <emp:employees xmlns:emp="http://www.somedomain.com">
      <emp:employee employeeid="1">
        <emp:firstname>fn1</emp:firstname>
        <emp:lastname>ln1</emp:lastname>
        <emp:homephone>45</emp:homephone>
        <emp:notes>n1</emp:notes>
      </emp:employee>
      <emp:employee employeeid="2">
        <emp:firstname>fn2</emp:firstname>
        <emp:lastname>ln2</emp:lastname>
        <emp:homephone>46</emp:homephone>
        <emp:notes>n2</emp:notes>
      </emp:employee>
      <emp:employee employeeid="1">
        <emp:firstname>fn3</emp:firstname>
        <emp:lastname>ln3</emp:lastname>
        <emp:homephone>47</emp:homephone>
        <emp:notes>n3</emp:notes>
      </emp:employee>
    </emp:employees>

The following code extracts the namespace URI, the prefix and the local name of the document element:
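The original listing is not reproduced here, so the following is only a sketch of how these three values can be read from an XmlDocument. The class name NamespaceSketch, the shortened SampleXml string (modeled on employees_with_ns.xml) and the alias "e" used in the XPath query are assumptions.

```csharp
using System.Xml;

public static class NamespaceSketch
{
    // Hypothetical, shortened version of employees_with_ns.xml.
    public const string SampleXml =
        "<emp:employees xmlns:emp=\"http://www.somedomain.com\">" +
        "<emp:employee employeeid=\"1\">" +
        "<emp:firstname>fn1</emp:firstname>" +
        "</emp:employee>" +
        "</emp:employees>";

    // Returns "prefix|localname|namespaceURI" for the document element.
    public static string Describe(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlElement root = doc.DocumentElement;
        return root.Prefix + "|" + root.LocalName + "|" + root.NamespaceURI;
    }

    // XPath queries on namespaced documents need an XmlNamespaceManager.
    public static string FirstName(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        // The alias "e" is local to the query; only the URI has to match.
        ns.AddNamespace("e", "http://www.somedomain.com");

        XmlNode node = doc.SelectSingleNode("/e:employees/e:employee/e:firstname", ns);
        return node == null ? null : node.InnerText;
    }
}
```

The second method is relevant to the homework: SelectNodes and SelectSingleNode accept the namespace manager as a second argument, and without it a prefixed path cannot be resolved.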

Homework: repeat paragraphs c, d and e by using the employees_with_ns.xml file.

g. Understanding Events of the XmlDocument class

Whenever you modify an XML document, the XmlDocument class raises pre and post events, which are launched before and after the actual operation.

Each of the events receives an event argument parameter of type XmlNodeChangedEventArgs that provides the following properties:

To understand these events we will modify and test the previous application as follows:

1- Form Load

2- The event handlers

3- Test the module
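The wiring of these events can be sketched without a form. The example below is an assumption-laden illustration, not the course solution: it records events in a list instead of showing them in the UI, and the class name EventsSketch and the sample document are hypothetical. It subscribes to the NodeInserting, NodeInserted, NodeRemoving and NodeRemoved events of XmlDocument and reads the Node property of XmlNodeChangedEventArgs.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class EventsSketch
{
    public static List<string> Run()
    {
        List<string> log = new List<string>();

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<employees><employee>fn1</employee></employees>");

        // Handlers are attached after loading, so only later edits are logged.
        doc.NodeInserting += (sender, e) => log.Add("Inserting " + e.Node.Name);
        doc.NodeInserted  += (sender, e) => log.Add("Inserted " + e.Node.Name);
        doc.NodeRemoving  += (sender, e) => log.Add("Removing " + e.Node.Name);
        doc.NodeRemoved   += (sender, e) => log.Add("Removed " + e.Node.Name);

        XmlElement added = doc.CreateElement("employee");
        doc.DocumentElement.AppendChild(added);  // raises the pre and post insert events
        doc.DocumentElement.RemoveChild(added);  // raises the pre and post remove events

        return log;
    }
}
```

In a Windows application the subscriptions would typically go into the form's Load handler, with each handler updating a label or list box.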

4- Reading and Writing XML Documents

The previous chapter introduced the DOM tree parser XmlDocument. In the following section we will explore the XML reader and writer classes.

The XmlReader class allows you to iterate through the document and access the required content rather than raising events. Thus it is more flexible from a development point of view. It does not load the entire document in memory, resulting in a small memory footprint, and because it is read-only, it is faster too. XmlWriter is similar to XmlReader: it follows a forward-only model and offers write-only functionality.

These classes are abstract. The System.Xml namespace offers two classes, XmlTextReader and XmlTextWriter, that inherit from XmlReader and XmlWriter respectively.

An additional reason to use these classes is when we do not need to access various parts of the document randomly.

b. Reader classes

• XmlTextReader
• XmlValidatingReader: it can validate an XML document against a DTD or XML schema
• XmlNodeReader: allows you to read XML data from the DOM tree. The constructor of XmlNodeReader takes a parameter of type XmlNode, which can be obtained as the result of an XPath query or directly from a DOM document.

The MoveToContent() method checks whether the current node is a content node (non-white space text, CDATA, Element, EndElement, EntityReference, or EndEntity). If the node is not a content node, the reader skips ahead to the next content node or the end of the file. It skips over nodes of the following types: ProcessingInstruction, Comment, Whitespace.

    // This code parses only Element, EndElement and Text nodes
    string filepath = "C:\\..\\catalog1.xml";
    XmlTextReader myrdr = new XmlTextReader(filepath);
    while (myrdr.Read())
    {
        Console.Write("movetocontent=" + myrdr.MoveToContent().ToString() + " Name= " + myrdr.Name + " ");
        Console.WriteLine("Value=" + myrdr.Value);
    }
    myrdr.Close();

Practice 7

Develop a Windows application that displays a tree of the various elements and their values. The program must ignore whitespace and must process only element nodes by testing: myrdr.NodeType == XmlNodeType.Element.

Use the employees file described in Practice 5. The output must be as follows:

Solution

c- Improving Performance by using Name Tables

Whenever XmlTextReader parses an XML file, it builds a list of the element names found in that document. If we keep this list in a name table and supply the same table to a new reader that parses an XML file with a similar structure, we improve performance.

Example:
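The original listing is not available, so the following is a sketch of the idea rather than the course example. It passes one NameTable to two XmlTextReader instances through the XmlTextReader(TextReader, XmlNameTable) constructor. The class name NameTableSketch and the method name are hypothetical.

```csharp
using System.IO;
using System.Xml;

public static class NameTableSketch
{
    // Parses a document with the supplied name table and
    // returns the local name of the root element.
    public static string ReadRootName(string xml, XmlNameTable table)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml), table);
        try
        {
            reader.MoveToContent();        // positions the reader on the root element
            string name = reader.LocalName;
            while (reader.Read()) { }      // parse the rest so every name is added to the table
            return name;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

When both calls share the same table, the second reader reuses the name strings stored by the first one, so names can be compared by reference instead of character by character. For a file on disk, the XmlTextReader(string, XmlNameTable) overload can be used in the same way.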

d. Moving Between Elements

There are some additional methods of XmlTextReader that allow you to move between elements and read content.

a- ReadSubtree() Method

ReadSubtree can be called only on element nodes. When the entire sub-tree has been read, calls to the Read method return false. When the new XmlReader has been closed, the original XmlReader is positioned on the EndElement node of the sub-tree. Thus, if you called the ReadSubtree method on the start tag of the book element, after the sub-tree has been read and the new XmlReader has been closed, the original XmlReader is positioned on the end tag of the book element.

You should not perform any operations on the original XmlReader until the new XmlReader has been closed. This action is not supported and can result in unpredictable behavior.

This is useful if you need to pass data to another component for processing and you wish to limit how much of your data the component can access. When you pass an XmlReader returned by the ReadSubtree method to another application, the application can access only that XML element, rather than the entire XML document. This method also comes in handy when you want to jump to a specific node rather than moving there sequentially.

Note: if the reader is on any simple node, ReadSubtree returns this node and all its siblings within the parent node.

Example:

    string filepath = "C:\\..\\catalog1.xml";
    XmlTextReader myrdr = new XmlTextReader(filepath);
    XmlReader rdr = null;
    while (myrdr.Read())
    {
        myrdr.MoveToContent();
        if (myrdr.NodeType == XmlNodeType.Element)
        {
            if (myrdr.Name == "product")
            {
                //Method 1
                myrdr.MoveToFirstAttribute();
                if (myrdr.Value == "2")
                {
                    myrdr.MoveToElement();
                    //MessageBox.Show(myrdr.Name.ToString());
                    rdr = myrdr.ReadSubtree();
                    break;
                }
                //Method 2
                //myrdr.ReadToFollowing("product");
                ////myrdr.MoveToFirstAttribute();
                ////MessageBox.Show(myrdr.Value);
                //rdr = myrdr.ReadSubtree();
                //break;
            }
        }
    }

    while (rdr.Read())
    {
        if (rdr.NodeType.ToString() == "Text")
        {
            // Note that text nodes have no Name, only a Type and a Value
            MessageBox.Show("Type=" + rdr.NodeType.ToString()
                + " Name= " + rdr.Name + " Value= " + rdr.Value);
        }
    }
    rdr.Close();

    MessageBox.Show("Complete with myrdr");
    myrdr.MoveToContent();
    while (myrdr.Read())
    {
        if (myrdr.NodeType.ToString() == "Text")
        {
            MessageBox.Show("Type=" + myrdr.NodeType.ToString()
                + " Name= " + myrdr.Name + " Value= " + myrdr.Value);
        }
    }
    myrdr.Close();

b- ReadToFollowing("name") method

Reads until the named element is found. See the example above.

c- ReadToDescendant("name") method

Reads until the named element is found, but only if it is a descendant of the current node, whereas ReadToFollowing() jumps to the first occurrence of the specified element, be it a descendant or not.

d- ReadToNextSibling("name") method

It moves the reader from the current element to the next element with the given name at the same level.

e- Skip() method

Like ReadToNextSibling(), it skips the children of the current node, but it stops at whatever node comes next after the current element (whitespace, comment, element, ...) rather than looking for a named sibling element.
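The four movement methods above can be compared on a small document. This is a sketch under assumptions: the class name MovingSketch and the SampleXml string are hypothetical, and the sample contains no whitespace between elements so that Skip() lands directly on the next element.

```csharp
using System.IO;
using System.Xml;

public static class MovingSketch
{
    // Hypothetical sample without whitespace between the elements.
    public const string SampleXml =
        "<catalog><batch>" +
        "<product prod_code=\"c1\"><desc>d1</desc></product>" +
        "<product prod_code=\"c2\"><desc>d2</desc></product>" +
        "</batch></catalog>";

    public static string SecondProductCode(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("batch");      // first <batch> in document order
            reader.ReadToDescendant("product");   // first <product> inside that batch
            reader.ReadToNextSibling("product");  // next <product> on the same level
            return reader.GetAttribute("prod_code");
        }
    }

    public static string CodeAfterSkip(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("product");    // positioned on the first <product>
            reader.Skip();                        // jump over it and all of its children
            return reader.GetAttribute("prod_code");
        }
    }
}
```

With indented input, Skip() would stop on the whitespace node after the first product, which is exactly the difference from ReadToNextSibling() described above.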

e. Reading Content

In our previous examples we used the Value property and the ReadElementString() method (Practice 7) to read content from an element. In this section we are going to see a few more ways to read the content.

a- ReadInnerXml()

It reads all the XML content inside the current node and returns it as a string:

    MessageBox.Show(myrdr.ReadInnerXml());

b- ReadString()

It returns all the text from the element until any markup is encountered.
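The difference between the two methods shows up on mixed content. The sketch below assumes a hypothetical desc element that contains both text and a child element; the class name ReadContentSketch and the sample string are illustrative only.

```csharp
using System.IO;
using System.Xml;

public static class ReadContentSketch
{
    // Hypothetical sample: a <desc> element with mixed content.
    public const string SampleXml =
        "<product><desc>text <b>bold</b></desc></product>";

    // Everything inside <desc>, markup included.
    public static string InnerXmlOfDesc(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("desc");
            return reader.ReadInnerXml();
        }
    }

    // Only the text that precedes the first markup inside <desc>.
    public static string LeadingTextOfDesc(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("desc");
            return reader.ReadString();
        }
    }
}
```

ReadInnerXml() keeps the nested b element in the returned string, while ReadString() stops as soon as it meets the opening tag of b.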

f. Writing XML documents

The XmlTextWriter class represents a writer that provides a fast, non-cached, forward-only way of generating streams or files containing XML data that conforms to the W3C Extensible Markup Language (XML) 1.0 and the Namespaces in XML recommendations.

The following examples illustrate how we can use the class.

Example 1

    // By default there is no indentation: all the XML tags are written on the same line
    string filepath = "C:\\..\\txtwriter1.xml";
    XmlTextWriter mywriter = new XmlTextWriter(filepath, null);

    // Indentation is added
    mywriter.Formatting = Formatting.Indented;
    mywriter.Indentation = 2;
    mywriter.IndentChar = '\t';

    // Or writing to the console:
    //XmlTextWriter mywriter = new XmlTextWriter(Console.Out);

    mywriter.WriteStartDocument();
    mywriter.WriteComment("chi");
    mywriter.WriteStartElement("customerdetails");
    mywriter.WriteAttributeString("acccounttype", "saving");
    //w.WriteElementString("customer", "10");
    mywriter.WriteStartElement("accountnumber");
    mywriter.WriteString("1");
    mywriter.WriteEndElement();
    mywriter.WriteStartElement("name");
    mywriter.WriteString("2");
    mywriter.WriteEndElement();
    mywriter.WriteStartElement("city");
    mywriter.WriteString("3");
    mywriter.WriteEndElement();
    mywriter.WriteEndElement();
    mywriter.WriteEndDocument();
    mywriter.Close();

OUTPUT: txtwriter1.xml

    <?xml version="1.0"?>
    <!--chi-->
    <customerdetails acccounttype="saving">
        <accountnumber>1</accountnumber>
        <name>2</name>
        <city>3</city>
    </customerdetails>

Example 2

    string filepath = "C:\\..\\txtwriter2.xml";
    XmlTextWriter w = new XmlTextWriter(filepath, null);
    w.WriteStartDocument();
    w.Formatting = Formatting.Indented;
    w.Indentation = 8;
    w.WriteStartElement("customersdetails");
    // Creates an element and writes its value in one call
    w.WriteElementString("customer", "10");
    w.WriteElementString("sold", "100.3");
    w.WriteEndElement();
    w.WriteEndDocument();
    w.Close();

Example 3: using a FileStream

    string filepath = "C:\\..\\txtwriter3.xml";
    FileStream fs = new FileStream(filepath, FileMode.Create);
    XmlTextWriter mywriter = new XmlTextWriter(fs, System.Text.Encoding.UTF8);
    mywriter.Formatting = Formatting.Indented;
    mywriter.Indentation = 4;
    mywriter.WriteStartDocument();
    mywriter.WriteComment("chi");
    mywriter.WriteStartElement("customerdetails");
    mywriter.WriteAttributeString("acccounttype", "saving");
    mywriter.WriteStartElement("accountnumber");
    mywriter.WriteString("1");
    mywriter.WriteEndElement();
    mywriter.WriteStartElement("name");
    mywriter.WriteString("2");
    mywriter.WriteEndElement();
    mywriter.WriteStartElement("city");
    mywriter.WriteString("3");
    mywriter.WriteEndElement();
    mywriter.WriteEndElement();
    mywriter.WriteEndDocument();
    // Closing the writer flushes it and closes the underlying stream
    mywriter.Close();
    fs.Close();

Practice 8

Develop a Windows application, with the following form, that allows you to export data from any table of SQL Server into an XML file.

Solution

g. Including namespaces

The following example shows how we can include namespaces with the XmlTextWriter:
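The original listing is not reproduced here, so this is a sketch of one way to do it, using the WriteStartElement(prefix, localName, ns) and WriteElementString(prefix, localName, ns, value) overloads. The class name NamespaceWriterSketch is hypothetical, and the prefix and URI are borrowed from the employees_with_ns.xml example. The output goes to a string instead of a file to keep the sketch self-contained.

```csharp
using System.IO;
using System.Xml;

public static class NamespaceWriterSketch
{
    // URI taken from the employees_with_ns.xml example.
    private const string Ns = "http://www.somedomain.com";

    public static string Write()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(sw);

        // The first namespaced element makes the writer emit the xmlns:emp declaration.
        w.WriteStartElement("emp", "employees", Ns);
        w.WriteStartElement("emp", "employee", Ns);
        w.WriteAttributeString("employeeid", "1");
        w.WriteElementString("emp", "firstname", Ns, "fn1");
        w.WriteEndElement();   // </emp:employee>
        w.WriteEndElement();   // </emp:employees>
        w.Close();

        return sw.ToString();
    }
}
```

The writer declares the namespace once, on the outermost element that uses it, and reuses the emp prefix for the nested elements.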

In the previous section we presented how to read and write data by using XmlDocument, XmlReader and XmlWriter. These classes allow you to access XML documents, but they offer little support for querying and retrieving data from them. The System.Xml.XPath namespace is designed to do just that.

a. Overview of XPath

XPath provides a way to query and select a part of an XML document. To work with XPath we have to understand the following:

• Location path
• Axis
• Node tests
• Predicates

Location path

Various parts of an XML document, such as elements and attributes, have a location. The location is indicated by a specific XPath syntax called the location path, which allows you to select a set of nodes. A location path consists of an axis, a node test and predicates. The general syntax of a location path can be formally defined as follows:

    axis::node-test[predicate]/axis::node-test[predicate]/.../axis::node-test[predicate]

A location path can contain one or more axis::node-test steps, and the predicates are optional.

Axis

An axis defines the direction, relative to the current node, in which the node test and predicates are applied (for example child, ancestor or following). The following table lists the available axes:

Node test

A node test names the nodes to select along the axis, for example an element name such as livre or the wildcard *.

Predicates

Predicates are Boolean expressions that are used to further filter the nodes selected by the axis and the node test.
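Location paths like these can be tried out with the XPathDocument and XPathNavigator classes from System.Xml.XPath. The sketch below is illustrative only: the class name XPathSketch, its method names and the short SampleXml string (a reduced bibliography with three livre elements) are assumptions.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathSketch
{
    // Hypothetical, reduced bibliography with three <livre> elements.
    public const string SampleXml =
        "<biblio>" +
        "<livre id=\"2\"><titre>t2</titre></livre>" +
        "<livre id=\"3\"><titre>t3</titre></livre>" +
        "<livre id=\"4\"><titre>t4</titre></livre>" +
        "</biblio>";

    // Evaluates a location path from the document root and
    // returns the id attribute of every selected node.
    public static List<string> SelectIds(string xml, string locationPath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();

        List<string> ids = new List<string>();
        XPathNodeIterator it = nav.Select(locationPath);
        while (it.MoveNext())
            ids.Add(it.Current.GetAttribute("id", ""));
        return ids;
    }
}
```

For instance, the path child::biblio/child::livre[titre='t2']/following::livre combines two child steps, a predicate and the following axis, and selects the livre elements that come after the one whose titre is t2.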

Examples

Given the following bibilio.xml file:

    <?xml version="1.0" encoding="utf-8" ?>
    <biblio>
      <livre id="2" pub="pub2">livre1
        <titre>t2</titre>
        <auteur>a2
          <a id="15">v1
            <aa>aa</aa>
          </a>
          <b></b>
          <c id2="">v2</c>
          <titre>t2</titre>
        </auteur>
      </livre>
      <livre id="3" pub="pub3">
        <titre>t3</titre>
        <auteur>a3</auteur>
        <e>ve</e>
      </livre>
      <livre id="4" pub="pub4">
        <titre>t4</titre>
        <auteur>a4
          <auteur>in_a4</auteur>
        </auteur>
        <a>it is a
          <b>
            <auteur>a5</auteur>
          </b>
        </a>
      </livre>
      <livre>
        <titre>t5</titre>
        <auteur>a5</auteur>
      </livre>
    </biblio>

Evaluate the following XPath location paths:

1. child::biblio/child::livre[titre='t2']/following::livre

It returns all the "livre" elements that follow the "livre" element having titre = t2, where that element is a child of biblio, itself a child of the document:

    <livre id="3" pub="pub3">
      <titre>t3</titre>
      <auteur>a3</auteur>
      <e>ve</e>
    </livre>
    <livre id="4" pub="pub4">
      <titre>t4</titre>
      <auteur>a4
        <auteur>in_a4</auteur>
      </auteur>
      <a>it is a
        <b>
          <auteur>a5</auteur>
        </b>
      </a>
    </livre>
    <livre>
      <titre>t5</titre>
      <auteur>a5</auteur>
    </livre>

2. child::biblio/child::livre[titre='t2']/following::livre[titre='t4']

It returns all the "livre" elements having titre = t4 that follow the "livre" element having titre = t2, where that element is a child of biblio, itself a child of the document:

    <livre id="4" pub="pub4">
      <titre>t4</titre>
      <auteur>a4
        <auteur>in_a4</auteur>
      </auteur>
      <a>it is a
        <b>
          <auteur>a5</auteur>
        </b>
      </a>
    </livre>

3. child::biblio/child::livre[auteur='a3']/child::auteur/ancestor::*

It returns all the ancestors of the "auteur" child of "livre[auteur='a3']", child of biblio, child of the document: