
About Me

Hi, I'm Aditya, the guy behind this website and many others. This site acts as my web playground, where I share all about me, my work, and my knowledge.

I have over 8 years of hands-on experience in PHP, MySQL, JavaScript, and open-source CMSs like Joomla and WordPress. During these 8 years, I have worked on more than 200 projects and/or websites but could not spare time for my blog.


PHP5 and XML

While PHP has offered XML support since its early versions, that support improved dramatically with the introduction of PHP5. PHP4's XML support was limited: only the SAX-based parser was enabled by default, and the PHP4 DOM extension did not implement the W3C standard. With PHP5, the PHP XML developers reinvented the wheel, so to speak, and complied with commonly used standards.

New for XML in PHP5

PHP5 includes new and totally rewritten extensions, including the SAX parser, the DOM, SimpleXML, XMLReader, XMLWriter, and the XSLT processor. All of these extensions are now based on the libxml2 library.

In addition to SAX support improved since PHP4, PHP5 supports both a DOM that follows the W3C standard and the SimpleXML extension. SAX, DOM, and SimpleXML are all enabled by default. If you are familiar with the DOM from other languages, you will find it easier than before to write similar code in PHP.
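For comparison with the tree-based APIs discussed below, here is a minimal sketch of the event-based SAX approach using PHP's xml extension (xml_parser_create() and its handler functions). The sample document, the $titles array, and the $inTitle flag are illustrative assumptions, not code from this article. The parser calls your handlers as it meets start tags, end tags, and character data, so it never builds a tree of the whole document.

```php
<?php
// Minimal SAX-style parse with PHP's event-based parser (ext/xml).
// The sample document and the $titles/$inTitle variables are illustrative only.
$xml = '<library><book><title>Great American Novel</title></book>'
     . '<book><title>Second Novel</title></book></library>';

$titles  = [];
$inTitle = false;

$parser = xml_parser_create();
// ext/xml uppercases element names by default ("case folding"); turn that off.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Start-tag handler: begin collecting text when a <title> opens.
    function ($parser, $name, $attrs) use (&$inTitle, &$titles) {
        if ($name === 'title') {
            $inTitle  = true;
            $titles[] = '';
        }
    },
    // End-tag handler: stop collecting when the </title> closes.
    function ($parser, $name) use (&$inTitle) {
        if ($name === 'title') {
            $inTitle = false;
        }
    }
);

xml_set_character_data_handler($parser, function ($parser, $data) use (&$inTitle, &$titles) {
    if ($inTitle) {
        // Character data may arrive in several chunks, so append rather than assign.
        $titles[count($titles) - 1] .= $data;
    }
});

if (!xml_parse($parser, $xml, true)) {
    die(xml_error_string(xml_get_error_code($parser)));
}

print_r($titles);
```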

Reading, manipulating, and writing XML in PHP5

For reading, manipulating, and writing straightforward, predictable, and relatively small XML documents in PHP5, SimpleXML, combined with the DOM where necessary, is the ideal choice.

Of the many XML APIs available in PHP5, the DOM is the most familiar and SimpleXML is the easiest to code. For the most common situations, like those you are dealing with here, they are also the most functional.

DOM extension

The Document Object Model (DOM) is a W3C standard set of objects for representing HTML and XML documents, a standard model of how these objects can be combined, and a standard interface for accessing and manipulating them. Many vendors support the DOM as an interface to their proprietary data structures and APIs, so many developers already know it, and that familiarity gives the DOM considerable authority. The DOM is easy to understand and use because its in-memory structure resembles the original XML document: to pass information to the application, the DOM builds a tree of objects that exactly mirrors the tree of elements in the XML file, with every XML element becoming a node in the tree.

The DOM is therefore a tree-based API. Because it builds a tree of the entire document, it uses a lot of memory and processor time, which makes it impractical for parsing large documents. In the context of this article, the key use of the DOM extension is its ability to import a SimpleXML object and output DOM-format XML, or the reverse, for use as a string or XML file.
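The import/export ability described above can be sketched with dom_import_simplexml() and simplexml_import_dom(). This is a minimal example under assumed sample data (the library and book elements are hypothetical). Both functions wrap the same underlying libxml2 nodes rather than copying them, so a change made through one API is visible through the other.

```php
<?php
// Sketch: move one document between SimpleXML and DOM without re-parsing.
// The sample XML is a hypothetical stand-in, not a listing from this article.
$sxe = simplexml_load_string('<library><book>Great American Novel</book></library>');

// SimpleXML -> DOM: returns a DOMElement for the same underlying libxml2 node.
$domElement = dom_import_simplexml($sxe);
$dom = $domElement->ownerDocument;
$dom->formatOutput = true;

// Use DOM methods to create and append a new element.
$domElement->appendChild($dom->createElement('book', 'Second Novel'));

// DOM -> SimpleXML: wrap the modified tree again.
$sxe2 = simplexml_import_dom($dom);

// Both SimpleXML wrappers now see two <book> children.
echo count($sxe2->book), "\n";
echo $dom->saveXML();
```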

SimpleXML

The SimpleXML extension is the tool of choice for parsing an XML document. It requires PHP5, interoperates with the DOM for writing XML files, and has built-in XPath support. SimpleXML works best with uncomplicated, record-like data, such as XML passed as a document or string from another internal part of the same application. As long as the XML document is not too complicated or deeply nested and contains no mixed content, SimpleXML is easier to code than the DOM, as its name implies. It is also more reliable when you work with a known document structure.
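A short sketch of that record-like style follows, assuming a hypothetical catalog document. It reads elements as object properties, indexes repeated elements like an array, and uses the built-in xpath() method to select nodes.

```php
<?php
// Sketch: read record-like XML with SimpleXML property syntax and XPath.
// The catalog below is hypothetical sample data.
$xml = simplexml_load_string(<<<XML
<catalog>
  <book id="b1"><title>Great American Novel</title><price>12.50</price></book>
  <book id="b2"><title>Another Story</title><price>8.00</price></book>
</catalog>
XML
);

// Elements read like object properties; repeated elements act like an array.
echo $xml->book[0]->title, "\n";

// Built-in XPath support: select titles of books priced below 10.
$cheap = $xml->xpath('//book[price < 10]/title');
foreach ($cheap as $title) {
    echo (string) $title, "\n";
}
```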

The DOM in action

PHP's DOM extension implements the same W3C DOM specification that you work with in a browser and manipulate with JavaScript. Core methods such as createElement() and appendChild() carry the same names, so familiar coding techniques carry over. Listing 2 illustrates the use of the DOM to create an XML string and an XML document, formatted for your viewing pleasure.
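The following is not Listing 2, but a sketch of the same idea under assumed element names (library, book) and an assumed attribute value. It builds a document with createElement(), setAttribute(), and appendChild(), turns on formatOutput for indented output, and serializes it with saveXML(); calling save() with a file name would write the document to disk instead.

```php
<?php
// Sketch: build a small document with DOM methods whose names match the
// browser DOM. Element and attribute names here are hypothetical.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;              // indent the serialized output

$library = $dom->createElement('library');
$dom->appendChild($library);            // the single root element

$book = $dom->createElement('book');
$book->setAttribute('bestseller', 'yes');
$book->appendChild($dom->createTextNode('Great American Novel'));
$library->appendChild($book);

$xmlString = $dom->saveXML();           // serialize to a string
echo $xmlString;
// $dom->save('library.xml');           // or write to a file instead
```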

SimpleXML in action


For those of you who might be new to PHP, Listing 6 formats a test XML file as an include for your convenience.

Listing 6. Test XML file formatted as a PHP include called example.php in the following code samples

On the other hand, you might want to extract a multi-line address. When multiple instances of an element exist as children of a single parent element, normal iteration techniques apply. Listing 8 demonstrates this functionality.
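A minimal sketch of that iteration follows, assuming a hypothetical contact document in which an address element holds several line children (this is not Listing 8 itself). A foreach over the repeated element visits each instance in document order.

```php
<?php
// Sketch: when one parent holds several same-named children, iterate them.
// The <contact>/<address>/<line> structure is hypothetical.
$xml = simplexml_load_string(
    '<contact><address>'
    . '<line>123 Main Street</line>'
    . '<line>Suite 4</line>'
    . '<line>Springfield</line>'
    . '</address></contact>'
);

$lines = [];
foreach ($xml->address->line as $line) {
    $lines[] = (string) $line;   // cast each element to a plain string
}
echo implode("\n", $lines), "\n";
```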

To compare an element or attribute with a string or pass it into a function that requires a string, you must cast it to a string using (string). Otherwise, by default, PHP treats the element as an object, as Listing 10 demonstrates.
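The difference is easy to see with gettype() and strict comparison. In this sketch the document and title are hypothetical; the var_dump() results are shown as comments.

```php
<?php
// Sketch: SimpleXML returns objects, so cast before strict comparisons
// or string-only uses. The sample title is hypothetical.
$xml = simplexml_load_string('<book><title>Great American Novel</title></book>');

$asObject = $xml->title;            // SimpleXMLElement, not a string
$asString = (string) $xml->title;   // plain PHP string

var_dump(gettype($asObject));                    // string(6) "object"
var_dump($asObject === 'Great American Novel');  // bool(false): object vs. string
var_dump($asString === 'Great American Novel');  // bool(true)

// Objects cannot be used as array keys, so cast before using the value as one.
$index = [$asString => true];
```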

About XML

Extensible Markup Language (XML) is described as both a markup language and a text-based data storage format, depending on who you talk to. It is a subset of Standard Generalized Markup Language (SGML) that provides a text-based way to describe a tree structure and apply it to information. XML serves as the basis for a number of languages and formats, such as Really Simple Syndication (RSS), Mozilla's XML User Interface Language (XUL), Macromedia's Maximum eXperience Markup Language (MXML), Microsoft's eXtensible Application Markup Language (XAML), and the open source Java XML UI Markup Language (XAMJ). As the many flavors of XML demonstrate, XML is a big deal, and everyone wants to get on the XML bandwagon.

Writing XML

XML's basic unit of data is the element. Elements are delimited by a start tag, such as <book>, and an end tag, such as </book>. Every start tag needs a matching end tag (an empty element may instead be written as a single self-closing tag, such as <br/>). If a start tag lacks its end tag, your XML document is not well-formed, and parsers will not parse it properly. Tags are usually named to reflect the type of content contained in the element: you would expect an element named book to contain a book title, such as Great American Novel (see Listing 1). The content between the tags, including any white space, is referred to as character data.
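You can confirm that white space inside character data is preserved by parsing a single element and casting it to a string. The padded book element below is a hypothetical example.

```php
<?php
// Sketch: character data between tags keeps its white space.
// The <book> element is a hypothetical example.
$xml = simplexml_load_string('<book>  Great American Novel </book>');

$text = (string) $xml;
var_dump($text);   // string(23) "  Great American Novel "
```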

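XML element and attribute names can consist of the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, certain special and non-English characters, and three punctuation marks: the hyphen, the underscore, and the period. The colon is also permitted, but it is reserved for namespace prefixes. A name cannot begin with a digit, a hyphen, or a period, and other punctuation marks and spaces are not allowed in names.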
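Rather than checking these rules by hand, you can let the DOM extension enforce them: with the default strictErrorChecking setting, DOMDocument::createElement() throws a DOMException when given an invalid name. The isValidXmlName() helper and the candidate names below are hypothetical, for illustration only.

```php
<?php
// Sketch: let DOM enforce XML naming rules. createElement() throws a
// DOMException for an invalid name. isValidXmlName() is a hypothetical helper.
function isValidXmlName($name)
{
    try {
        (new DOMDocument())->createElement($name);
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

// Hypothetical candidates: hyphen/period inside a name are fine;
// a leading digit, a space, or "$" is not.
foreach (['book', 'book_title', 'book-2.0', '2book', 'book title', 'book$'] as $name) {
    printf("%-12s %s\n", $name, isValidXmlName($name) ? 'valid' : 'invalid');
}
```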

XML is case sensitive. For example, <book> and <Book> describe two different elements. Either is an acceptable element name, but it's probably not a good idea to use both to describe two different elements, as the possibility of clerical error seems high.
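The following sketch, using a hypothetical shelf document, shows that SimpleXML treats book and Book as separate elements, and that a start tag and end tag differing only in case make the document fail to parse.

```php
<?php
// Sketch: element names are case sensitive, so <Book> and <book> differ.
// The documents below are hypothetical samples.
$xml = simplexml_load_string('<shelf><book>lower</book><Book>Upper</Book></shelf>');

echo (string) $xml->book, "\n";   // matches only <book>
echo (string) $xml->Book, "\n";   // matches only <Book>

// A start tag and end tag that differ only in case do not match.
libxml_use_internal_errors(true);  // collect parse errors instead of printing warnings
$broken = simplexml_load_string('<book>Title</Book>');
var_dump($broken);                 // bool(false)
```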

Each XML document contains one and only one root element, which is the only element in the document that does not have a parent. Most XML documents contain parent and child elements. In Listing 1, the root element has one child, and that child in turn has four children. Deeper in the tree, one element holds three child elements of the same type, each of which has two child elements.
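To see the parent-child hierarchy in code, this sketch walks a tree recursively from the root using SimpleXML's getName() and children() methods. The library, shelf, book, title, and author names are assumptions chosen for illustration; they are not the tags of Listing 1.

```php
<?php
// Sketch: print the element hierarchy, one level of indentation per depth.
// printTree() and the element names are hypothetical.
function printTree(SimpleXMLElement $node, $depth = 0)
{
    echo str_repeat('  ', $depth), $node->getName(), "\n";
    foreach ($node->children() as $child) {
        printTree($child, $depth + 1);
    }
}

$xml = simplexml_load_string(
    '<library><shelf>'
    . '<book><title>A</title><author>X</author></book>'
    . '<book><title>B</title><author>Y</author></book>'
    . '</shelf></library>'
);

// $xml is the root element: the only element without a parent.
printTree($xml);
```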

In addition to the nesting of elements that creates parent-child relationships, XML elements can also have attributes: name-value pairs attached to an element's start tag. A name is separated from its value by an equal sign (=), and the value is enclosed in single or double quotation marks. In Listing 1, one element carries two attributes, "bestseller" and "bookclubs". XML developers hold different views on the use of attributes. Most information contained in an attribute could instead be contained in a child element, and some developers insist that attributes should hold only metadata, namely information about the data, while the data itself belongs in elements. Whether to use attributes really depends on the nature of the data and how it will be extracted from the XML.
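In SimpleXML, the two styles are read differently: attributes through array syntax and child elements through property syntax. This sketch reuses the attribute names bestseller and bookclubs from the text; their values and the title child are hypothetical.

```php
<?php
// Sketch: attributes vs. child elements in SimpleXML.
// Attribute values and the <title> child are hypothetical.
$xml = simplexml_load_string(
    '<book bestseller="yes" bookclubs="3"><title>Great American Novel</title></book>'
);

// Attributes use array syntax; child elements use property syntax.
$bestseller = (string) $xml['bestseller'];
$bookclubs  = (int) $xml['bookclubs'];   // values arrive as text; cast as needed
$title      = (string) $xml->title;

// List every attribute on the element as name = value.
foreach ($xml->attributes() as $name => $value) {
    echo $name, ' = ', $value, "\n";
}
```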

Strengths of XML

One of XML's good qualities is its relative simplicity. You can write XML with a basic text editor or word processor; no special tools or software are required. The basic syntax of XML consists of nested elements, some of which have attributes and content. An element usually consists of two tags: a start tag such as <tag> and an end tag such as </tag>. XML is case sensitive and does not ignore white space. It looks a lot like HTML, which is familiar to many people, but unlike HTML, it lets you name your tags to best describe your data. XML's advantages include a self-documenting format that both humans and machines can read, support for Unicode, which enables internationalization, and stringent syntax and parsing requirements. Unfortunately, UTF-8 handling is problematic in PHP5; this shortcoming is one of the forces driving the development of PHP6.

Weaknesses of XML

XML is verbose and redundant, so documents are large to store and consume a lot of bandwidth. People are supposed to be able to read it, but it's hard to imagine a human trying to read an XML file with 7 million nodes. The most basic parser functionality doesn't support a wide array of data types, so irregular or unusual data, which is common, is a primary source of difficulty.

Well-Formed XML

An XML document is well-formed if it follows all of XML's syntax rules. If a document is not well-formed, it is not XML, in a technical sense. An HTML tag such as <br> is unacceptable in XML; to be well-formed XML, the tag should be written <br/>. A parser won't parse XML properly if it is not well-formed. Additionally, an XML document must have one and only one root element. Think of the one root element as an endless file cabinet: you have only one cabinet, but there are few limits on what and how much you can fit into it, with endless drawers and folders into which you can stuff information.
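A practical way to test well-formedness in PHP is simply to try parsing the document. The sketch below, built around a hypothetical checkWellFormed() helper, turns on libxml's internal error handling so that parse errors are collected rather than printed, then checks an unclosed HTML-style <br>, its <br/> form, and a document with two root elements.

```php
<?php
// Sketch: check well-formedness by attempting a parse. With internal error
// handling on, libxml2 records problems instead of emitting warnings.
// checkWellFormed() is a hypothetical helper.
function checkWellFormed($xml)
{
    libxml_use_internal_errors(true);
    libxml_clear_errors();
    $ok = simplexml_load_string($xml) !== false;
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    return [$ok, $messages];
}

$samples = [
    '<p>line one<br>line two</p>',   // HTML-style <br> never closed
    '<p>line one<br/>line two</p>',  // empty-element tag: well-formed
    '<a/><b/>',                      // two root elements
];

foreach ($samples as $sample) {
    list($ok, $messages) = checkWellFormed($sample);
    echo $sample, ' => ', $ok ? 'well-formed' : 'NOT well-formed', "\n";
    foreach ($messages as $message) {
        echo '   ', $message, "\n";
    }
}
```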