I wrote a couple of functions to:
- create a DOMDocument from a file
- parse the namespaces in it
- create an XPath object with all the namespaces registered
- load the schemaLocations
- validate the file against the main schema (the one without a prefix)
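The namespace registration step could be sketched roughly as follows. This is not the original code: `createXPathWithNamespaces` and the `default` placeholder prefix are hypothetical names, and the sample document is made up. Validation is left out here because it needs a schema file; `DOMDocument::schemaValidate()` would be the call to look at for that.

```php
<?php
// Hypothetical helper (not the original code): build a DOMXPath with every
// namespace declared in the document registered under its own prefix.
function createXPathWithNamespaces(DOMDocument $doc, string $defaultPrefix = 'default'): DOMXPath
{
    $xpath = new DOMXPath($doc);
    // The namespace axis yields one namespace node per in-scope declaration.
    foreach ($xpath->query('//namespace::*') as $ns) {
        $prefix = (string) $ns->prefix;
        if ($prefix === 'xml') {
            continue; // the built-in xml prefix needs no registration
        }
        if ($prefix === '') {
            // XPath 1.0 has no default namespace, so give it a placeholder prefix.
            $prefix = $defaultPrefix;
        }
        $xpath->registerNamespace($prefix, $ns->namespaceURI);
    }
    return $xpath;
}

$doc = new DOMDocument();
$doc->loadXML('<root xmlns="urn:example:main" xmlns:x="urn:example:extra"><x:item/><child/></root>');
$xpath = createXPathWithNamespaces($doc);
$items = $xpath->query('/default:root/x:item');
$children = $xpath->query('/default:root/default:child');
```

If two different URIs share one prefix in different parts of the document, the last one registered wins in this sketch.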

I found the xml2array function below very useful, but there seems to be a bug in it: the $item variable was never getting set. I've expanded the function to be a bit more readable, and the corrected code is:

Most email clients ignore stylesheets in HTML-formatted emails. The best way to ensure your HTML is rendered correctly by a broad spectrum of email clients, including webmail implementations such as Gmail, is to use inline style attributes. The following function uses DOM to parse an inline stylesheet. It replaces element class and id attributes with inline style attributes, and adds inline style attributes for generic tag stylesheet rules. It removes the stylesheet and any used class and id attributes, as these are useless for most email clients. It is a fairly lightweight function and does not support CSS inheritance, but will work for simple stylesheets e.g.:
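A minimal sketch of the approach described above, not the original function. The name `inlineSimpleStyles` is hypothetical, only plain `.class`, `#id` and `tag` selectors are handled, and unlike the original it leaves the class and id attributes in place.

```php
<?php
// Hypothetical helper: copy simple stylesheet rules into inline style attributes.
function inlineSimpleStyles(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);
    // Copy the node list first, because style nodes are removed while looping.
    foreach (iterator_to_array($doc->getElementsByTagName('style')) as $styleNode) {
        preg_match_all('/([^{}]+)\{([^{}]*)\}/', $styleNode->textContent, $rules, PREG_SET_ORDER);
        foreach ($rules as $rule) {
            $selector = trim($rule[1]);
            $declarations = trim($rule[2]);
            if (!preg_match('/^([.#]?)([A-Za-z][A-Za-z0-9_-]*)$/', $selector, $m)) {
                continue; // anything more complex is skipped in this sketch
            }
            if ($m[1] === '.') {
                $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' {$m[2]} ')]";
            } elseif ($m[1] === '#') {
                $query = "//*[@id='{$m[2]}']";
            } else {
                $query = "//{$m[2]}";
            }
            foreach ($xpath->query($query) as $element) {
                $existing = trim($element->getAttribute('style'));
                $combined = $existing === ''
                    ? $declarations
                    : rtrim($existing, ';') . '; ' . $declarations;
                $element->setAttribute('style', $combined);
            }
        }
        $styleNode->parentNode->removeChild($styleNode);
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><head><style>.note { color: red; } p { margin: 0; }</style></head>'
    . '<body><p class="note">Hi</p></body></html>');
inlineSimpleStyles($doc);
$paragraph = $doc->getElementsByTagName('p')->item(0);
$style = $paragraph->getAttribute('style');
```

Rules are applied in stylesheet order, so this sketch ignores CSS specificity as well as inheritance.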

The project I'm currently working on uses XPaths to dynamically navigate through chunks of an XML file. I couldn't find any PHP code on the net that would build the XPath to a node for me, so I wrote my own function. Turns out it wasn't as hard as I thought it might be (yay recursion), though it does entail using some PHP shenanigans...

        // Recursively get the XPath for the parent.
        return( getNodeXPath( $parentNode ) . "/{$node->nodeName}[{$$nodeName}]" );
    } else {
        // Hit the root node! Note that the slash is added when
        // building the XPath, so we return just an empty string.
        return( "" );
    }
}
?>
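Since only the tail of the function survives above, here is a self-contained sketch of the same recursive idea. It is not the original code: `buildNodeXPath` is a hypothetical name, it counts preceding siblings instead of using variable variables, and it assumes a document without namespace prefixes.

```php
<?php
// Hypothetical helper: build an absolute XPath for an element node.
function buildNodeXPath(DOMNode $node): string
{
    // The document node contributes nothing; each step adds its own slash.
    if ($node instanceof DOMDocument || $node->parentNode === null) {
        return '';
    }
    // XPath positions are 1-based and count same-named element siblings.
    $position = 1;
    for ($sibling = $node->previousSibling; $sibling !== null; $sibling = $sibling->previousSibling) {
        if ($sibling->nodeType === XML_ELEMENT_NODE && $sibling->nodeName === $node->nodeName) {
            $position++;
        }
    }
    return buildNodeXPath($node->parentNode) . "/{$node->nodeName}[{$position}]";
}

$doc = new DOMDocument();
$doc->loadXML('<root><item/><item><name/></item></root>');
$target = $doc->getElementsByTagName('name')->item(0);
$path = buildNodeXPath($target); // "/root[1]/item[2]/name[1]"

// Round trip: the generated path selects the node it was built from.
$xpath = new DOMXPath($doc);
$found = $xpath->query($path)->item(0);
```

PHP also ships `DOMNode::getNodePath()`, which may be enough when a hand-built path is not required.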

Being an experienced ASP developer, I was wondering how to replace the textual content of a node (with MSXML this is simply achieved by setting the 'text' property of a node). Out of frustration I started to play around with SimpleXML, but I could not get it to work in combination with XPath.
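One way to do this with plain DOM and XPath, as a rough sketch: remove the node's children and append a fresh text node. The sample document and the replacement text are made up for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<book><title>Old title</title></book>');

$xpath = new DOMXPath($doc);
$title = $xpath->query('/book/title')->item(0);

// Drop whatever the element currently contains.
while ($title->firstChild !== null) {
    $title->removeChild($title->firstChild);
}
// A text node is escaped on output, so "&" is safe to pass in as-is.
$title->appendChild($doc->createTextNode('Tom & Jerry'));

$out = $doc->saveXML($doc->documentElement);
```

Going through `createTextNode()` avoids having to escape special characters by hand.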

Note that these DOM functions expect (and presumably return) all their data in UTF-8 character encoding, regardless of what PHP's current encoding is. This means that text nodes, attribute values, etc. should be in UTF-8.

This applies even if you're generating an XML document which is not ultimately in UTF-8.
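A small sketch of what that means in practice, assuming the source data is ISO-8859-1 and the iconv extension is available: convert to UTF-8 before handing the string to DOM, and let the document's declared encoding take care of the output.

```php
<?php
$latin1 = "caf\xE9"; // "café" as ISO-8859-1 bytes (assumed input encoding)
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

// The document is declared as ISO-8859-1, but DOM still wants UTF-8 input.
$doc = new DOMDocument('1.0', 'ISO-8859-1');
$doc->appendChild($doc->createElement('word', $utf8));

$readBack = $doc->documentElement->textContent; // UTF-8 again
$xml = $doc->saveXML();                         // serialized in ISO-8859-1
```

Only the serialized output follows the declared encoding; everything read back through the DOM API stays UTF-8.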

This module is also not installed by default from the CentOS 4 "centosplus" repository. For those using PHP5 on CentOS 4, a simple "yum --enablerepo=centosplus install php-xml" will do the trick (this will install both the XML and DOM modules).

Some key examples:
* concise summary of the class hierarchy (1.1.1)
* clarification that DOM level 2 doesn't allow for population of internal DTDs
* explanation of DOMNode->normalize()
* explanation of the DOMImplementation class

will result in PHP/DOM downloading the DTD file from the W3C site when parsing your document. This adds extra delay to your script: in my case, the total time of $dom->load() ranged from 1 to 16 seconds.

I wrote a framework to implement the StyleSheet interfaces as specified on the W3C website. The code is written in PHP, and is NOT a complete implementation. Use it how ya like. I was planning on adding the CSSStyleSheet interfaces as well. Feel free to ask.

This is one of several functions I wrote some time ago to parse a Google results page. Google has probably changed their site since then, but I thought this might be helpful to someone.

I'm moving servers, but I will probably throw this up on my blog when I get it back up.

<?php

function googleResult($listItem) {
    // Given a LIST ITEM element, this will validate it and return an array
    // for that LI entry as an inline result from Google.
    /*
     * <li class='g w0'>
     *   <h3 class='r'>
     *     <a href='the URL' class='l'>
     *       Description <em>description</em>
     *     </a>
     *   </h3>
     * </li>
     * UPDATE: This function will now look for any subcontainer that has an href;
     * it doesn't have to be an H3. This will make it work with a few more
     * formatted search results.
     */
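The body of the function is missing above. As a rough illustration of the idea in its comments (take an LI element and return the first link found anywhere inside it), here is a sketch. `extractLinkFromListItem`, the returned array keys and the sample markup are all hypothetical.

```php
<?php
// Hypothetical stand-in: return the first link inside an LI element, or null.
function extractLinkFromListItem(DOMElement $listItem): ?array
{
    if (strtolower($listItem->nodeName) !== 'li') {
        return null; // not a list item, so not a result entry
    }
    // Any descendant anchor with an href counts; it need not sit in an H3.
    foreach ($listItem->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            return [
                'url'  => $anchor->getAttribute('href'),
                'text' => trim($anchor->textContent),
            ];
        }
    }
    return null;
}

$html = "<ul><li class='g w0'><h3 class='r'>"
      . "<a href='http://example.com/' class='l'>Example <em>site</em></a>"
      . "</h3></li></ul>";
$doc = new DOMDocument();
$doc->loadHTML($html);
$result = extractLinkFromListItem($doc->getElementsByTagName('li')->item(0));
```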

I had the hardest time updating a complex XML document. Here's a quick example of how to do it.

<?php

// Load the XML from a file.
$xml = "a2062.xml"; // This is an XFDL form previously unencoded and ungzipped.
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->load($xml);

// Create an XPath query.
// Note: you must define the namespace if the XML document has defined namespaces.
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('xfdl', "http://www.PureEdge.com/XFDL/6.5");
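To see why the namespace registration matters, here is a self-contained sketch using an inline document instead of a2062.xml. The element and attribute names are illustrative only and are not taken from the real form.

```php
<?php
// Illustrative stand-in for the form; the structure is an assumption.
$xml = '<XFDL xmlns="http://www.PureEdge.com/XFDL/6.5">'
     . '<page sid="PAGE1"><field sid="NAME"><value>old</value></field></page>'
     . '</XFDL>';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('xfdl', 'http://www.PureEdge.com/XFDL/6.5');

// Without a prefix the query finds nothing, because the elements are namespaced.
$unprefixed = $xpath->query('//field[@sid="NAME"]/value');

// With the registered prefix the node is found and can be updated in place.
$values = $xpath->query('//xfdl:field[@sid="NAME"]/xfdl:value');
$values->item(0)->nodeValue = 'new';

$updated = $dom->saveXML();
```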

if ($testNode->nodeName == $node->nodeName and $testNode->parentNode->isSameNode($node->parentNode) and $testNode->childNodes->length > 0) {
    //echo "{$testNode->parentNode->nodeName}-{$testNode->nodeName}-{}<br/>";
    $nodeTagIndex++;
}

// append discards the DOMDocumentFragment and just adds its child nodes, but ownerDocument is maintained.
echo get_class($el)."<br/>";
echo get_class($doc->documentElement)."<br/>";
echo "<xmp>".$doc->saveXML()."</xmp>";
?>
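The fragment behaviour mentioned in the comment can be checked with a short sketch. The element names here are made up.

```php
<?php
$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('root'));

$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<a/><b/>');

// Appending a fragment moves its children into the target node;
// the fragment itself is left empty and never shows up in the tree.
$root->appendChild($fragment);

$childCount = $root->childNodes->length;
$fragmentIsEmpty = !$fragment->hasChildNodes();
$sameDocument = $root->firstChild->ownerDocument->isSameNode($doc);
```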

In response to... "If you create your own custom element extending DOMElement and append him in place of the document element, you cannot access to any new members newly defined in your custom class via DOMDocument::$documentElement."

... it is not a bug, it is a feature. The DOMDocument::$documentElement property name may be misleading but according to the DOM Level 2 Core specification it is a convenience attribute meant to access the root element of your DOMDocument. See here: http://www.w3.org/TR/DOM-Level-2-Core/core.html#i-Document

When trying to extend the DOMDocument and DOMElement classes, I found a very annoying bug concerning DOMDocument::$documentElement.

If you create your own custom element extending DOMElement and append it in place of the document element, you cannot access any members newly defined in your custom class via DOMDocument::$documentElement.

In my situation, I cannot use DOMDocument::registerNodeClass() because the document element is not necessarily the base class for all the elements in my document.

After searching around, I found a pretty odd way to fix this problem. It seems that you have to store a reference to your appended document element in a user-defined (and persistent) variable (in other words, not only in DOMDocument::$documentElement). See below:
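A minimal sketch of that workaround, assuming a hypothetical `MyElement` class with one custom method. The key point is that `$element` stays in scope for as long as the document is used.

```php
<?php
// Hypothetical custom element with one extra member.
class MyElement extends DOMElement
{
    public function describe(): string
    {
        return 'custom element <' . $this->nodeName . '>';
    }
}

$doc = new DOMDocument();

// Keep this variable alive: it is the persistent reference the note refers to.
$element = new MyElement('root');
$doc->appendChild($element);

// While $element exists, documentElement hands back the same MyElement object,
// so the custom method is reachable through it.
$isCustom = $doc->documentElement instanceof MyElement;
$description = $doc->documentElement->describe();
```

If the only reference lives in a local variable of a function that has already returned, `$doc->documentElement` may come back as a plain DOMElement instead.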

The following can take an XML_TEXT_NODE node and return the contents in an array. Yanick's contribution rocks, but it overwrote duplicates, keeping only the last one in the returned array. All the other functions I tested from various sources failed to handle text nodes correctly. Hope this helps someone. It is adapted from code on this site.
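The function itself is not shown here, so the following is only a sketch of the two points made above: text nodes are returned as strings, and repeated element names are collected into a list instead of overwriting each other. `nodeToArray` is a hypothetical name, and attributes and mixed content are ignored.

```php
<?php
// Hypothetical helper: convert a DOM node into nested arrays and strings.
function nodeToArray(DOMNode $node)
{
    if ($node->nodeType === XML_TEXT_NODE || $node->nodeType === XML_CDATA_SECTION_NODE) {
        return $node->nodeValue;
    }
    $result = [];
    $text = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE || $child->nodeType === XML_CDATA_SECTION_NODE) {
            $text .= $child->nodeValue;
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            // Always append, so same-named siblings never overwrite each other.
            $result[$child->nodeName][] = nodeToArray($child);
        }
    }
    // Leaf elements collapse to their trimmed text.
    return $result === [] ? trim($text) : $result;
}

$doc = new DOMDocument();
$doc->loadXML('<list><item>a</item><item>b</item><note>n</note></list>');
$array = nodeToArray($doc->documentElement);
```

Every element name maps to a list, even when it occurs only once, which keeps the shape of the result predictable.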

Also, the function performs no error checking on your array and will throw a DOMException if a key you used in your array contains characters that are invalid in a DOM tag name. This function is untested for "deep" multidimensional arrays.
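The behaviour described above can be reproduced with a small sketch. `arrayToDom` is a hypothetical stand-in for the function being discussed, which is not shown here.

```php
<?php
// Hypothetical stand-in: turn array keys into elements and values into text.
function arrayToDom(array $data, DOMDocument $doc, DOMNode $parent): void
{
    foreach ($data as $key => $value) {
        // createElement() throws a DOMException if the key is not a valid name.
        $child = $parent->appendChild($doc->createElement((string) $key));
        if (is_array($value)) {
            arrayToDom($value, $doc, $child);
        } else {
            $child->appendChild($doc->createTextNode((string) $value));
        }
    }
}

$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('config'));
arrayToDom(['db' => ['host' => 'localhost', 'port' => 3306]], $doc, $root);
$xml = $doc->saveXML($root);

// A key containing a space is not a valid tag name.
$threw = false;
try {
    arrayToDom(['bad key' => 1], $doc, $root);
} catch (DOMException $e) {
    $threw = true;
}
```

Numeric keys fail the same way, since a tag name cannot start with a digit.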

I developed a group of functions that make it very easy to extract any information you want from any page you load from the internet. They are all based on the DOMDocument object. You can read the entire documentation and download the source code here: http://www.tintetoner-shop.de/DomUtilities/