User Contributed Notes 20 notes

The "XML2Assoc" functions noted here should be used with caution... basically they are duplicating the functionality already present in SimpleXML. They may work but they won't scale.

Their are two main uses cases for parsing XML, each suited to either XMLReader or SimpleXML.

1. SimpleXML is an excellent tool for easy access to an XML document tree using native PHP data types. It starts to flounder with massive (> 50M or so) XML documents, as it reads the entire document into memory before it can be processed. SimpleXML will just laugh at you then die when your server runs out of memory (or it will cause a load spike).

2. Aside from the reasoning behind massive XML documents, if you have to deal with massive XML documents, use XMLReader to process them. Don't try and gather an entire XML document into a PHP data structure using XMLReader and a PHP xml2assoc() function, you are reinventing the SimpleXML wheel.When parsing massive XML documents using XMLReader, gather the data you need to perform an operation then perform it before skipping to the next node. Do not build massive data structures from a massive XML document, your server (and it's admins) will not like you.

Take care about how to use XMLReader::$isElementEmpty. I don't know if it is a bug or not, but $isElementEmpty is set for the current context and NOT just for the element. If you move your cursor to an attribute, $isElementEmpty will ALWAYS be false.

Improved algorithm based on Sergey Aikinkulov's. The problem was that it would overwrite nodes if they had the same tag name. Because of that <a><b/><b/><a> would be read as if <a><b/><a/>. This algorithm handles it better and outputs an easy to understand array:

As japos mentioned. Take care how you use isEmptyElement. After you are done looping through the attributes: isEmptyElement will be false. You can use moveToElement() to move the cursor back to the element and then you can use isEmptyElement like normal again.

Sometimes you have an unusual URL that doesn't actually point to an xml file but still returns xml as output (Like the Battlefield Heroes generated syndication urls). Using get_file_contents(url) you can retrieve the xml data from these urls and pass it as a variable for processing as an XML String.

Unfortunately simpleXML or xml DOM cannot process all xml strings. Some have error boxes added to the end of them (such as Battlefield Heroes syndicated news). These boxes cause an end of file sort of error and closes out the script. XMLReader grabs data from these strings without error.

For those of you getting xml files that do not contain duplicate elements (in the same element), the following converter converts to arrays with key/value mapping (thus overwriting duplicate elements!):