User Contributed Notes 6 notes

libxml2 contains a much more useful method, readString(), which reads and returns the whole text content of an element. You can call it after receiving a start tag (XMLReader::ELEMENT). You can use this PHP code to emulate the method until PHP calls the underlying libxml2 implementation directly.
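The code originally attached to this note is not shown here, so what follows is only a minimal sketch of such an emulation, not the author's version. The helper name readElementText() is hypothetical. It assumes the reader is positioned on a start tag and concatenates every text, CDATA, and whitespace node until the matching end tag. Recent PHP versions also expose XMLReader::readString() directly, so it is worth checking for that first.

```php
<?php
// Hypothetical helper (not part of XMLReader): return the concatenated text
// content of the element the reader is currently positioned on.
function readElementText(XMLReader $reader): string
{
    // Only meaningful on a start tag; an empty element like <a/> has no text.
    if ($reader->nodeType !== XMLReader::ELEMENT || $reader->isEmptyElement) {
        return '';
    }

    $depth = $reader->depth; // the matching end tag reports the same depth
    $text = '';
    $textTypes = [
        XMLReader::TEXT,
        XMLReader::CDATA,
        XMLReader::WHITESPACE,
        XMLReader::SIGNIFICANT_WHITESPACE,
    ];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::END_ELEMENT && $reader->depth === $depth) {
            break; // reached the closing tag of the starting element
        }
        if (in_array($reader->nodeType, $textTypes, true)) {
            $text .= $reader->value;
        }
    }

    return $text;
}

// Usage: print the text of every <title> element, including nested markup.
$reader = new XMLReader();
$reader->XML('<root><title>Hello <b>big</b> world</title></root>');
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        echo readElementText($reader), "\n";
    }
}
```

Note that the helper leaves the reader on the element's end tag, so the next read() continues with whatever follows the element.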

This kind of worked in that I ended up with an array of all the data I wanted, but the array I constructed was twice as large as I expected and every other entry was empty. Took me a while to debug, but finally figured out that checking <?php $xml->name === 'row' ?> matches both <row> and </row>, so the check should really be something more like:
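The snippet that followed this note is missing here, so this is a small sketch of the kind of check described, not the original code. It tests the node type as well as the name, so that only the opening <row> is matched. The sample document and the $rows array are made up for illustration.

```php
<?php
// Sample input invented for this sketch.
$xml = new XMLReader();
$xml->XML('<table><row>a</row><row>b</row></table>');

$rows = [];
while ($xml->read()) {
    // Checking the name alone would also match the closing </row>
    // (XMLReader::END_ELEMENT), which is where the empty entries came from.
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'row') {
        $rows[] = $xml->readInnerXml();
    }
}

var_dump(count($rows)); // one entry per <row>, no empty extras
```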

> I would have liked to use the next() function instead, but as I needed to parse 2 different subtrees, I couldn't figure out how to find all the columns, reset the pointer, and then find all the rows.

If, like me, you have been turning the interwebz upside down looking for a solution to this issue:

PHP Warning: XMLReader::read(): /tmp/xml_feed.xml:4183934: parser error : Input is not proper UTF-8, indicate encoding !

For some reason, this warning breaks the execution - is it a fatal error in disguise?

After days of frustration I found it!

tidy -xml -o output.xml -utf8 -f error.log input.xml

You can invoke tidy using exec(). It takes several seconds to convert a 250 MB feed, but it is worth the time.

In my case the issue was with latin1 charset, and for some reason I had to pass the xml through tidy 2 times - first time around creates new errors, second time it fixes everything.
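Calling tidy from PHP could look roughly like the sketch below. The helper names buildTidyCommand() and runTidy() are hypothetical, and the file names are placeholders. It assumes the tidy binary is on the PATH. As far as I know, tidy exits with 0 on success, 1 when there were warnings, and 2 when there were errors, but check that against your tidy version.

```php
<?php
// Hypothetical helper: build the tidy command line with escaped file names.
function buildTidyCommand(string $input, string $output, string $errorLog): string
{
    return sprintf(
        'tidy -xml -o %s -utf8 -f %s %s',
        escapeshellarg($output),
        escapeshellarg($errorLog),
        escapeshellarg($input)
    );
}

// Hypothetical helper: run tidy once and return its exit code.
function runTidy(string $input, string $output, string $errorLog): int
{
    exec(buildTidyCommand($input, $output, $errorLog), $lines, $exitCode);
    return $exitCode;
}

// Show the command that would be executed for placeholder file names.
echo buildTidyCommand('input.xml', 'output.xml', 'error.log'), "\n";
```

For the two-pass case, the output of the first run can be fed in as the input of the second, for example runTidy('input.xml', 'pass1.xml', 'error1.log') followed by runTidy('pass1.xml', 'output.xml', 'error2.log').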

I know invalid xml should be fixed by xml creators, but it works differently in the real world.