For a couple of years the code below has been working OK parsing an external XML feed and inserting data into a database. The size of the file has grown considerably since the date on which the script was written and I find an increasing amount of errors (PHP Warning: SimpleXMLElement::__construct() warning : failed to load external entity) logged on the server.

I would like to ask for advice and assistance in updating what I have to hopefully improve its performance. The server that supplies the feed does seem to take a while to return any content, presumably it has to receive a request for the specific content for the user then render the XML and return it. In a browser it can take up to 10 seconds to do this even if the request is set to return just one set of results. I am guessing that this is the reason for the errors, the whole thing must time out before the feed is loaded into memory.

PHP's [XMLReader extension is probably what you want. It lets you parse XML without having to load the whole file into memory. You might also find this [URL="http://www.sitepoint.com/forums/showthread.php?720925-Parsing-a-large-(1-GB)-nested-XML-file"]sitepoint thread](http://php.net/manual/en/book.xmlreader.php) helpful reading.

ColinHughes
—
2013-07-05T22:40:35Z —
#3

Thanks for the reply.

I did actually read the thread you suggested, interestingly it was Anthony Sterling who helped me put together the existing code.

Can I combine XMLReader and simpleXML I wonder? The latter seems so much easier to work with.

fretburner
—
2013-07-05T22:46:44Z —
#4

ColinHughes said:

Can I combine XMLReader and simpleXML I wonder? The latter seems so much easier to work with.

That seems to be what they're doing in the last code examples in the thread - using XMLReader to navigate the whole file, and SimpleXML to load in small sections to process. What's the structure of the XML file you're working with? Is it deeply nested, or the size is just down to the sheer number of records?

ColinHughes
—
2013-07-05T23:00:36Z —
#5

If I had scrolled down to the next example I would have noticed that I will take a look at it and see if it can be of use. Thanks.

ColinHughes
—
2013-07-08T16:07:03Z —
#6

OK so I now have the following which is working. Appreciate a second or even third pair of eyes to check it over though

Plus there is a delete statement added onto the end of all that. I hope you can appreciate why I wanted to speed this whole thing up now! I really need some guidance to turn this whole thing into an efficient script.

TIA

Colin

fretburner
—
2013-07-08T17:39:23Z —
#7

Hi Colin,

I think I'd be tempted to move the DB queries to separate functions to keep the main loop tidy and easy to read:

Thanks Fretburner I will take a look at this and report back. Hang in there please, I appreciate your help.

Colin

ColinHughes
—
2013-07-09T14:46:49Z —
#9

Sorry but this is not working. The loop through the property nodes does not work.

fretburner
—
2013-07-09T14:48:58Z —
#10

ColinHughes said:

Sorry but this is not working. The loop through the property nodes does not work.

Hi Colin, do you have any more details? What exactly is the error?

ColinHughes
—
2013-07-09T15:07:52Z —
#11

If I echo out the sql it just returns 1 property instead of multiples.

fretburner
—
2013-07-09T15:19:16Z —
#12

ColinHughes said:

If I echo out the sql it just returns 1 property instead of multiples.

Sure, because you're looping through the XML file and pulling out one property to process at a time. Are you saying that you want to batch up the database queries to execute them together?

ColinHughes
—
2013-07-09T15:32:46Z —
#13

OK I have an error with the images.

There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(' ', 'http://someimages.com/image' at line 1]

The property id (just shows as ' ') is not being inserted into the query, so I think the error is because the on duplicate update fails.

query: (' ', 'http://someimages.com/image.asp?Id=X1000227', '0', '4491-2') Not sure why the propertyid is not being inserted because it forms the first part of the PID '4491-2' and that is clearly working?

fretburner
—
2013-07-09T15:36:25Z —
#14

Sorry, my fault.. it's down to a typo in the code I posted. In the add_images function

I spent ages looking through this and even when you showed me it I still had to look hard to see the missing underscore LOL. I still get the error though?

There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '('4491', 'http://someimages.com/image' at line 1]

fretburner
—
2013-07-09T16:14:40Z —
#16

Is it possible to post the whole SQL statement that is causing the problem?

Thanks to fretburner for the guidance. I will test this later to see if it performs faster than the old simplexml only parser. If you see a mistake of feel that you could improve on this then please chip in. I need this to run as fast and as efficiently as possible.

Thanks

ColinHughes
—
2013-07-09T20:29:38Z —
#19

Almost there but I have a problem with deleting sold properties. Code below:

The main problem is that you're looping over all the properties that need deleting, but you're only actually executing the last query as your $db->query() call is outside of the loop.

Also, I'm not sure if your first query will work correctly, as you're calling the mysqli_query but passing in $db which is a mysqli object. It would probably be better to be consistent and stick with the OO interface:

It's a little more complicated to handle query errors, as you're executing more than one query. You could always return an array of any errors from the function so you can log them or output them to the screen from your main script;