2. Divide file into small chunks

This part is a pain in the ass, and it requires a working knowledge of XML (or HTML) markup. Open your favorite text editor. I prefer Coda or Sublime Text, but just about any will do. DO NOT USE MS WORD or any other “word processing” application; those can silently swap in curly quotes and other formatting that breaks the markup. It must be a plain-text editor, like Notepad.

The first 20-ish lines are all instructions. They’re not important. However, the <?xml version="1.0" encoding="UTF-8" ?> declaration on the top line is very important. The content starts where you see the <rss version="2.0"... and <channel> tags.
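If you’d rather not hunt for those lines by eye, here’s a minimal Python sketch (my own helper, not anything that ships with WordPress) that reports the line number of each landmark in an export file. The function name find_landmarks is hypothetical, and it assumes each of these tags starts its own line, which is how the exports I’ve seen are laid out.

```python
import sys


def find_landmarks(lines):
    """Return the 1-based line numbers of the structural landmarks in a WXR export."""
    landmarks = {"xml_declaration": None, "rss": None, "channel": None, "first_item": None}
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if landmarks["xml_declaration"] is None and stripped.startswith("<?xml"):
            landmarks["xml_declaration"] = number
        elif landmarks["rss"] is None and stripped.startswith("<rss"):
            landmarks["rss"] = number
        elif landmarks["channel"] is None and stripped.startswith("<channel>"):
            landmarks["channel"] = number
        elif landmarks["first_item"] is None and stripped.startswith("<item>"):
            # The actual content starts here, so stop scanning.
            landmarks["first_item"] = number
            break
    return landmarks


# Only runs when called with a filename, e.g. python find_landmarks.py export.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as export:
        for name, line_number in find_landmarks(export).items():
            print(f"{name}: line {line_number}")
```

If xml_declaration comes back as anything other than 1, something has been pasted above the declaration and the file will not parse.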

Now, here comes the hard part.

3. Make sure chunks are properly formatted

Back to that working knowledge of markup thing. Every chunk has to start and end with the proper tags. Make sure your chunks follow this pattern:

<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0"...

<channel>

<wp:wxr_version>1.2</wp:wxr_version>

Your 5,000-ish lines of content, starting with <item> and ending with </item>

</channel>

</rss>

You can have many, many <item>...</item> tags within your 5,000 lines, and in fact you definitely should. But your content MUST be wrapped within <item>...</item> tags (plus the others described above) or the import will break, so never end a chunk in the middle of an item.

You will have to copy and paste these tags into the top and bottom of each file (or start with one file like this and paste your content into the middle). It’s a pain and it must be done properly. If you have a problem, it’s probably because this step was done wrong.
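If you’re comfortable running a Python script, the copy-and-paste part can be scripted. This is a minimal sketch under a few assumptions of mine, not an official tool: everything before the first <item> is reused as each chunk’s header (which also carries along any other channel-level definitions in your export), everything after the last </item> is reused as the footer, and no item’s content contains the literal text </item>. It counts whole items instead of lines, so a chunk can never cut an item in half. The 100-items-per-chunk default and the -partNN.xml output names are arbitrary; tune the count until each file lands around the size I recommend below.

```python
import re
import sys
from pathlib import Path

# Non-greedy match of one <item>...</item> element, spanning newlines.
# Assumption: no item's content contains the literal text "</item>".
ITEM_PATTERN = re.compile(r"<item>.*?</item>", re.DOTALL)


def split_wxr(text, items_per_chunk=100):
    """Split WXR export text into standalone chunk documents.

    Each chunk gets everything before the first <item> as its header
    (XML declaration, <rss ...>, <channel>, <wp:wxr_version>, ...) and
    everything after the last </item> as its footer (</channel></rss>).
    """
    items = ITEM_PATTERN.findall(text)
    if not items:
        raise ValueError("no <item> elements found")
    header = text[: text.index("<item>")]
    footer = text[text.rindex("</item>") + len("</item>"):]
    chunks = []
    for start in range(0, len(items), items_per_chunk):
        body = "\n".join(items[start:start + items_per_chunk])
        chunks.append(header + body + footer)
    return chunks


# Hypothetical usage: python split_wxr.py export.xml
if __name__ == "__main__" and len(sys.argv) > 1:
    source = Path(sys.argv[1])
    chunks = split_wxr(source.read_text(encoding="utf-8"))
    for number, chunk in enumerate(chunks, start=1):
        out = source.with_name(f"{source.stem}-part{number:02d}.xml")
        out.write_text(chunk, encoding="utf-8")
        print(f"wrote {out}")
```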

Here’s an example XML document I’ve edited to match the pattern above. Download and inspect it in your favorite text editor. Make sure yours looks similar in format.
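Eyeballing works, but a parser is stricter than your eyes. Here’s a minimal Python sketch (check_chunk is my own hypothetical helper) that uses the standard library’s ElementTree to flag the mistakes I see most: a file that isn’t well-formed XML (a missing or mismatched closing tag), a missing <rss> or <channel> wrapper, no <item> elements at all, or a missing declaration. It doesn’t validate against WordPress’ importer, so passing here means “structurally sane,” not “guaranteed to import.” One extra catch: the ... in <rss version="2.0"... hides namespace declarations like xmlns:wp, and if those get lost, the parser will refuse the wp: tags.

```python
import xml.etree.ElementTree as ET


def check_chunk(text):
    """Return a list of problems found in one chunk (an empty list means it looks OK)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as error:
        # Covers mismatched/missing closing tags and undeclared prefixes like wp:.
        return [f"not well-formed XML: {error}"]
    problems = []
    if root.tag != "rss":
        problems.append(f"root element is <{root.tag}>, expected <rss>")
    channels = root.findall("channel")
    if len(channels) != 1:
        problems.append(f"expected exactly one <channel>, found {len(channels)}")
    elif not channels[0].findall("item"):
        problems.append("no <item> elements inside <channel>")
    if not text.startswith("<?xml"):
        problems.append("missing <?xml ...?> declaration at the very top")
    return problems
```

Run it over every chunk before you upload anything; any non-empty result means that file needs another look.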

Drawbacks of this method

My solution isn’t perfect. It has two drawbacks, one bigger than the other: it’s tedious and it might break your data’s relationships.

It’s slow and tedious

It’s a total pain to go through a 100,000-line file and break it into 5,000-line chunks. And, as far as I know, it can’t be automated (at least not by the average WordPress user). But if you’ve got an hour or two and the patience of a saint, the tedium isn’t the end of the world.

It might break your metadata/relationships

This one could be a deal-breaker for some. Here’s the problem. Say you have two pages on your site: Page ID #10 and Page ID #20. For whatever reason, you created 20 after 10, but 20 is the parent of 10. Read that carefully: 20, the higher ID and newer page, is actually the parent of 10, the older page. (I know that sounds weird. It doesn’t matter why. Maybe you added a category-level landing page [20] to list all sub-pages [10, 12, 14, etc.]. Whatever, just go with me on this.)

Well, if you import a chunk that defines page 10 but cuts off before defining page 20, you’re telling WordPress to create page 10 with page 20 as its parent, but you haven’t imported 20 yet. That relationship is invalid, and WordPress just skips right over it. The importer SHOULD throw an error or at least warn you of the problem, but the situation is rare and most people won’t care, so the developers probably just didn’t bother.
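You can at least spot these cases before importing. Here’s a minimal Python sketch, with hypothetical helper names, that walks your chunks in import order and lists every item whose parent hasn’t been defined in the same chunk or an earlier one. It assumes each item carries <wp:post_id> and <wp:post_parent> elements in the http://wordpress.org/export/1.2/ namespace, which is what I’d expect in a WXR 1.2 export; check your own file before trusting it. A parent of 0 means “no parent” and is ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: the WXR 1.2 namespace URI declared as xmlns:wp on <rss>.
WP = "{http://wordpress.org/export/1.2/}"


def post_parents(chunk_text):
    """Yield (post_id, parent_id) for every <item> in one chunk."""
    channel = ET.fromstring(chunk_text).find("channel")
    for item in channel.findall("item"):
        post_id = int(item.findtext(f"{WP}post_id") or 0)
        parent_id = int(item.findtext(f"{WP}post_parent") or 0)
        yield post_id, parent_id


def forward_references(chunks):
    """Return (child_id, parent_id, chunk_number) for each child whose parent
    is not defined in its own chunk or an earlier one (chunks numbered from 1)."""
    defined = set()
    problems = []
    for number, text in enumerate(chunks, start=1):
        pairs = list(post_parents(text))
        # Parents defined anywhere in this chunk count as already available.
        defined.update(post_id for post_id, _ in pairs)
        for post_id, parent_id in pairs:
            if parent_id and parent_id not in defined:
                problems.append((post_id, parent_id, number))
    return problems
```

For the example above, a result like (10, 20, 1) means page 10 in chunk 1 points at page 20, which doesn’t show up until a later chunk. Moving page 20 into the same chunk (or an earlier one) avoids the problem.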

Now, when you do get around to importing page 20, that relationship should be restored, right? Honestly, I don’t know. I think so. But stranger things have happened, and on more than one occasion I’ve dealt with a list of imported pages whose parent/child relationships were all bonkers. So I’m blaming it on this little gem right here.

The real problem/cause

The problem here is not with uploading the XML file; it’s with processing it. Once the file is uploaded, WordPress passes it to a PHP import script. That script parses the XML document and handles each item accordingly. Namely, it inserts records into the database based on the document’s content and meta definitions (author, publish date, status, tags, etc.).

It’s that parse-and-insert process that causes the problem: it just takes too long. 2MB of plain text (like an XML file) is a TON of text, and the server takes a long time to process all of it. Most servers also cap how long a single script may run (PHP’s max_execution_time setting, for example), so the import gets cut off before it finishes.

Technically, the failure isn’t WordPress’ fault; it’s the server’s. But WordPress can’t foresee or control that, so it has no way of alerting you when it happens. There’s no WP error message for it.

So the “2MB is way less than 7MB” reasoning is misleading. The 7MB limit applies only to the file upload. It has nothing to do with processing the data contained within the file.

I’ve found that 5,000 lines of XML markup (usually 300-400 KB) is the perfect size for a WordPress import. The file uploads instantly, and the PHP import process runs for only 10-20 seconds, well within most servers’ time limits. Each 400 KB file imports smoothly, and then you can move on to the next small chunk and do it all over again.

Your move, Automattic

I believe the WP dev team should fix this problem. Granted, they can’t predict the behavior of every server, but they could build some sort of sectioning and redundancy into the import process. Something like this:

Check the XML document’s size.

If it exceeds, say, 500 KB, run the import in waves.

Automattically (see what I did there) break the large XML doc into chunks.

Lather, rinse, repeat.
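To make that concrete, here’s a minimal Python sketch of the size check and the “waves” idea. plan_waves is a hypothetical helper, not anything in WordPress or its importer; the 500 KB threshold is the “say, 500 KB” from above, and sizes are measured as UTF-8 bytes of each serialized <item>. Items stay in their original order, and a single item bigger than the threshold simply gets a wave to itself.

```python
def plan_waves(items, max_bytes=500 * 1024):
    """Group serialized <item> strings into import waves.

    If everything fits under the size threshold, import in one go;
    otherwise pack items, in their original order, into consecutive
    waves that each stay under the threshold where possible.
    """
    sizes = [len(item.encode("utf-8")) for item in items]
    if sum(sizes) <= max_bytes:
        return [list(items)]
    waves, current, current_size = [], [], 0
    for item, size in zip(items, sizes):
        # Close the current wave when this item would push it over the limit.
        if current and current_size + size > max_bytes:
            waves.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        waves.append(current)
    return waves
```

Each wave would still need the header and footer wrapping described in step 3 before it could be imported on its own.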

WordPress could do this for you. But right now it doesn’t. And I think that’s a shame. Hopefully this tutorial saves you some headaches.