Reducing the memory usage of the XML-RPC Endpoint in WordPress

WordPress uses the XML-RPC remote publishing interface to provide a standardized way to transfer data between a third-party client, such as a mobile app, and the CMS. Many popular clients now rely on it to transfer posts, pages, and comments, but unfortunately it requires a lot of memory on the server side when used to publish pictures or videos. I wanted to reduce the memory used while parsing XML-RPC requests in WordPress, so that large pictures and videos could be uploaded through the XML-RPC endpoint even on cheap shared hosting installations with a strict PHP memory limit.

A quick recap. WordPress takes a conventional approach to XML-RPC request parsing: the whole XML-RPC request is loaded into memory ($HTTP_RAW_POST_DATA), and intermediate values are often kept in memory as well. Since a single XML-RPC request document might contain a large picture or video, parsing it can take a lot of memory. For instance, to upload a 3 MB picture you need to send at least 4 MB of data, because base64 encoding increases the size by roughly one third. Hence the full parsing procedure requires at least 7 MB of memory: 4 MB for the variable that holds the XML-RPC document and 3 MB for the variable that holds the decoded image data. That’s not good. If you want to upload a short 100 MB video recorded on your mobile device, you may have to set a really high value for the ‘memory_limit’ option, which may not be possible on shared hosting.
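The arithmetic above can be checked with a few lines of code. This is only an illustrative sketch in Python (not WordPress code); the helper name base64_size is made up for the example. It relies on the fact that base64 emits 4 output characters for every 3 input bytes.

```python
import base64
import math


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


MB = 1024 * 1024
image = 3 * MB                   # size of the original picture
encoded = base64_size(image)     # size of the picture inside the XML-RPC document

# Sanity check of the formula against the real encoder on a small payload.
assert base64_size(1000) == len(base64.b64encode(b"x" * 1000))

print(encoded / MB)              # 4.0
print((image + encoded) / MB)    # 7.0 (document plus decoded copy)
```

The same formula gives about 133 MB of request data for the 100 MB video, before counting the decoded copy.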

That said, I thought the XML-RPC parsing process could be improved by using a streaming XML parser that reads the data directly from the input stream and stores intermediate values on disk. In detail, my plan was to get rid of $HTTP_RAW_POST_DATA, read the request from the php://input stream, parse it with a streaming parser, store intermediate values on disk, and pack these changes into a plugin.

Unfortunately I hit a dead end, since PHP always populates the variable $HTTP_RAW_POST_DATA on POST requests with a “text/xml” content type. Even if you set the directive always_populate_raw_post_data to Off, PHP populates that variable. The only exceptions are requests with a content type of “application/x-www-form-urlencoded” or “multipart/form-data”. I investigated further, and it seems there isn’t a simple way to get rid of $HTTP_RAW_POST_DATA without modifying the PHP source code. Shared hosting installations won’t benefit hugely from this plugin, since they can’t install a modified version of PHP. Still, it can help a little, because it keeps intermediate values on disk rather than in memory.

The plugin does the following:

Gets the PHP input stream (php://input).

Reads from the input stream and parses the content in chunks (instead of using $HTTP_RAW_POST_DATA).

Doesn’t store partial values or the final parsed value in memory. For base64 data only, it writes to a temporary file in the tmp directory instead.

Changes the function mw_newMediaObject to take a path to the input file on disk rather than accepting the whole content as a parameter.

Introduces a new function that copies the uploaded file to the right location.
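The steps above can be sketched roughly as follows. This is not the plugin's code: it is a minimal Python illustration of the same idea, using the standard expat bindings, which can be fed a document piece by piece (PHP's xml_parse() offers a similar incremental interface). The function names spool_base64_to_disk and move_into_place are hypothetical, and the sketch assumes the request contains a single base64 element.

```python
import base64
import io
import os
import shutil
import tempfile
import xml.parsers.expat


def spool_base64_to_disk(stream, chunk_size=8192):
    """Parse XML from `stream` in chunks, decoding <base64> text to a temp file.

    Returns the temp file path, or None if no <base64> element was found.
    """
    state = {"inside": False, "pending": "", "out": None, "path": None}

    def flush(final=False):
        # base64 decodes in groups of 4 characters, so keep any remainder
        # until the next piece of character data arrives.
        data = state["pending"]
        usable = len(data) if final else len(data) - len(data) % 4
        if usable:
            state["out"].write(base64.b64decode(data[:usable]))
        state["pending"] = data[usable:]

    def start(name, attrs):
        if name == "base64":
            state["inside"] = True
            state["out"] = tempfile.NamedTemporaryFile(delete=False)
            state["path"] = state["out"].name

    def chars(text):
        if state["inside"]:
            state["pending"] += "".join(text.split())  # drop whitespace
            flush()

    def end(name):
        if name == "base64" and state["inside"]:
            flush(final=True)
            state["out"].close()
            state["inside"] = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end

    while True:
        chunk = stream.read(chunk_size)   # only one chunk is held at a time
        if not chunk:
            break
        parser.Parse(chunk, False)
    parser.Parse(b"", True)               # signal end of document
    return state["path"]


def move_into_place(tmp_path, dest_dir, filename):
    """Move the spooled file to its final location and return the new path."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(filename))
    shutil.move(tmp_path, dest)
    return dest


if __name__ == "__main__":
    payload = b"fake image bytes" * 1000
    request = (b"<methodCall><params><param><value><base64>"
               + base64.b64encode(payload)
               + b"</base64></value></param></params></methodCall>")
    tmp = spool_base64_to_disk(io.BytesIO(request), chunk_size=1000)
    with open(tmp, "rb") as fh:
        assert fh.read() == payload
    os.remove(tmp)
```

With this approach the peak memory used by the parser depends on the chunk size rather than on the size of the upload, which is the property the plugin is after.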

If you want to test it, here is the link to the code. (Note: this plugin is only an ‘alpha’ version; do not use it in production.) I don’t think I will develop it further, since a REST(ful) API has already been published on WordPress.com, and I hope it will soon be adopted in core too.

Special thanks to Luca Ercoli, who assisted me with the PHP core stuff.