Background

On the server side, XML-RPC works by taking a set of functions from your application and exporting them over HTTP.

On the client side, it works by connecting to an XML-RPC server and constructing a so-called server proxy: an object that has a method for every function the server exports.

Calling a method on the server proxy connects to the server over HTTP, passes the arguments, and transports the result back to the client. In effect, the remote object behaves as if it were available locally.

The data encoding between client and server is defined by the XML-RPC specification and is based on XML, but you never actually touch it: the libraries convert the XML to native objects and back.
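To see the XML that the libraries hide, Python's standard xmlrpc.client module (xmlrpclib in Python 2) can marshal a call and a response without any network involved. The method name myclass.repeat and its arguments are only illustrative.

```python
import xmlrpc.client

# Marshal a call to the (illustrative) method "myclass.repeat" with two arguments.
request = xmlrpc.client.dumps(("hello", 3), methodname="myclass.repeat")
print(request)  # an XML <methodCall> document

# Unmarshalling gives back the arguments as a tuple, plus the method name.
params, method = xmlrpc.client.loads(request)

# A response carries exactly one value; methodresponse=True produces <methodResponse>.
response = xmlrpc.client.dumps(("hellohellohello",), methodresponse=True)
result, _ = xmlrpc.client.loads(response)
print(method, params, result)  # myclass.repeat ('hello', 3) ('hellohellohello',)
```

Both sides only ever exchange these XML documents, which is why the transport underneath them can be swapped.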

Overview

We want to run an XML-RPC server in PHP that exposes a class, and an XML-RPC client in Python that communicates with it.

Traditionally we would need an HTTP server for the PHP XML-RPC server, because HTTP is the XML-RPC transport. But if you dig a bit into the specification, you'll discover that the protocol barely depends on HTTP: a request is a single POST whose body is an XML document, and the response body is another XML document. HTTP is just a pipe that carries the XML data.

So you may wonder whether XML-RPC can be used over a transport other than HTTP. In short, yes, but you may need to work around your XML-RPC libraries, because they usually assume you want HTTP.

The set_include_path line adds the /path/to/zf/library directory to PHP's include path, so you can load the Zend_XmlRpc_Server class (located in the /path/to/zf/library/Zend/XmlRpc/Server.php file).

Next, an instance of Zend_XmlRpc_Server is created, and MyClass is attached to it as the class for the myclass XML-RPC namespace. This means the repeat method is called over XML-RPC as myclass.repeat.
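The same namespacing idea can be sketched entirely in Python with the standard library, with SimpleXMLRPCServer standing in for Zend_XmlRpc_Server. This is an analog, not the PHP setup: what repeat() does (repeat a string a given number of times) is an assumption, and start_server() is a hypothetical helper.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class MyClass:
    """Hypothetical stand-in for the PHP MyClass; repeat() is assumed to repeat a string."""

    def repeat(self, text, times):
        return text * times


def start_server():
    # Port 0 lets the OS pick a free port; logRequests=False keeps the output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    # Register the bound method under a dotted name, mimicking the "myclass" namespace.
    server.register_function(MyClass().repeat, "myclass.repeat")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    with xmlrpc.client.ServerProxy(f"http://{host}:{port}/") as proxy:
        # Chained attribute access builds the method name "myclass.repeat".
        print(proxy.myclass.repeat("ab", 3))  # ababab
    server.shutdown()
    server.server_close()
```

The client side never declares myclass.repeat anywhere: the proxy turns the attribute chain into a method name and sends it to the server.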

If you place the file on your server and have it under some URL, for example:

Omitting the HTTP protocol

Your Python and PHP scripts will probably run on the same machine, in which case the HTTP part is quite useless and just an additional point of failure.

As I already stated, HTTP is only a transport, and you can replace it (at some cost) with a different one.

I came up with the idea of using stdin/stdout as the transport: Python executes a PHP script through the command-line interface (CLI) and passes the XML-RPC request to the script's stdin. PHP then has to read the XML-RPC request from stdin instead of from an HTTP request.

The only change is passing an instance of Zend_XmlRpc_Request_Stdin to $server->handle(). That is all that's needed: the Zend Framework developers already anticipated this kind of use.
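For comparison, here is a minimal Python sketch of a server that takes its request from a stream instead of HTTP, i.e. the role Zend_XmlRpc_Request_Stdin plays on the PHP side. The METHODS registry, handle(), serve_stdio() and the fault codes are hypothetical choices for illustration.

```python
import xmlrpc.client


class MyClass:
    """Hypothetical service; repeat() is assumed to repeat a string."""

    def repeat(self, text, times):
        return text * times


# Hypothetical registry mapping dotted XML-RPC method names to callables.
METHODS = {"myclass.repeat": MyClass().repeat}


def handle(request_xml, methods):
    """Decode one XML-RPC request and return the encoded response as a string."""
    params, name = xmlrpc.client.loads(request_xml)
    func = methods.get(name)
    if func is None:
        # dumps() turns a Fault into a <methodResponse> containing <fault>.
        return xmlrpc.client.dumps(xmlrpc.client.Fault(1, f"unknown method {name!r}"))
    try:
        result = func(*params)
    except Exception as exc:
        return xmlrpc.client.dumps(xmlrpc.client.Fault(2, f"{type(exc).__name__}: {exc}"))
    return xmlrpc.client.dumps((result,), methodresponse=True)


def serve_stdio(instream, outstream, methods):
    """Read one request from a binary input stream and write the response to a binary output stream."""
    response = handle(instream.read(), methods)
    outstream.write(response.encode("utf-8"))
    outstream.flush()
```

A command-line script built on this would end with serve_stdio(sys.stdin.buffer, sys.stdout.buffer, METHODS), handling exactly one request per process.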

Then, the client part.

xmlrpclib (xmlrpc.client in Python 3) lets you pass a custom transport, for example if you want to route calls through a proxy. We'll write a transport that, instead of opening an HTTP connection, runs a PHP script, passes the request to its stdin and reads the response from its stdout.
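A minimal sketch of such a transport for Python 3, assuming the server script reads one request from stdin and prints one response to stdout. It subclasses xmlrpc.client.Transport and overrides its request() method; StdioTransport, make_proxy and the demo server are hypothetical names, and the demo uses a tiny Python script in place of PHP so the example runs anywhere.

```python
import subprocess
import sys
import xmlrpc.client


class StdioTransport(xmlrpc.client.Transport):
    """Runs a command per call instead of opening an HTTP connection.

    The command (e.g. ["php", "/path/to/server.php"], a hypothetical path) gets
    the XML-RPC request on stdin and must print the response to stdout.
    """

    def __init__(self, command):
        super().__init__()
        self.command = list(command)

    def request(self, host, handler, request_body, verbose=False):
        # host and handler come from the placeholder URL and are ignored here.
        proc = subprocess.run(
            self.command,
            input=request_body,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,  # a non-zero exit status raises CalledProcessError
        )
        # loads() returns (params, methodname) and raises Fault on a fault response.
        params, _ = xmlrpc.client.loads(proc.stdout)
        return params


def make_proxy(command):
    # ServerProxy insists on an http(s) URL, so pass a placeholder that is never contacted.
    return xmlrpc.client.ServerProxy("http://localhost/", transport=StdioTransport(command))


# Stand-in server in Python (PHP may not be installed); it ignores the method name.
DEMO_SERVER = (
    "import sys, xmlrpc.client as x\n"
    "params, name = x.loads(sys.stdin.buffer.read())\n"
    "sys.stdout.write(x.dumps((params[0] * params[1],), methodresponse=True))\n"
)

if __name__ == "__main__":
    proxy = make_proxy([sys.executable, "-c", DEMO_SERVER])
    print(proxy.myclass.repeat("ab", 3))  # ababab
```

With a real PHP server the call site stays the same; only the command changes, e.g. make_proxy(["php", "/path/to/server.php"]). Starting one process per call is simple but not free, which is part of the "some cost" of dropping HTTP.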

Only public methods are exposed to XML-RPC clients, so you can hide logic inside private or protected methods and expose only what you need from a given class.

This solution is a quick way to reuse some of your well-tested PHP code in a fancy new, elegant Python application. It can help, for example, if you want to build a filesystem with Python-FUSE but take the data from a PHP application.
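Python has no private or protected keywords, but its standard XML-RPC dispatcher applies a similar rule: when a class instance is registered, names starting with an underscore are neither listed nor callable. A small sketch (MyClass and _helper are hypothetical):

```python
from xmlrpc.server import SimpleXMLRPCDispatcher


class MyClass:
    """Hypothetical service class."""

    def repeat(self, text, times):
        return self._helper(text) * times

    def _helper(self, text):
        # Leading underscore: the dispatcher refuses to expose this method.
        return text.strip()


dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_instance(MyClass())
# system_listMethods() reports what a client could call.
print(dispatcher.system_listMethods())  # ['repeat']
```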

Did it help you?

This seemed quite easy at first, given the nice Lucene implementation in PHP that is included in Zend Framework, and indeed during tests it was fast, simple and powerful. But those tests used about 100,000 documents (a document being a Wikidot page or forum thread), while Wikidot now has about 2,500,000 documents. And this is where the problems begin.

After roughly 1,800,000 documents had been indexed, the indexing process ran into memory problems (a 500 MB memory limit was not enough in SOME cases).

Even earlier, I had realized that search times weren't good enough, so I implemented the searching part in Java, which is Lucene's native platform. This sped things up.

Do you think indexing a document in under a second is fast? I thought it was a good result. Indexing a document takes about 0.2 s when the index holds only a small number of documents. But with 400,000 documents in the index, adding another one takes about 0.4 s. Even with this "good" indexing time (below a second), indexing the whole of Wikidot would take at least a few days.

This leads me to the conclusion that Wikidot is really BIG.
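A back-of-the-envelope check of "at least a few days", assuming documents are indexed one after another and applying each of the two per-document times above to all 2,500,000 documents:

```latex
\begin{aligned}
2\,500\,000 \times 0.2\,\mathrm{s} &= 500\,000\,\mathrm{s} \approx 5.8\ \text{days} \\
2\,500\,000 \times 0.4\,\mathrm{s} &= 1\,000\,000\,\mathrm{s} \approx 11.6\ \text{days}
\end{aligned}
```

Even the optimistic 0.2 s figure, measured on a nearly empty index, gives almost six days of continuous indexing.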

A similar situation applied to user-uploaded files. We hit a filesystem limit of about 32,000 subdirectories in a single directory. Since user-uploaded files were stored in a one-directory-per-wiki structure, this became a problem once there were more than 32,000 wikis.

Replicating this structure to another machine (a live backup of user-uploaded files) was also quite a challenge, because we reached the limit on directory watches in inotify, the kernel-level filesystem-monitoring system.
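One common way around a per-directory limit like this, not necessarily what Wikidot did, is to nest the per-wiki directories under hash-derived buckets. A minimal sketch, assuming a hypothetical root/<2 hex>/<2 hex>/<wiki_name> layout:

```python
import hashlib
from collections import defaultdict
from pathlib import PurePosixPath


def sharded_path(root, wiki_name):
    """Hypothetical layout: root/<2 hex>/<2 hex>/<wiki_name>, keyed by a hash of the name."""
    digest = hashlib.sha256(wiki_name.encode("utf-8")).hexdigest()
    # 256 first-level x 256 second-level buckets = 65,536 leaf directories.
    return PurePosixPath(root, digest[:2], digest[2:4], wiki_name)


def max_entries_per_directory(paths):
    """Largest number of distinct subdirectories any single directory in the layout would hold."""
    children = defaultdict(set)
    for path in paths:
        # Pair every ancestor with its direct child along the path.
        for parent, child in zip(path.parents, [path, *path.parents]):
            children[parent].add(child)
    return max(len(kids) for kids in children.values())


if __name__ == "__main__":
    wikis = [f"wiki-{i}" for i in range(100_000)]  # synthetic names, well over 32,000
    print(max_entries_per_directory(sharded_path("/files", w) for w in wikis))
```

With two levels of 256 buckets, no directory holds more than 256 bucket subdirectories, and each leaf bucket holds only a small share of the wikis.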
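inotify watches are not recursive, so mirroring a directory tree needs roughly one watch per directory, and Linux caps the number of watches per user (exposed in /proc/sys/fs/inotify/max_user_watches). A small sketch for checking how close a tree is to that cap; the helper names are hypothetical:

```python
import os
from pathlib import Path

# Linux exposes the per-user inotify watch limit here.
LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root):
    """Count root plus every directory below it (about one inotify watch each).

    Symlinked directories are counted but not descended into.
    """
    return 1 + sum(len(dirnames) for _, dirnames, _ in os.walk(root))


def max_user_watches():
    """Return the per-user inotify watch limit, or None where the file is unavailable."""
    try:
        return int(LIMIT_FILE.read_text().strip())
    except (OSError, ValueError):
        return None


def fits_in_watch_limit(root):
    """True/False if the tree fits under the limit; None if the limit is unknown."""
    limit = max_user_watches()
    return None if limit is None else count_directories(root) <= limit
```

With one directory per wiki and files below it, the number of directories to watch grows with the number of wikis, which is how a live backup can run into this limit. The limit itself can be raised through the fs.inotify.max_user_watches sysctl.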

It all shows that things that seem easy are not necessarily easy at Wikidot's scale, which hits limits in nearly every piece of software we use. But it is also a great chance to really test those projects and see how they react to such a high load.