Sunday, July 01, 2007

When seeing that the MySQL Proxy was released, I decided to try to experiment with it since I see strong potential with this tool, both for replication and for other uses (recall that I'm a replication guy, so this is my primary focus). I'm actually on vacation, but this will of course not stop me from tinkering with things (I know, I'm just a hopeless case in this aspect ;) ).

After reporting a minor bug, I managed to build and run it with some sample scripts. I'm using Kubuntu Feisty, and had some initial problems, but it was actually pretty straightforward. I'll repeat the steps anyway, in case anybody else have problems.

Get the source from the repository

svn co http://svn.mysql.com/svnpublic/mysql-proxy/ mysql-proxy

Make sure you have all packages necessary. Several of the packages below were not installed for me.

Set up all the stuff necessary for the autotools (Autoconf, Automake, and Libtool)

./autogen.sh

Run configure, but make sure to tell the configuration script that it should use Lua 5.1

./configure --with-lua=lua5.1

Build the proxy

make

Some applications

After having experimented a little, I see some of the potential applications of the MySQL Proxy, but there are things missing to make these scenarios possible. Just to give some ideas, I will just present the ideas and last present what I see as missing pieces to make these possible.

Vertical and horizontal partitioning

If it was possible to parse the query and decompose it into it's fragments, it could be possible to separate the query into two queries and send them off to different servers. The result sets could then be composed to form a new result set that is then delivered to the final client. This can, of course, be accomplished thorough other means, but if you take a look at the next item, you have a variation that is not that simple to handle.

It could be interesting to handle horizontal partitioning in the case that data for, e.g., a user is stored in different machines depending on geographic location. This is something that is interesting for companies like Flickr, Google, and YouTube since contacting a server near yourself geographically significantly improves response times.

BLOB Streaming

In order to handle BLOB streaming, as I outlined in a previous post, it could be left to the proxy to build a final result set where the BLOB Locators (URIs in my example) are replaced with the actual BLOBs by contacting a dedicated BLOB server that holds the BLOB and building the final result set inside the proxy.

Pre-heating slave threads

By placing the proxy in between a master and a slave, it could be possible to pre-heat the slave by issuing a SELECT query through a client connection to the slave. This is the oracle algorithm presented by Paul Tuckfield from YouTube, but the solution in this case is simpler since it is not necessary to read the events from the relay log, but they can instead be caught "in the air" and acted upon.

This actually requires some parsing of the replication stream, so it might be better to handle this by embedding Lua or Perl into the slave threads. (Personally, I would prefer Perl. Not because it is a better language, or easier to embed, but because I've been using Perl on an almost daily basis since 1988. OTOH, Lua is designed to be efficient and map internal structures to the language and seems very easy to work with, so we'll see what happens.)

The missing pieces

Others have focused on what can be done with the MySQL Proxy, but I see some omissions that, if implemented, would turn the MySQL Proxy into an incredibly flexible tool.

It should be possible to parse and rewrite a query before sending it to the server. This is already possible, but a library to parse and build SQL queries would help a lot here.

It should be possible to rewrite the result set in a convenient manner. Jan just added the ability to rewrite the packet that is sent back, so the proxy is heading in that direction, but a more convenient interface would be an advantage here as well.

It should be possible to keep connections to several servers active, and decide what server to send the query to on a per-query basis (even sending queries to all servers, or different queries to different servers). AIUI, there is some rudimentary support for it right now, but not to the extent that I describe here.

It should be possible to send several query-result sequences to servers for each query-result sequence sent to the proxy. This will make it possible to act on the result of a response to one server, and dynamically decide, e.g., what server to contact next in order to get the data that forms the final result set.

All-in-all, the MySQL Proxy is showing incredible potential and I, for one, will see what I can contribute with in order to make it even better.