Saturday, May 31, 2008

Performance Testing of Distributed File Systems at Google

Posted by Rajat Jain and Marc Kaplan, Infrastructure Test Engineering

Google is unique in that we develop most of our software infrastructure from scratch inside the company. Distributed filesystems are no exception, and we have several here at Google that serve different purposes. One such filesystem is the Google File System (GFS), which is used to store almost all data at Google. Although GFS is the ultimate endpoint for much of the data at Google, there are many other distributed filesystems built on top of GFS for a variety of purposes (see Bigtable, for example, but several others also exist), with developers constantly trying to improve performance to meet the ever-increasing demands of serving data at Google. The challenge for the teams testing the performance of these filesystems is that running performance tests, analyzing the results, and repeating the cycle over and over is very time-consuming. Also, since each filesystem is different, we have traditionally had different performance testing tools for each one, which made it difficult to compare performance between filesystems and led to a lot of unnecessary maintenance work on the tools.

In order to streamline testing of these filesystems, we wanted to create a new framework that makes it easy to performance-test any of the filesystems at Google. The goals of this system were as follows:

Generic: The testing framework should be generic enough to test any type of filesystem inside Google. A generic framework makes it easier to compare the performance of different filesystems across many operations.

Ease of use: The framework should be easy enough to use that software developers can design and run their own tests without any help from the test team.

Scalable: Testing can be done at various scales depending on the scalability of the filesystem. The framework can issue any number of operations simultaneously. So, for testing a Linux filesystem, we might issue only 1000 parallel requests, while for the Google File System, we might want to issue requests at a much larger scale.

Extensible:

Firstly, it should be easy to add a new kind of operation to the framework if one is developed in the future (for example, the RecordAppend operation in GFS).

Also, the framework should allow the user to easily generate complex load scenarios on the server. For example, we might want a scenario in which we issue FileCreate operations simultaneously with Read, Write, and Delete operations. Thus, we want a good mix of operations, but not a randomized one, so that the results are repeatable and can serve as benchmarks.

Unified testing: The framework should be stand-alone or independent, i.e., it should be a one-stop solution to set up and run the tests and monitor the results.

We developed a framework that achieves all of the above goals. We used Google's generic File API to write the framework, since every filesystem can then be tested just by changing the file namespace in which the test data is generated (e.g., /gfs vs. /bigtable). Following Google's standard practice, we developed a Driver + Worker system. The Driver coordinates the overall test by reading configuration files to set up the test, automatically launching a different number of workers depending on the load, monitoring the health of the workers, collecting performance data from each worker, and calculating the overall performance. The Worker class is the one that loads the filesystem with the appropriate operations. Worker is an abstract class, and a new child class can be created for each file operation, which gives us the flexibility to add any operation we want in the future. Each Worker instance is launched on a separate machine, with the number of instances depending on the load that we want to generate. It is simple to run more or fewer workers on remote machines just by changing the config file.
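The Driver + Worker split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's actual code: the class and function names (`Worker`, `CreateWorker`, `StatWorker`, `drive`) are hypothetical, plain local file calls stand in for Google's File API, threads stand in for workers on separate machines, and the flat file naming is a simplification.

```python
import abc
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


class Worker(abc.ABC):
    """Hypothetical base class: one subclass per file operation."""

    def __init__(self, base_path, shard, count):
        self.base_path = base_path
        self.shard = shard      # index of this worker
        self.count = count      # number of operations to issue

    def path(self, j):
        # Simplified flat naming so the sketch needs no directory setup.
        return os.path.join(self.base_path, f"worker.{self.shard}.file.{j}")

    @abc.abstractmethod
    def do_op(self, path):
        """Perform one operation against the filesystem under test."""

    def run(self):
        """Issue `count` operations and return the latency of each, in seconds."""
        latencies = []
        for j in range(self.count):
            start = time.perf_counter()
            self.do_op(self.path(j))
            latencies.append(time.perf_counter() - start)
        return latencies


class CreateWorker(Worker):
    def do_op(self, path):
        with open(path, "w"):
            pass


class StatWorker(Worker):
    def do_op(self, path):
        os.stat(path)


def drive(worker_cls, base_path, shards, count):
    """Driver: launch one worker per shard and gather all latencies."""
    workers = [worker_cls(base_path, s, count) for s in range(shards)]
    with ThreadPoolExecutor(max_workers=shards) as pool:
        per_worker = list(pool.map(lambda w: w.run(), workers))
    return [lat for lats in per_worker for lat in lats]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        create_latencies = drive(CreateWorker, base, shards=4, count=10)
        stat_latencies = drive(StatWorker, base, shards=4, count=10)
        print(len(create_latencies), len(stat_latencies))
```

Supporting a new operation in this sketch means adding one more `Worker` subclass with its own `do_op`; the driver code does not change.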

The test is divided into various phases. In a phase, we can run a single operation N times (with a given concurrency) and collect performance data. So, we can run a create phase followed by a write phase followed by a read phase. We can also have multiple sub-phases inside a phase, which gives us the ability to generate many different simultaneous operations on the system. For example, in a phase, we might add three sub-phases (create, write, and delete), which will issue all the different kinds of operations simultaneously on remote client machines against the distributed filesystem.
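As a rough illustration of the phase model, the sketch below runs phases one after another and runs the sub-phases within a phase concurrently. All names (`SubPhase`, `run_phase`, `run_test`) are hypothetical, and an in-memory dictionary stands in for the filesystem; the real framework issues these operations from remote client machines.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubPhase:
    name: str
    op: Callable[[int], None]   # called with the operation index
    count: int                  # how many times to run the operation


def run_phase(subphases):
    """Run every sub-phase of one phase concurrently; return ops done per sub-phase."""
    def run_sub(sp):
        for i in range(sp.count):
            sp.op(i)
        return sp.name, sp.count

    with ThreadPoolExecutor(max_workers=len(subphases)) as pool:
        return dict(pool.map(run_sub, subphases))


def run_test(phases):
    """Phases run strictly one after another."""
    return [run_phase(p) for p in phases]


# Stand-in "filesystem": a dict guarded by a lock.
store = {}
lock = threading.Lock()


def create(i):
    with lock:
        store[f"f{i}"] = b""


def write(i):
    with lock:
        store[f"f{i}"] = b"x" * 16


def stat(i):
    with lock:
        len(store[f"f{i}"])


phases = [
    [SubPhase("create", create, 100)],                           # phase 1
    [SubPhase("write", write, 100), SubPhase("stat", stat, 100)],  # phase 2, concurrent
]
results = run_test(phases)
```

Because the create phase finishes before the second phase starts, the concurrent write and stat sub-phases always find the files they expect.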

It is instructive to look at an example config file for an idea of how the load is specified against this filesystem:

So in the example above, we launch 200 shards (which all run on different client machines) that all create files with a prefix of metadata_perf and suffixes based upon the index of the worker shard. In practice, the user of the performance test passes a flag to the performance test binary that specifies a base path to use, e.g. /gfs/cell1/perftest_path, and the resulting files will be /gfs/cell1/perftest_path/worker.i/metadata_perf.j, for i=1 until i=#shards, and j=1 until j=count.
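The naming scheme just described can be written out as a small generator. This is a sketch, not the framework's code: `worker_paths` is a hypothetical helper, and it assumes that both index ranges are 1-based and inclusive, which the description above does not state explicitly.

```python
def worker_paths(base_path, prefix, shards, count):
    """Yield base_path/worker.i/prefix.j for every shard i and file index j.

    Assumption: i runs from 1 to shards and j from 1 to count, inclusive.
    """
    for i in range(1, shards + 1):
        for j in range(1, count + 1):
            yield f"{base_path}/worker.{i}/{prefix}.{j}"


paths = list(worker_paths("/gfs/cell1/perftest_path", "metadata_perf", shards=2, count=3))
```

Giving each shard its own directory keeps workers from colliding on file names, so no coordination between workers is needed while the test runs.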

In the example above, we simultaneously do stats and opens of the files that were initially created in the create phase. Different workers execute these, and then report their results to the driver.

On conclusion of the test, the driver prints a performance test results report that details the aggregate results of all of the clients, in terms of MB/s for data-intensive ops, ops/s for metadata-intensive ops, and latency measures of central tendency and dispersion.

In conclusion, Google's generic File API, the use of a Driver and Workers, and the concept of phases have been very useful in developing the performance testing framework, and hence in making performance testing easier. Almost as important, the fact that this is a simple script-driven method of testing complex distributed filesystems has resulted in an ease of use that gives both developers and testers the ability to quickly experiment and iterate, resulting in faster code development and better performance overall.
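For readers curious about what the driver's results report involves, here is one way the aggregation could look. It is a sketch under assumptions: the `summarize` function and the shape of the per-worker results are hypothetical, and mean, median, and population standard deviation are chosen here as example measures of central tendency and dispersion; the post does not say which measures the real report uses.

```python
import statistics


def summarize(worker_results, wall_seconds):
    """Aggregate per-worker results into one report.

    worker_results: list of dicts with 'latencies' (seconds per op)
    and 'bytes' (total bytes moved by that worker).
    """
    latencies = [lat for r in worker_results for lat in r["latencies"]]
    total_bytes = sum(r["bytes"] for r in worker_results)
    return {
        "ops_per_s": len(latencies) / wall_seconds,
        "mb_per_s": total_bytes / 1e6 / wall_seconds,
        "latency_mean_s": statistics.mean(latencies),
        "latency_median_s": statistics.median(latencies),
        "latency_stdev_s": statistics.pstdev(latencies),
    }


report = summarize(
    [
        {"latencies": [0.1, 0.3], "bytes": 1_000_000},
        {"latencies": [0.2, 0.2], "bytes": 3_000_000},
    ],
    wall_seconds=2.0,
)
```

Throughput is computed against the wall-clock duration of the phase rather than the sum of latencies, since the workers run in parallel.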

5 comments:

I'm not sure I understand. Are you saying you've developed a scriptable framework that sits over the top of an existing API? Is the Google File API the workhorse in this case? Are you able to shed some light on how you actually apply load to the filesystems below? For example, I had this same problem in comparing alternative vendor solutions for a SAN, in the sense of establishing a credible benchmark for comparison. Is this what your framework resolves, i.e. the point of comparison against many different file systems? Would be keen to find out more.

This is not a scriptable framework that sits on top of the Google File API. Instead, it leverages the fact that Google's File API is generic enough to handle all kinds of file systems inside Google. Hence all the file operations in our system use Google's File API.

The load applied to the file system is in the form of the config file, samples of which have been mentioned in the post.

This tool is indeed being used to benchmark performance numbers, which can be used to compare different file systems.

Ok Rajat, thanks for that. I think I understand. The linkage I was missing is how a config file actually triggers the load to occur. I'm assuming the Google File API does this work in this case. I also assume the API is proprietary.

In my situation I've had to rely on 3rd party tools such as iometer to apply load to a filesystem. And this is where I've had difficulty achieving parity in results, in terms of comparison. For example if I was to apply load to 2 different vendor solutions for a SAN (both with differing architectures), I've found it difficult to 'compare' benchmark results, as each vendor can argue that the manner in which load is applied differs. I think because you have a single API this makes your comparison easier.

What caught my attention with your post is that you're comparing results for distributed file systems (implying different architectures) and that you'd somehow found a way to provide comparable results across all. I think what I'm reading here is that the G file API is what is making that comparison possible, and that your config files have just simplified in effect the orchestration of a load test.