The basic installation instructions are suitable for development by a single user, but when setting up Galaxy for a multi-user production environment, there are some additional steps that should be taken for the best performance.

By default, Galaxy uses SQLite (a serverless database), so you don’t have to run or configure a database server for quick or basic development. However, while SQLite supports concurrent access, it does not support multiple concurrent writes, which can reduce system throughput.

By default, Galaxy also uses a built-in HTTP server, written in Python. Much of the work performed by this server can be moved to nginx or Apache, which will increase performance.

Finally, the default configuration runs all tools locally. Moving to a cluster will greatly increase capacity.

Galaxy ships with this default configuration to ensure the simplest, most error-proof configuration possible when doing basic development. As you’ll soon see, the goal is to remove as much work as possible from the Galaxy process, since doing so will greatly speed up its remaining duties. This is due to the Python Global Interpreter Lock (GIL), which is explained in detail in the Advanced Configuration section.

Many of the following instructions are best practices for any production application.

Create a NON-ROOT user called galaxy. Running as an existing user will cause problems down the line when you want to grant or restrict access to data.

Start with a fresh checkout of Galaxy; don’t try to convert one previously used for development. Download and install it in the galaxy user’s home directory.

Galaxy should be a managed system service (like Apache, mail servers, database servers, etc.) run by the galaxy user. Init scripts, OS X launchd definitions and Solaris SMF manifests are provided in the contrib/ directory of the distribution. You can also use the --daemon and --stop-daemon arguments to run.sh to start and stop by hand, but still run detached. When running as a daemon the server’s output log will be written to paster.log instead of the terminal, unless instructed otherwise with the --log-file argument.

Give Galaxy its own database user and database to prevent Galaxy’s schema from conflicting with other tables in your database. Also, restrict Galaxy’s database user so it only has access to its own database.

Make sure Galaxy is using a clean Python interpreter. Conflicts in $PYTHONPATH or the interpreter’s site-packages/ directory could cause problems. Galaxy manages its own framework dependencies, so you do not need to install them yourself. The easiest way to get a clean interpreter is to run Galaxy from a virtualenv.

The steps to install Galaxy mostly follow those of the regular instructions at Admin/GetGalaxy. The difference is that after performing the groundwork above, you should initialize the configuration file (cp config/galaxy.ini.sample config/galaxy.ini) and modify it as outlined below before starting the server. If you make any changes to this configuration file while the server is running, you will have to restart the server for the changes to take effect.

Two options are set in the sample config/galaxy.ini which should not be enabled on a production server. You should set both to False:

debug=False - Disables the middleware that loads the entire response into memory in order to display debugging information in the page. If left enabled, the proxy server may time out waiting for a response, or your Galaxy process may run out of memory if it’s serving large files.

Disable filter-with=gzip. Leaving the gzip filter enabled will cause UI failures because of the way templates are streamed once debug is set to False. You will still be able (and are encouraged) to enable gzip in the proxy server.

During deployment, you may run into problems with failed jobs. By default, Galaxy removes files related to job execution. You can instruct Galaxy to keep files of failed jobs with: cleanup_job=onsuccess
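As a rough sketch, the settings discussed above might be gathered in config/galaxy.ini as follows. The option names and values come from the text; the [app:main] section header is an assumption based on the usual layout of the sample config, so check it against your own copy of the file.

```ini
# Assumed section name; verify against config/galaxy.ini.sample.
[app:main]

# Turn off the debugging middleware on a production server.
debug = False

# Leave the gzip filter disabled (commented out or removed) and
# enable compression in the proxy server instead.
#filter-with = gzip

# Keep the files of failed jobs for troubleshooting; files of
# successful jobs are still removed.
cleanup_job = onsuccess
```

Remember to restart the server after editing the file so the changes take effect.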

The most important recommendation is to switch to an actual database server. By default, Galaxy will use SQLite, which is a simple, serverless, file-based database engine. Since it’s serverless, all of the database processing occurs in the Galaxy process itself. This has two downsides: it occupies the aforementioned GIL (meaning that the process is not free to do other tasks), and it is not nearly as efficient as a dedicated database server. There are other drawbacks, too. As load increases with multiple users, so does the risk of transactional locks, which cause (among other things) timeouts and job errors. If you start with SQLite and later realize you need a database server, you’ll have to migrate your database or start over. Galaxy does not provide an internal method to migrate data from SQLite, and although free conversion tools are available on the web, the process is non-trivial.

For this reason, Galaxy also supports PostgreSQL and MySQL. PostgreSQL is much preferred since we’ve found it works better with our DB abstraction layer, SQLAlchemy.

To use an external database, you’ll need to set one up. That process is outside the scope of this document, but is usually simple. For example, on Debian and Red Hat-based Linux distributions, one may already be installed. If not, it should be an apt-get install or yum install away. On Mac OS X, there are installers available from the PostgreSQL website.

Once installed, create a new database user and a new database owned by that user. No further setup is required, since Galaxy manages its own schema. If you are using a UNIX socket to connect the application to the database (this is the standard case if Galaxy and the database are on the same system), you’ll want to give the database user the same name as the system user under which you run the Galaxy process.

To configure Galaxy, set database_connection in Galaxy’s config file, config/galaxy.ini. The syntax for a database URL is explained in the SQLAlchemy documentation.
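For illustration only, a database_connection entry for a PostgreSQL server reached over TCP might look like the sketch below. It follows the general SQLAlchemy URL form (dialect://user:password@host:port/database). The user name, password, host name, and database name are hypothetical placeholders.

```ini
# Hypothetical values: user "galaxy", password "secret",
# host "db.example.org", default PostgreSQL port, database "galaxy".
database_connection = postgresql://galaxy:secret@db.example.org:5432/galaxy
```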

It’s worth noting that some platforms (for example, Debian/Ubuntu) store database sockets in a directory other than the database engine’s default. If you’re connecting to a database server on the same host as the Galaxy server and the socket is in a non-standard location, you’ll need to use these custom arguments (these are the defaults for Debian/Ubuntu, change as necessary for your installation):
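A sketch of what such socket arguments could look like, assuming a database named galaxy (a placeholder) and the socket locations commonly used by the Debian/Ubuntu PostgreSQL and MySQL packages. Adjust the paths to match your installation.

```ini
# PostgreSQL: "host" names the directory that contains the socket.
database_connection = postgresql:///galaxy?host=/var/run/postgresql

# MySQL alternative: "unix_socket" names the socket file itself.
#database_connection = mysql:///galaxy?unix_socket=/var/run/mysqld/mysqld.sock
```

Leaving out the user name and host in the URL lets the connection use the system user running Galaxy, which is why the database user should carry the same name.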

If you are using MySQL and encounter the “MySQL server has gone away” error, please note the database_engine_option_pool_recycle option in config/galaxy.ini. If this does not solve your problem, see this post on the Galaxy Development mailing list.
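As a hedged example, the recycle option could be set as shown below. The value is in seconds and is an illustrative assumption; the point is to pick something comfortably shorter than the MySQL server's wait_timeout (28800 seconds by default), so that pooled connections are replaced before the server closes them.

```ini
# Illustrative value: recycle pooled connections after two hours.
database_engine_option_pool_recycle = 7200
```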

If you are using MySQL, please make sure the database output is in UTF-8; otherwise you may encounter Python TypeErrors.

Galaxy contains a standalone web server and can serve all of its content directly to clients. However, some tasks (such as serving static content) can be offloaded to a dedicated server that handles these tasks more efficiently. A proxy server also allows you to authenticate users externally using any method supported by the proxy (for example, Kerberos or LDAP), instruct browsers to cache content, and compress outbound data. Also, Galaxy’s built-in web server does not support byte-range requests (required for many external display applications), but this functionality can be offloaded to a proxy server. In addition to freeing the GIL, compression and caching will reduce page load times.

Galaxy is a framework that runs command-line tools, and if properly configured, can run these tools on a compute cluster. Without a cluster, you’ll be limited to the number of cores in your server, minus those needed to run Galaxy itself. Galaxy currently supports TORQUE PBS, PBS Pro, Platform LSF, and Sun Grid Engine clusters, and does not require a dedicated or special cluster configuration. Tools can even run on heterogeneous cluster nodes (differing operating systems), as long as any dependencies necessary to run the tool are available on that platform.

Using a cluster will also net you a fringe benefit. When tools run locally, they are child processes of the Galaxy server, so if you restart the server, you lose contact with those jobs and they must be restarted. On a cluster, however, the jobs continue to run and finish even if the Galaxy server restarts. Once the Galaxy job manager starts up, it’ll resume tracking and finishing jobs as if nothing had happened.

Configuration is not difficult once your cluster is set up. Details can be found on the cluster page.

When datasets are deleted from a history or library, they are simply marked as deleted and not actually removed, since they can later be undeleted. To free disk space, a set of scripts can be run (e.g. from cron) to remove the data files as specified by local policy. See the Purge histories and datasets page for instructions.

File sizes have grown very large thanks to rapidly advancing sequencer technology, and it is not always practical to upload these files through the browser. Thankfully, a simple solution is to allow Galaxy users to upload them via FTP and import those files into their histories. Configuration for FTP is explained on the File Upload via FTP page.

As already mentioned, unloading work from the Galaxy process is important due to the Python Global Interpreter Lock (GIL). The GIL is how Python ensures thread safety, and it accomplishes this by only allowing one thread to control execution at a time. This means that regardless of the number of cores in your server, Galaxy can only use one. However, there’s a solution: Run multiple Galaxy processes and use the proxy server to balance across all of these processes. In practice, Galaxy is split into job handler and web server processes. Job handlers do not service any user requests directly via the web. Instead, they watch the database for new jobs, and upon finding them, handle the preparation, monitoring, running, and completion of them. Likewise, the web server processes are free to deal only with serving content and files to web clients.

Full details on how to configure scaling and load balancing can be found in the scaling documentation.

For those readers who’ve already been running Galaxy on a cluster, a bit of information was recently added to the cluster documentation regarding running the data source tools on the cluster (contrary to the default configuration). Running all tools on the cluster is strongly encouraged, so if you have not done this, please check out the new information.

PostgreSQL can store results more efficiently than Galaxy, and as a result, reduce Galaxy’s memory footprint. When a query is made, the result will remain on the Postgres server and Galaxy can retrieve only the rows it needs. To enable this, set database_engine_option_server_side_cursors=True in the Galaxy config.

If your server logs errors about the database connection pool size, you may need to raise the pool limits. The relevant config file options are database_engine_option_pool_size (default 5), the number of connections kept open in the pool, and database_engine_option_max_overflow (default 10), the number of additional connections that may be opened beyond that when the pool is exhausted.

Finally, if you are using Galaxy <= release_2014.06.02, we recommend that you instruct Galaxy to use one database connection per thread, to avoid connection overhead and overuse. This can be enabled with database_engine_option_strategy=threadlocal.
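Taken together, the database tuning options in this section might appear in config/galaxy.ini roughly as follows. The option names come from the text above; the pool numbers are illustrative assumptions rather than recommendations, and should be sized to your own load and to the connection limit of your database server.

```ini
# Keep query results on the PostgreSQL server and fetch rows as needed.
database_engine_option_server_side_cursors = True

# Illustrative values only: a larger pool, plus room for extra
# connections when the pool is exhausted.
database_engine_option_pool_size = 10
database_engine_option_max_overflow = 20

# Only for Galaxy <= release_2014.06.02: one connection per thread.
#database_engine_option_strategy = threadlocal
```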

By default, Galaxy receives file uploads as a stream from the proxy server and then writes this file to disk. Likewise, it sends files as a stream to the proxy server. This occupies the GIL in that Galaxy process and will decrease responsiveness for other operations in that process. To solve this problem, you can configure your proxy server to serve downloads directly, involving Galaxy only for the task of authorizing that the user has permission to read the dataset. If using nginx as the proxy, you can configure it to receive uploaded files and write them to disk itself, only notifying Galaxy of the upload once it’s completed. All the details on how to configure these can be found on the Apache and nginx proxy instruction pages.