Most high-traffic or complex Drupal sites use Apache Solr as their search engine. It is much faster and more scalable than Drupal's core search module.

In this article, we describe one way among many to get a working Apache Solr installation for use with Drupal 7.x on Ubuntu Server 12.04 LTS. The technique described should work with Ubuntu 14.04 LTS as well.

In a later article, we describe how to install other versions of Solr the Ubuntu/Debian way.

Objectives

For this article, we focus on having an installation of Apache Solr with the following objectives:

Use the latest stable version of Apache Solr

Fewest software dependencies, i.e. no Tomcat server, no full JDK, and no separately installed Jetty

Least amount of necessary complexity

Least amount of software to install and maintain

A secure installation

This installation can be done on the same host that runs Drupal, if it has enough memory and CPU, or it can be on the database server. However, it is best if Solr is on a separate server dedicated for search, with enough memory and CPU.

Installing Java

We start by installing the Java Runtime Environment, and choose the headless server variant, i.e. without any GUI components.

sudo aptitude update
sudo aptitude install default-jre-headless

Downloading Apache Solr

Second, we need to download the latest stable version of Apache Solr from a mirror near you. At the time of writing this article, it is 4.7.2. You can find the closest mirror to you at Apache's mirror list.
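For example, assuming you download into /tmp (the mirror URL is an assumption; archive.apache.org always keeps old releases, but a mirror near you will be faster):

```shell
# Download Solr 4.7.2 into /tmp. Replace the URL with a mirror near you.
cd /tmp
wget http://archive.apache.org/dist/lucene/solr/4.7.2/solr-4.7.2.tgz
```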

Extracting Apache Solr

Next we extract the archive, while still in the /tmp directory.

tar -xzf solr-4.7.2.tgz

Moving to the installation directory

We choose to install Solr in /opt, because /opt is intended for software that is not installed from Ubuntu's repositories via the apt dependency management system, and is therefore not tracked for security updates by Ubuntu.

sudo mv /tmp/solr-4.7.2 /opt/solr

Creating a "core"

Apache Solr can serve multiple sites, each served by a "core". We will start with one core, simply called "drupal".

cd /opt/solr/example/solr
sudo mv collection1 drupal

Now edit the file ./drupal/core.properties and change the name= to drupal, like so:

name=drupal

Copying the Drupal schema and Solr configuration

We now have to copy the Drupal Solr configuration into Solr. Assuming your site is installed in /var/www, these commands achieve the tasks:
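As a sketch (the module path below assumes the apachesolr module lives in sites/all/modules and ships a solr-conf/solr-4.x directory, as its 7.x releases do):

```shell
# Copy the Solr 4.x configuration shipped with the apachesolr module
# into the "drupal" core's conf directory. Adjust paths to your setup.
cd /var/www/sites/all/modules/apachesolr/solr-conf/solr-4.x
sudo cp schema.xml solrconfig.xml protwords.txt /opt/solr/example/solr/drupal/conf/
```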

Setting Apache Solr Authentication, using Jetty

By default, a Solr installation listens on the public Ethernet interface of a server, and has no protection whatsoever. Attackers can access Solr, and change its settings remotely. To prevent this, we set password authentication using the embedded Jetty that comes with Solr. This syntax is for Apache Solr 4.x. Earlier versions use a different syntax.

The following settings work well for a single core install, i.e. search for a single Drupal installation. If you want multi-core Solr, i.e. for many sites, then you want to fine tune this to add different roles to different cores.

Then edit the file: /opt/solr/example/etc/jetty.xml, and add this section:
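The exact XML depends on the Jetty version bundled with your Solr release; as a sketch (file path and realm name are assumptions), you would register a login service like this:

```xml
<!-- Sketch: register a HashLoginService that reads users from realm.properties -->
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr Realm</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
    </New>
  </Arg>
</Call>
```

You would then create /opt/solr/example/etc/realm.properties with a `username: password, role` line per user, and add a matching security-constraint section to the Jetty web descriptor so that the role is actually required.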

Configuring Drupal's Apache Solr module

After you have successfully installed, configured, and started Solr, you should configure your Drupal site to interact with the Solr server. First, go to this URL: admin/config/search/apachesolr/settings/solr/edit, and enter the information for your Solr server. You should use the URL as follows:
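For a single-core setup like the one above, the URL typically takes the following form (host, port, and credentials are placeholders; the credentials are the ones you configured in Jetty):

```
http://username:password@localhost:8983/solr/drupal
```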

Over the past few years we've moved away from using Subversion (SVN) for version control and we're now using Git for all of our projects. Git brings us a lot more power, but because of its different approach there are some challenges as well.

Git has powerful branching and this opens up new opportunities to start a new branch for each new ticket/feature/client-request/bug-fix. There are several different branching strategies: Git Flow is common for large ongoing projects, or we use a more streamlined workflow. This is great for client flexibility — a new feature can be released to production immediately after it's been approved, or you can choose to bundle several features together in an effort to reduce the time spent running deployments.

Regardless of what branching model you choose you will run into the issue where stakeholders need to review and approve a branch (and maybe send it back to developers for refinement) before it gets merged in. If you've got several branches open at once that means you need several different dev sites for this review process to happen. For simple tasks on simple sites you might be able to get away with just one dev site and manually check out different branches at different times, but for any new feature that requires database additions or changes that won't work.

Another trend in web development over the past few years has been to automate as many of the boring and repetitive tasks as possible. So we've created a Drush command called Site Clone that can do it all with just a few keystrokes:

Copies the codebase (excluding the files directory) with rsync to a new location.

Creates a new git branch (optional).

Creates a new /sites directory and settings.php file.

Creates a new files directory.

Copies the database.

Writes database connection info to a global config file.

It also does thorough validation on the input parameters (about 20 different validations for everything from checking that the destination directory is writable, to ensuring that the name of the new database is valid, to ensuring that the new domain can be resolved).
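As an illustration of what such checks look like (the destination path and database name below are invented for this example, not taken from the actual command), a couple of them might be sketched in shell as:

```shell
# Two of the kinds of pre-flight checks a clone command performs.
# The destination path and database name here are made-up examples.
dest="/tmp/clone-dest"
db_name="bar_foo"

mkdir -p "$dest"

# Check 1: the destination directory must exist and be writable.
[ -d "$dest" ] && [ -w "$dest" ] || { echo "destination not writable" >&2; exit 1; }

# Check 2: the database name may only contain letters, digits, _ and $,
# and must be at most 64 characters (MySQL's identifier limit).
case "$db_name" in
  ""|*[!A-Za-z0-9_\$]*) echo "invalid database name" >&2; exit 1 ;;
esac
[ "${#db_name}" -le 64 ] || { echo "database name too long" >&2; exit 1; }

echo "all checks passed"
```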

There are a few other things that we needed to put in place in order to get this working smoothly. We've set up DNS wildcards so that requests to a third-level subdomain end up where we want them to. We've configured Apache with a VirtualDocumentRoot so that requests to new subdomains get routed to the appropriate webroot. Finally we've also made some changes to our project management tool so that everyone knows which dev site to look at for each ticket.
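The Apache side of this can be sketched as follows (domain and path are taken from the example names used in this article, and the layout under /var/www is an assumption):

```apache
# Sketch: route any *.bar.advomatic.com subdomain to a matching webroot.
# %1 expands to the third-level subdomain (e.g. "foo").
<VirtualHost *:80>
  ServerAlias *.bar.advomatic.com
  VirtualDocumentRoot /var/www/dev/%1
  UseCanonicalName Off
</VirtualHost>
```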

Once you've got all the pieces of the puzzle you'll be able to have a workflow something like:

Stakeholder requests a new feature (let's call it foo) for their site (let's call it bar.com).

Developer clones an existing dev site (bar.advomatic.com) into a new dev site (foo.bar.advomatic.com) and creates a new branch (foo).

Developer implements the request.

Stakeholder reviews on the branch dev site (foo.bar.advomatic.com). Return to #3 if necessary.

Merge branch foo + deploy.

@todo decommission the branch site (foo.bar.advomatic.com).

Currently that last step has to be done manually. But we should create a corresponding script to clean up the codebase/files/database/branches.

Using a Reverse Proxy and/or a Content Delivery Network (CDN) has become common practice for Drupal and other Content Management Systems.

One inconvenient aspect of this is that your web server no longer sees the correct client IP address, and neither does your application; the IP address they see is that of the machine the reverse proxy runs on.

In Drupal, there is code in core that tries to work around this by looking up the IP address in the X-Forwarded-For HTTP header (HTTP_X_FORWARDED_FOR as PHP sees it), or a custom header that you can set.

For example, this would be in the settings.php of a server that runs Varnish on the same box.
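A minimal sketch of those settings.php lines for Drupal 7, assuming Varnish listens on the same box (adjust the trusted address list to your topology):

```php
<?php
// Tell Drupal it is behind a reverse proxy, and which proxy IPs to trust.
$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');
// Optional: the header carrying the client IP (X-Forwarded-For is the default).
$conf['reverse_proxy_header'] = 'HTTP_X_FORWARDED_FOR';
```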

That takes care of the application, but what about the web server?

But, even if you solve this at the application level (e.g. Drupal, or WordPress), there is still the issue that your web server is not logging the correct IP address. For example, you can't analyze the logs to know which countries your users are coming from, or identify DDoS attacks.

Apache RPAF module

What this Apache module does is extract the correct client IP address and use it in Apache's logs, as well as hand it over to PHP in the variable $_SERVER['REMOTE_ADDR'].

To install RPAF on Ubuntu 12.04 or later, use the command:

aptitude install libapache2-mod-rpaf

If you run the reverse proxy (e.g. Varnish) on same server as your web server and application, and do not use a CDN, then there is no need to do anything more.

However, if you run the reverse proxy on another server, then you need to change the RPAFproxy_ips line to include the IP addresses of these servers. For example, these would be the addresses of your Varnish servers that front Drupal, which in turn are fronted by the CDN.

You do this by editing the file /etc/apache2/mods-enabled/rpaf.conf.

For example:

RPAFproxy_ips 10.0.0.3 10.0.0.4 10.0.0.5

CDN Client IP Header

If you are using a CDN, then you need to find out what HTTP header the CDN uses to put the client IP address, and modify RPAF's configuration accordingly.

For example, for CloudFlare, the header is CF-Connecting-IP

So, you need to edit the above file, and add the following line:

RPAFheader CF-Connecting-IP

Drupal Reverse Proxy settings no longer needed

And finally, you don't need any of the above Reverse Proxy configuration in settings.php.

What

In the Drupal and broader web community, there is a lot of attention towards the performance of websites.

While "performance" is a very complex topic on its own, let us in this posting define it as the speed of the website and the process of optimizing that speed (or, more broadly, the user's experience of that speed).

Why

This attention towards speed is there for two good reasons. On one hand, the site is getting bigger and hence slower: the databases get bigger with more content, and the codebase of the website grows with new modules and features. On the other hand, more money is being made with websites for businesses, even if you are not selling goods or running ads.

Given that most sites run on the same hardware for years, this results in slower websites, leading to a lower pagerank, less traffic, fewer pages per visit, and lower conversion rates. And in the end, if you have a business case for your website, lower profits. Bottom line: if you make money online, you are losing some of it due to a slow website.

When it comes to speed, there are many parameters to take into account; it is not "just" the average page loading time. First of all, the average is a rather useless metric without taking the standard deviation into account. But apart from that, it comes down to what a "page" is.

A page can be just the HTML file (can be done in 50ms).

A page can be the complete webpage with all the elements (for many sites, around 10 seconds).

A page can be the complete webpage with all elements including third-party content. Hint: did you know that for displaying the Facebook Like button, more JavaScript is downloaded than the entire jQuery/backbone/bootstrap app of this website, non-cacheable!

And a page can be anything "above the fold".

And there are more interesting metrics than these, for example the time to first byte from a technological point of view. But not just the technical point of view: there is a website one visits every day that optimizes its renderable HTML to fit within 1500 bytes.

So, ranging from "first byte to glass" to "round trip time", there are many elements to be taken into account when one measures the speed of a website. And that is the main point: web performance is not just for the frontenders like many think, not just for the backenders like some of them hope, but for all the people who control elements in the chain involved in the speed. All the way down to the networking guys (m/f) in the basement (hint, sysadmins: INITCWND has a huge performance impact!). Speed should be in the core of your team, not just in those who enable gzip compression, aggregate the JavaScript, or make the sprites.

Steve Souders (the web performance guru) once stated in his golden rule that 80-90% of the end-user response time is spent on the frontend.

Speedy to the rescue?

This 80% might be a matter of debate in the case of a logged-in user in a CMS. But even if it is true, this 80% can be reduced by 80% with SPDY.

SPDY is an open protocol introduced by Google to overcome the problems with HTTP (up to 1.1, including pipelining, defined in 1999!) and the absence of HTTP/2.0. It speeds up HTTP by using one connection between the client and the server for all the elements in the page served by the server. Originally only built into Chrome, many browsers now support this protocol, which will be the base of HTTP/2.0. Think about it and read about it: a complete webpage with all the elements, regardless of minifying and sprites, served in one stream, with only one TCP handshake and one DNS request. Most of the rules of traditional web performance optimization (CSS aggregation, preloading, prefetching, offloading elements to a different host, cookie-free domains), all this wisdom is gone, even false, with one simple install. 80% of the 80% gone with SPDY; now one can focus on the hard part: the database, the codebase. :-)

The downside of SPDY, however, is that it is hard to troubleshoot and not yet available in all browsers. It is hard to troubleshoot since most implementations use SSL, and the protocol is multiplexed and zipped by default, not made to be read by humans unlike HTTP/1.0. There are some tools that make it possible to test SPDY, but most if not all of the tools you use every day, like ab, curl, and wget, will fail to use SPDY and fall back, as defined in the protocol, to HTTP/1.0.

So more users, less errors under load and a lower page load time. What is there not to like about SPDY?

Drupal

That is why I would love Drupal.org to run with SPDY; see this issue on d.o/2046731. I really do hope that the infra team will find some time to test this and, once accepted, install it on the production server.

Performance as a Service

One of the projects I have been active in lately is ProjectPAAS (bonus points if you find the Easter egg on the site :-) ). ProjectPAAS is a startup that will test a Drupal site, measure it on 100+ metrics, analyse the data, and give the developer an opinionated report on what to change to get better performance. If you like the images around the retro-future theme, be sure to check out the Flickr page, like us on Facebook, follow us on Twitter, but most of all, see the moodboard on Pinterest.

Pinterest itself is doing some good work when it comes to performance as well. Not just speed but also the perception of speed.

Pinterest does lazyload images, but it also displays the prominent color as the background of a cell before the image is loaded, giving the user a sense of what is to come. For background on this, see webdistortion.

If you are lazyloading images to give your users faster results, be sure to check out this module we made: lazypaas, currently a sandbox project awaiting approval. It extracts the dominant (most used) color of an image and fills the box where the image will be placed with this color. And if you use it and did a code review, be sure to help it become a full Drupal module.

From 80% to 100%

Lazyloading like this leads to a better user experience. Because even when 80% of the end-user response time is spent on the frontend, 100% of the time is spent in the client, most often the browser. That is the only place where performance should be measured and the only place where performance matters. Hence, all elements that deliver this speed should be optimized, including the webserver and the browser.

; All relative paths in this configuration file are relative to PHP's install
; prefix (/usr). This prefix can be dynamicaly changed by using the
; '-p' argument from the command line.

; Include one or more files. If glob(3) exists, it is used to include a bunch of
; files from a glob(3) pattern. This directive can be used everywhere in the
; file.
; Relative path can also be used. They will be prefixed by:
;  - the global prefix if it's been set (-p arguement)
;  - /usr otherwise
;include=etc/fpm.d/*.conf

; If this number of child processes exit with SIGSEGV or SIGBUS within the time
; interval set by emergency_restart_interval then FPM will restart. A value
; of '0' means 'Off'.
; Default Value: 0
;emergency_restart_threshold = 0

; Interval of time used by emergency_restart_interval to determine when
; a graceful restart will be initiated. This can be useful to work around
; accidental corruptions in an accelerator's shared memory.
; Available Units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;emergency_restart_interval = 0

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;process_control_timeout = 0

; Send FPM to background. Set to 'no' to keep FPM in foreground for debugging.
; Default Value: yes
;daemonize = yes

; Multiple pools of child processes may be started with different listening
; ports and different management options. The name of the pool will be
; used in logs and stats. There is no limitation on the number of pools which
; FPM can handle. Your system will tell you anyway :)

; Start a new pool named 'www'.
; the variable $pool can we used in any directive and will be replaced by the
; pool name ('www' here)
[www]

; Per pool prefix
; It only applies on the following directives:
; - 'slowlog'
; - 'listen' (unixsocket)
; - 'chroot'
; - 'chdir'
; - 'php_values'
; - 'php_admin_values'
; When not set, the global prefix (or /usr) applies instead.
; Note: This directive can also be relative to the global prefix.
; Default Value: none
;prefix = /path/to/pools/$pool

; The address on which to accept FastCGI requests.
; Valid syntaxes are:
;   'ip.add.re.ss:port'    - to listen on a TCP socket to a specific address on
;                            a specific port;
;   'port'                 - to listen on a TCP socket to all addresses on a
;                            specific port;
;   '/path/to/unix/socket' - to listen on a unix socket.
; Note: This value is mandatory.
listen = /tmp/phpfpm.sock

; List of ipv4 addresses of FastCGI clients which are allowed to connect.
; Equivalent to the FCGI_WEB_SERVER_ADDRS environment variable in the original
; PHP FCGI (5.2.2+). Makes sense only with a tcp listening socket. Each address
; must be separated by a comma. If this value is left blank, connections will be
; accepted from any ip address.
; Default Value: any
;listen.allowed_clients = 127.0.0.1

; Set permissions for unix socket, if one is used. In Linux, read/write
; permissions must be set in order to allow connections from a web server. Many
; BSD-derived systems allow connections regardless of permissions.
; Default Values: user and group are set as the running user
;                 mode is set to 0666
;listen.owner = nobody
;listen.group = nobody
;listen.mode = 0666

; Unix user/group of processes
; Note: The user is mandatory. If the group is not set, the default user's group
;       will be used.
user = nobody
group = nobody

; Choose how the process manager will control the number of child processes.
; Possible Values:
;   static  - a fixed number (pm.max_children) of child processes;
;   dynamic - the number of child processes are set dynamically based on the
;             following directives:
;             pm.max_children      - the maximum number of children that can
;                                    be alive at the same time.
;             pm.start_servers     - the number of children created on startup.
;             pm.min_spare_servers - the minimum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is less than this
;                                    number then some children will be created.
;             pm.max_spare_servers - the maximum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is greater than this
;                                    number then some children will be killed.
; Note: This value is mandatory.
pm = dynamic

; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes to be created when pm is set to 'dynamic'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the Apache MaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI.
; Note: Used when pm is set to either 'static' or 'dynamic'
; Note: This value is mandatory.
pm.max_children = 50

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 20

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 5

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 35

; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
pm.max_requests = 500

; The URI to view the FPM status page. If this value is not set, no URI will be
; recognized as a status page. By default, the status page shows the following
; information:
;   accepted conn        - the number of request accepted by the pool;
;   pool                 - the name of the pool;
;   process manager      - static or dynamic;
;   idle processes       - the number of idle processes;
;   active processes     - the number of active processes;
;   total processes      - the number of idle + active processes.
;   max children reached - number of times, the process limit has been reached,
;                          when pm tries to start more children (works only for
;                          pm 'dynamic')
; The values of 'idle processes', 'active processes' and 'total processes' are
; updated each second. The value of 'accepted conn' is updated in real time.
; Example output:
;   accepted conn:        12073
;   pool:                 www
;   process manager:      static
;   idle processes:       35
;   active processes:     65
;   total processes:      100
;   max children reached: 1
; By default the status page output is formatted as text/plain. Passing either
; 'html', 'xml' or 'json' as a query string will return the corresponding output
; syntax. Example:
;   http://www.foo.bar/status
;   http://www.foo.bar/status?json
;   http://www.foo.bar/status?html
;   http://www.foo.bar/status?xml
; Note: The value must start with a leading slash (/). The value can be
;       anything, but it may not be a good idea to use the .php extension or it
;       may conflict with a real PHP file.
; Default Value: not set
;pm.status_path = /status

; The ping URI to call the monitoring page of FPM. If this value is not set, no
; URI will be recognized as a ping page. This could be used to test from outside
; that FPM is alive and responding, or to
; - create a graph of FPM availability (rrd or such);
; - remove a server from a group if it is not responding (load balancing);
; - trigger alerts for the operating team (24/7).
; Note: The value must start with a leading slash (/). The value can be
;       anything, but it may not be a good idea to use the .php extension or it
;       may conflict with a real PHP file.
; Default Value: not set
;ping.path = /ping

; This directive may be used to customize the response of a ping request. The
; response is formatted as text/plain with a 200 response code.
; Default Value: pong
;ping.response = pong

; The access log format.
; The following syntax is allowed
;  %%: the '%' character
;  %C: %CPU used by the request
;      it can accept the following format:
;      - %{user}C for user CPU only
;      - %{system}C for system CPU only
;      - %{total}C  for user + system CPU (default)
;  %d: time taken to serve the request
;      it can accept the following format:
;      - %{seconds}d (default)
;      - %{miliseconds}d
;      - %{mili}d
;      - %{microseconds}d
;      - %{micro}d
;  %e: an environment variable (same as $_ENV or $_SERVER)
;      it must be associated with embraces to specify the name of the env
;      variable. Some exemples:
;      - server specifics like: %{REQUEST_METHOD}e or %{SERVER_PROTOCOL}e
;      - HTTP headers like: %{HTTP_HOST}e or %{HTTP_USER_AGENT}e
;  %f: script filename
;  %l: content-length of the request (for POST request only)
;  %m: request method
;  %M: peak of memory allocated by PHP
;      it can accept the following format:
;      - %{bytes}M (default)
;      - %{kilobytes}M
;      - %{kilo}M
;      - %{megabytes}M
;      - %{mega}M
;  %n: pool name
;  %o: ouput header
;      it must be associated with embraces to specify the name of the header:
;      - %{Content-Type}o
;      - %{X-Powered-By}o
;      - %{Transfert-Encoding}o
;      - ....
;  %p: PID of the child that serviced the request
;  %P: PID of the parent of the child that serviced the request
;  %q: the query string
;  %Q: the '?' character if query string exists
;  %r: the request URI (without the query string, see %q and %Q)
;  %R: remote IP address
;  %s: status (response code)
;  %t: server time the request was received
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %T: time the log has been written (the request has finished)
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %u: remote user
;
; Default: "%R - %u %t \"%m %r\" %s"
;access.format = %R - %u %t "%m %r%Q%q" %s %f %{mili}d %{kilo}M %C%%

; The timeout for serving a single request after which the worker process will
; be killed. This option should be used when the 'max_execution_time' ini option
; does not stop script execution for some reason. A value of '0' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_terminate_timeout = 0

; The timeout for serving a single request after which a PHP backtrace will be
; dumped to the 'slowlog' file. A value of '0s' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_slowlog_timeout = 0

; Chroot to this directory at the start. This value must be defined as an
; absolute path. When this value is not set, chroot is not used.
; Note: you can prefix with '$prefix' to chroot to the pool prefix or one
; of its subdirectories. If the pool prefix is not set, the global prefix
; will be used instead.
; Note: chrooting is a great security feature and should be used whenever
;       possible. However, all PHP paths will be relative to the chroot
;       (error_log, sessions.save_path, ...).
; Default Value: not set
;chroot =

; Chdir to this directory at the start.
; Note: relative path can be used.
; Default Value: current directory or / when chroot
;chdir = /var/www

; Redirect worker stdout and stderr into main error log. If not set, stdout and
; stderr will be redirected to /dev/null according to FastCGI specs.
; Note: on highloaded environement, this can cause some delay in the page
; process time (several ms).
; Default Value: no
catch_workers_output = yes

; Additional php.ini defines, specific to this pool of workers. These settings
; overwrite the values previously defined in the php.ini. The directives are the
; same as the PHP SAPI:
;   php_value/php_flag             - you can set classic ini defines which can
;                                    be overwritten from PHP call 'ini_set'.
;   php_admin_value/php_admin_flag - these directives won't be overwritten by
;                                    PHP call 'ini_set'
; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no.

While inspecting a site that had several performance problems for a client, we noticed that memory usage was very high. In the "top" command, the RES (resident set) field showed 159 MB, far more than it should be.

We narrowed down the problem to a view that is in a block that is visible on most pages of the site.

But the puzzling part was that the view was configured to return only 5 rows. It did not make sense for it to use that much memory.

However, when we traced the query, it was like so:

SELECT node.nid, ....
FROM node
INNER JOIN ...
ORDER BY ... DESC

No LIMIT clause was in the query!

When executing the query manually, we found that it returned 35,254 rows, with 5 columns each!

Using the script at the end of this article, we were able to measure memory usage at different steps. We inserted a views embed in the script and measured memory usage before and after.
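The gist of such a measurement, run inside a Drupal 7 bootstrap (e.g. via drush scr), looks roughly like this; the view name here is hypothetical, not the client's actual view:

```php
<?php
// Sketch: measure memory used by embedding a view.
// Run inside a Drupal bootstrap; 'front_page_block' is a made-up view name.
$before = memory_get_usage();
print views_embed_view('front_page_block', 'block');
$after = memory_get_usage();
printf("Used: %.2f MB, peak: %.2f MB\n",
  ($after - $before) / 1048576,
  memory_get_peak_usage() / 1048576);
```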

We did not dig further, but it could be that because a field of type "Global: PHP" was used, views wanted to return the entire data set and then apply the PHP to it, rather than add a LIMIT to the query before executing it.

So, watch out for those blocks that are shown on many web pages.

Baseline Memory Usage

As a general comparative reference, here are some baseline figures. These are worst-case scenarios, and assume APC is off, or that the measurement is running from the command line, where APC is disabled or non-persistent. The figures would be lower under Apache with APC enabled.

These figures will vary from site to site, and depend on many factors, for example, which modules are enabled in Apache and in PHP, etc.

Drupal 6 with 73 modules

Before boot: 0.63 MB
After boot: 22.52 MB
Peak memory: 22.52 MB

Drupal 7 site, with 105 modules

Before boot: 0.63 MB
After boot: 57.03 MB
Peak memory: 58.39 MB

Drupal 7 site, with 134 modules

Before boot: 0.63 MB
After boot: 58.79 MB
Peak memory: 60.28 MB

Drupal 6 site, with 381 modules

Before boot: 0.63 MB
After boot: 66.02 MB

Drupal 7 site, pristine default install, 29 modules

Now compare all the above to a pristine Drupal 7 install, which has 29 core modules installed.

PHPMyAdmin version included is very outdated and missing out on some nice features

MySQL version is stuck at 5.1

PHP requires a workaround to work in the terminal

Built-in PEAR doesn't play nice

So, after reading about VirtualDocumentRoot in this article I decided to take the plunge and try replacing my MAMP setup with the built-in Apache and PHP and using Homebrew to complement whatever was missing. Since I hardly use Drupal 6 anymore, I figured I can live without PHP 5.2.

What started out as a "I wonder if I can get this to work" 10-minute hobby thing turned into a full-day task, and after finally getting everything to work I'm really satisfied with the results. Since I ran into a bunch of pitfalls along the way, I thought I'd document my findings for future reference and also to share with others thinking about doing the same thing.

Despite being fully aware that we can actually and effortlessly build a full LAMP stack using Homebrew, I figured since OSX ships with Apache and PHP let's use them!

Step 1 - Backup

This one is a no-brainer. Back up all your stuff, especially MySQL databases, as you will have to import them manually later on.

Step 2 - Enable built-in Apache

Open MAMP and stop all running servers but don't uninstall MAMP just yet! Ok, now go to the "Tools" menu and click on "Enable built-in Apache server". Next, go to "System Preferences -> Sharing" and enable the "Web Sharing" checkbox.

Update for Mountain Lion: The "Web Sharing" option doesn't exist anymore! In order to get things working check out the excellent instructions in this article.

If you're having issues with the checkbox not staying on, don't worry, it's a simple fix: replace your /etc/apache2/httpd.conf with the default version:

Step 3 - Enable built-in PHP

This one is easy: edit your new httpd.conf file and uncomment the following line:

#LoadModule php5_module libexec/apache2/libphp5.so

Step 4 - Install MySQL

Just download MySQL and install it! You can then go to "System Preferences -> MySQL" to start the server and set it up to start up automatically.

Now, all we need to do is add the mysql directory to our PATH and we're good. You can set this up either in your ~/.bashrc or ~/.profile. (Note that if you have a .bashrc then your .profile won't get used, so pick one or the other.)

export PATH="/usr/local/mysql/bin:$PATH"

Step 5 - Install PHPMyAdmin

This is an optional but highly-recommended step. I'm not going to bother rewriting this as there's a great article here that gives precise and up-to-date instructions. After following the instructions you should be able to access PHPMyAdmin at http://localhost/~username/phpmyadmin.

In order to persist your settings in PHPMyAdmin, you will need to do the following:

Create a "pma" user and give it a password "pmapass" (or whatever)

Import the "create_tables.sql" file found in your PHPMyAdmin's example folder

Grant full access to the "pma" user for the "phpmyadmin" database
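The three bullet points above can be staged as a small SQL file and fed to MySQL as root. The statements below are a sketch using MySQL 5.x-era syntax, with the example "pmapass" password from above:

```shell
# Once written, feed this to MySQL as root:  mysql -u root -p < /tmp/pma-setup.sql
cat > /tmp/pma-setup.sql <<'SQL'
CREATE USER 'pma'@'localhost' IDENTIFIED BY 'pmapass';
GRANT ALL PRIVILEGES ON `phpmyadmin`.* TO 'pma'@'localhost';
FLUSH PRIVILEGES;
SQL
cat /tmp/pma-setup.sql
```

The create_tables.sql import from the examples folder works the same way: mysql -u root -p < create_tables.sql.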

Now edit your config.inc.php and add the following:

// Setup user and pass.
$cfg['Servers'][$i]['controluser'] = 'pma';
$cfg['Servers'][$i]['controlpass'] = 'pmapass'; // Replace this with your password.
// Setup database.
$cfg['Servers'][$i]['pmadb'] = 'phpmyadmin';
// Setup tables.
$cfg['Servers'][$i]['bookmarktable'] = 'pma_bookmark';
$cfg['Servers'][$i]['relation'] = 'pma_relation';
$cfg['Servers'][$i]['table_info'] = 'pma_table_info';
$cfg['Servers'][$i]['table_coords'] = 'pma_table_coords';
$cfg['Servers'][$i]['pdf_pages'] = 'pma_pdf_pages';
$cfg['Servers'][$i]['column_info'] = 'pma_column_info';
$cfg['Servers'][$i]['history'] = 'pma_history';
$cfg['Servers'][$i]['designer_coords'] = 'pma_designer_coords';
$cfg['Servers'][$i]['tracking'] = 'pma_tracking';
$cfg['Servers'][$i]['userconfig'] = 'pma_userconfig';
$cfg['Servers'][$i]['hide_db'] = 'information_schema';
$cfg['Servers'][$i]['recent'] = 'pma_recent';
$cfg['Servers'][$i]['table_uiprefs'] = 'pma_table_uiprefs';
// For good measure, make it shut up about Mcrypt.
$cfg['McryptDisableWarning'] = true;

At this point everything should be ok, but I couldn't get it to work. I found that logging into PHPMyAdmin as "pma" and then logging out made everything work. I won't bother to try and understand this, just happy it worked.

Step 6 - Install Homebrew

Ok, now things start to get fun! All you need to do to install Homebrew can be found in the very nice installation instructions.

Once Homebrew is installed, run brew doctor from your terminal and fix any errors that appear; it's pretty straightforward. If you are getting a weird error about "Cowardly refusing to continue at this point /", don't worry. I had to update my XCode version and install the Command Line Tools from within the XCode preferences to make brew work without complaining.

Step 7 - Configure VirtualDocumentRoot

Once again, there's a great article on this already so I won't bother rewriting this step.

Are you still with me? Don't forget to add Google DNS or OpenDNS or something else in your network preferences after 127.0.0.1, or you won't be able to access anything outside your local environment.

At this point you should probably test out accessing http://localhost/~username in your browser and make sure everything works. Also create a "test.dev" folder in your Sites directory and make sure you can access it via http://test.dev.

If, like myself, you received some access denied errors, you might have to adjust the Directory settings added in your httpd.conf.

If you skipped Step 8 and don't have PEAR installed, don't worry, there's plenty of information in the README to get Drush up and running.

Step 10 - Install XDebug

Ok, we are almost there! The last thing we really need to get a fully-functional development environment is XDebug. Unfortunately, despite many articles mentioning installation via Homebrew, I couldn't find the formula, so let's compile and install it ourselves.

Updated: There's an XDebug brew formula in @josegonzalez's homebrew-php. Just follow the steps in Step 11 but replace "xhprof" with "xdebug".

Once again, there's a great article for this. If you get an "autoconf" error you might need to install it using Homebrew:

brew install autoconf

Now phpize should work. Don't worry if it gives you a warning, just follow the steps in the article and it'll work anyway!

Now we should make a couple tweaks to our php.ini to take advantage of XDebug:

# General PHP error settings.
error_reporting = E_ALL | E_STRICT # This is nice for development.
display_errors = On # Make sure we can see errors.
log_errors = On # Enable the error log.
html_errors = On # This is crucial for XDebug.

# XDebug specific settings.
[xdebug]
zend_extension="/usr/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so" # Enable the XDebug extension.
# The following are sensible defaults, alter as needed.
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000

Afterwards just restart Apache and run php -v or open a phpinfo() script and you will see that XDebug appears after the "Zend Engine" version.

Step 11 - Install XHProf

This step is optional and I had forgotten about it until Nick reminded me, so props to him! :)

XHProf is a PHP profiler which integrates nicely with the Devel Drupal module and here's how to install it using Homebrew:

[xhprof]
extension="/usr/local/Cellar/php53-xhprof/0.9.2/xhprof.so" # Make sure this is the path that Homebrew prints when it's done installing XHProf.
; This is the directory that XHProf stores its profile runs in.
xhprof.output_dir=/tmp

Now restart Apache and look for "xhprof" in your phpinfo() output to confirm all went well.

Bonus Steps - APC, Intl and UploadProgress

If you added josegonzalez's excellent homebrew-php (step 11), you can easily install some extra useful extensions:

Step 12 - Cleanup

Ok, we're done! Now go to PHPMyAdmin and re-create your databases and users and also import the backups you made from MAMP's version of PHPMyAdmin. Assuming you followed all the steps successfully, it is now safe to completely uninstall MAMP.

Good for you! :) Now you can go look over the shoulder of a coworker that uses MAMP, smirk and shake your head in disapproval.

In this video we set up a LAMP stack web server for local development on an Ubuntu desktop (version 11.10). We will walk through installing and using tasksel to get the web server installed, then we'll add phpmyadmin to the mix. Once we have the server up and running, we'll look at where the web root is and how to add websites. Finally we wrap up with a quick look at how to start, stop, and restart our server when needed.

This video shows you how to download and do the initial setup of MAMP, which is Apache, MySQL and PHP for Macintosh. It shows some basic configuration tweaks to change the port from 8888 to the default of 80 so that you can just visit the localhost in the browser and get your Drupal installation to appear. It also provides a general orientation to MAMP, and some other initial configuration setting changes.

This video walks through the process of downloading and doing the initial configuration of WampServer, which is Apache, MySQL and PHP for Windows. It shows how to turn it on and off, as well as how to get files to show up from the localhost server.

Drupal can power any site from the lowliest blog to the highest-traffic corporate dot-com. Come learn about the high end of the spectrum with this comparison of techniques for scaling your site to hundreds of thousands or millions of page views an hour. This Do it with Drupal session with Nate Haug will cover software that you need to make Drupal run at its best, as well as software that acts as a front-end cache (a.k.a. reverse-proxy cache) that you can put in front of your site to offload the majority of the processing work. This talk will cover the following software and architectural concepts:

Configuring Apache and PHP

MySQL Configuration (with Master/Slave setups)

Using Memcache to reduce database load and speed up the site

Using Varnish to serve up anonymous content lightning fast

Hardware overview for high-availability setups

Considering nginx (instead of Apache) for high amounts of authenticated traffic

Recently I posted an Apache configuration file that allows you to detect mobile devices and pass a query parameter to the back-end that informs your web application what type of device is accessing your website. This works without the need for device detection in your back-end (e.g. Drupal).

From time to time I get questions about caching strategies and mobile devices. Especially when you want to create different responses in the back end based on the device AND you have strong caching strategies, things get very tricky.

An example scenario in Drupal (but can relate to any CMS) is where you boost performance by storing the generated pages as static files (see Boost).

These solutions rely heavily on Apache: Apache delivers the static files if they exist; otherwise, it passes the request to the CMS.

In order to provide a way to serve different responses to mobile devices, I created the following Apache script:

# Detect if a "device" query parameter is set in the request.
# If so, the value for that attribute forces the requesting device type.
# If the parameter is available, set a cookie to remember for which device pages have to be generated in subsequent requests.
# Assume three categories:
# - mobile-high (high end mobile devices)
# - mobile-low (low end mobile devices)
# - desktop (desktop browser)
RewriteCond %{QUERY_STRING} ^.*device=(mobile-high|mobile-low|desktop).*$
RewriteRule .* - [CO=device:%1:.yourdomain.tld:1000:/,E=device:%1,S=2]
# now skip 2 RewriteRules till after the user agent detection
# If there is no "device" attribute, search for a cookie.
# If the cookie is available add ?device=<device-type> to the request
RewriteCond %{HTTP_COOKIE} ^.*device=(.*)$
RewriteRule (.*) $1?device=%1 [E=device:%1,S=2,QSA]
#now skip till after the user agent detection
# If no cookie or device attribute are present, check the user agent by simple user agent string matching (android and iphone only here)
RewriteCond %{HTTP_USER_AGENT} ^.*(iphone|android).*$ [NC]
RewriteRule (.*) $1?device=mobile-high [E=device:mobile-high,QSA,S=1]
# Detect the user agent for lower end phones (nokia, older blackberries, ...)
RewriteCond %{HTTP_USER_AGENT} ^.*(nokia|BlackBerry).*$ [NC]
RewriteRule (.*) $1?device=mobile-low [E=device:mobile-low,QSA]

This script makes sure that every request has an extra "device" query parameter that defines the requesting device (desktop/mobile-high/mobile-low). This information can be used by the back end or any caching mechanisms!
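As a sanity check of the precedence the rules implement (explicit ?device= query parameter first, then the "device" cookie, then user-agent matching), here is a small shell sketch of the same classification logic. classify_device is a hypothetical helper for testing the logic, not part of the Apache configuration:

```shell
# Precedence: 1. ?device= query parameter, 2. "device" cookie, 3. user-agent match.
classify_device() {
  query="$1"; cookie="$2"; ua="$3"
  case "$query" in
    *device=mobile-high*) echo mobile-high; return ;;
    *device=mobile-low*)  echo mobile-low;  return ;;
    *device=desktop*)     echo desktop;     return ;;
  esac
  case "$cookie" in
    *device=*) echo "${cookie##*device=}"; return ;;
  esac
  case "$ua" in
    *[Ii][Pp]hone*|*[Aa]ndroid*)   echo mobile-high ;;
    *[Nn]okia*|*[Bb]lack[Bb]erry*) echo mobile-low ;;
    *)                             echo desktop ;;
  esac
}
classify_device '' '' 'Mozilla/5.0 (iPhone; like Mac OS X)'   # prints mobile-high
```

A request with no parameter, no cookie, and an unrecognized user agent falls through to "desktop", mirroring the rewrite rules above.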

I also added a %{ENV:device} variable so that more logic can be placed within the Apache script based on the device.

Applying to the Boost module

The above snippet works together with Boost by replacing %{server_name} by %{ENV:device}/%{server_name} in the Boost apache configuration.

Recently I started to notice a very high number of LRU Nuked Objects on my websites, which was essentially wiping the entire Varnish cache. I run Varnish with a 4GB file cache, and site content is mostly served by an external "Poor Man CDN". So, in theory, my site content should be nowhere near 4GB; however, Varnish was running out of memory and "nuking" cached objects.

Sizing your cache

Watch the n_lru_nuked counter with varnishstat or some other tool. If you have a lot of LRU activity then your cache is evicting objects due to space constraints and you should consider increasing the size of the cache.

Either I can increase the Varnish Cache size or be a little smarter in handling the resource.

I am using Drupal's Compress Page feature along with the Aggregate JS and CSS features. However, Drupal will only compress the HTML output, which can result in quite large aggregated CSS and JS files. If Varnish is caching the uncompressed output, this will result in considerably more memory usage.

A much better and more effective solution is to let Apache's mod_deflate handle the compression instead of Drupal. Drupal compression is often the preferred choice because Drupal compresses and caches once, compared to Apache compressing the files on every request. However, if you are using Varnish, which handles the caching, Apache only has to do the hard work once. The added advantage is that you get granular control over which files are compressed.
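A minimal sketch of what that mod_deflate setup might look like; the directive values here are illustrative assumptions, not from the original article:

```apache
# Assumes mod_deflate is already loaded. Compress text responses,
# and skip formats that are already compressed.
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/javascript
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|zip|gz)$ no-gzip
</IfModule>
```

Varnish will then cache the gzipped responses, so the compression cost is paid only on a cache miss.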

Apple OS X comes with Apache and PHP built-in, but they need some tweaking to work. It also does not come with MySQL. Because of this, many developers have chosen to use MacPorts, Homebrew, or MAMP to install new binaries for Apache, PHP, and MySQL. However, doing this means your system would have multiple copies of Apache and PHP on your machine, and could create conflicts depending on how your built-in tools are configured. This tutorial will show you how to get the built-in versions of Apache and PHP running with an easy to install version of MySQL.

Additionally, it will show how to set up and install Drush, and phpMyAdmin. Setting up a new website is as easy as editing a single text file and adding a line to your /etc/hosts file. All of our changes will avoid editing default configuration files, either those provided by Apple in OS X or from downloaded packages, so that future updates will not break our customizations.

Apache

Apple uses what appears to be the FreeBSD "version" of Apache 2.2, and includes various sample files. One of the sample files is a virtual hosts file that we will copy to our ~/Sites folder for easy editing. Launch the terminal and get started:

$ cp /etc/apache2/extra/httpd-vhosts.conf ~/Sites

In order to prevent future major OS updates from breaking our configuration, we'll be editing as few existing files as possible. Luckily, by default, OS X's httpd.conf file (the main configuration file for Apache) includes all *.conf files in another folder, /etc/apache2/other:

We then create a symbolic link in /etc/apache2/other to our newly-copied httpd-vhosts.conf from our ~/Sites folder:

$ sudo ln -s ~/Sites/httpd-vhosts.conf /etc/apache2/other

The ~/Sites/httpd-vhosts.conf is now where we will add all new virtual hosts. We will place our site root folders in ~/Sites and define them in this file. But first, we need to make some changes (or, download this edited httpd-vhosts.conf file and change fname in /Users/fname to the appropriate path for your system). Begin by adding this text to the top of the file, which will enable the ~/Sites folder and subfolders to be accessed by Apache (again, replace /Users/fname with your appropriate path for home directory):

# Ensure all of the VirtualHost directories are listed below
<Directory "/Users/fname/Sites">
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

Add these lines anywhere in the file to ensure that http://localhost still serves the default folder:

# For localhost in the OS X default location
<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /Library/WebServer/Documents
</VirtualHost>

We will cover setting up a Virtual Host later, so let's reload our Apache settings to activate our new configuration:

$ sudo apachectl graceful

Or, go to the Sharing Preference Pane and uncheck/[re]check the Web Sharing item to stop and start Apache.

PHP

As of OS X 10.6.6, OS X ships with PHP 5.3.3, but it is not enabled in Apache by default. By looking at /etc/apache2/httpd.conf, you can see that the line to load the PHP module (LoadModule php5_module libexec/apache2/libphp5.so) is commented out.

Since we are trying to avoid editing default configuration files where possible, we can add this information, along with the configuration to allow *.php files to run, without the comment at the beginning of the line, in the same folder that we put the symlink for httpd-vhosts.conf:
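The extra file could look something like this; the file name php.conf and the exact directives are assumptions sketched from the description above:

```apache
# Hypothetical /etc/apache2/other/php.conf
LoadModule php5_module libexec/apache2/libphp5.so
<IfModule php5_module>
    AddType application/x-httpd-php .php
    DirectoryIndex index.html index.php
</IfModule>
```

Because httpd.conf already includes every *.conf file in /etc/apache2/other, dropping this file there enables PHP without touching Apple's defaults.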

PHP on OS X does not come with a configuration file at php.ini and PHP runs on its defaults. There is, however, a sample file ready to be copied and edited. The lines below will copy the sample file to /etc/php.ini (optionally, download the edited file and insert at /etc/php.ini) and add some developer-friendly changes to your configuration.

OS X's PHP does ship with pear and pecl, but they are not in the $PATH, nor are there binaries to add or symlink to your path. The first two commands will perform a channel-update for pear and again for pecl, the second two will add an alias for each for your current session, and the last two will add the aliases permanently to your local environment (you may get PHP Notices that appear to be errors but they are simply notices. You can change error_reporting in php.ini to silence the Notices):

We do a lot of Drupal development, and a commonly-added PHP PECL module for Drupal is uploadprogress. With the new alias we just set, adding it is easy. Build the PECL package, add it to php.ini, and reload the PHP configuration into Apache.
Note: PECL modules require a compiler such as gcc to build. Install Xcode to build PECL packages such as uploadprogress.

MySQL is the only component needed to run a "LAMP" website like Drupal or WordPress that does not ship with OS X. MySQL/Sun/Oracle provides pre-built MySQL binaries for OS X that don't conflict with other system packages, and even provides a nice System Preferences pane for starting/stopping the process and a checkbox to start MySQL at boot.

Go to the download site for OS X pre-built binaries at: http://dev.mysql.com/downloads/mysql/index.html#macosx-dmg and choose the most appropriate type for your system. For systems running Snow Leopard, choose Mac OS X ver. 10.6 (x86, 64-bit), DMG Archive (on the following screen, you can scroll down for the download links to avoid having to create a user and password).

Open the downloaded disk image and begin by installing the mysql-5.1.51-osx10.6-x86_64.pkg file (or named similarly to the version number you downloaded):

Once the Installer script is completed, install the Preference Pane by double-clicking on MySQL.prefPane. If you are not sure where to install the preference pane, choose "Install for this user only":


The preference pane is now installed and you can check the box to have MySQL start on boot or start and stop it on demand with this button. To access this screen again, load System Preferences. Click "Start MySQL Server" now:

The MySQL installer installed its files to /usr/local/mysql, and the binaries are in /usr/local/mysql/bin which is not in the $PATH of any user by default. Rather than edit the $PATH shell variable, we add symbolic links in /usr/local/bin (a location already in $PATH) to a few MySQL binaries. Add more or omit as desired. This is an optional step, but will make tasks like using Drush or performing a mysqldump in the command line much easier:
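The symlink step described above might look like the following. The real commands need sudo and are shown as comments; the live lines demonstrate the same idea against scratch directories so they can be run safely:

```shell
# On the real system:
#   sudo ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
#   sudo ln -s /usr/local/mysql/bin/mysqldump /usr/local/bin/mysqldump
#   sudo ln -s /usr/local/mysql/bin/mysql.server /usr/local/bin/mysql.server
# Demonstration against scratch directories:
mkdir -p /tmp/demo-mysql-bin /tmp/demo-local-bin
touch /tmp/demo-mysql-bin/mysql /tmp/demo-mysql-bin/mysqldump
for tool in mysql mysqldump; do
  ln -sf "/tmp/demo-mysql-bin/$tool" "/tmp/demo-local-bin/$tool"
done
ls -l /tmp/demo-local-bin
```

Symlinking only the binaries you actually use keeps /usr/local/bin tidy and avoids editing anyone's $PATH.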

Next we'll run mysql_secure_installation to set the root user's password and other setup-related tasks. This script ships with MySQL and only needs to be run once, and it should be run with sudo. As you run it, you can accept all other defaults after you set the root user's password:

$ sudo /usr/local/mysql/bin/mysql_secure_installation

If you create a simple PHP file containing "phpinfo();" to view the PHP information, you will see that the Apple-built PHP looks for the MySQL socket at /var/mysql/mysql.sock.

The problem is that the MySQL binary provided by Oracle puts the socket file in /tmp, which is the standard location on most systems. Other tutorials have recommended rebuilding PHP from source, or changing where PHP looks for the socket file via /etc/php.ini, or changing where MySQL places it when starting via /etc/my.cnf, but it is far easier and more fool-proof to create a symbolic link for /tmp/mysql.sock in the location OS X's PHP is looking for it. Since this keeps us from changing defaults, it ensures that a PHP update (via Apple) or a MySQL update (via Oracle) will not have an impact on functionality:
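The symlink fix described above might look like the following; the real commands need sudo and are shown as comments, while the live lines demonstrate the shape of the link against scratch paths:

```shell
# On the real system:
#   sudo mkdir -p /var/mysql
#   sudo ln -s /tmp/mysql.sock /var/mysql/mysql.sock
# Demonstration against scratch paths:
demo=/tmp/demo-sock
mkdir -p "$demo/var/mysql"
: > "$demo/mysql.sock"                      # stands in for /tmp/mysql.sock
ln -sf "$demo/mysql.sock" "$demo/var/mysql/mysql.sock"
readlink "$demo/var/mysql/mysql.sock"
```

Because both the Oracle default (/tmp/mysql.sock) and the Apple default (/var/mysql/mysql.sock) are left untouched, updates from either vendor won't break the link.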

After installing MySQL, several sample my.cnf files are created but none is placed in /etc/my.cnf, meaning that MySQL will always load in the default configuration. Since we will be using MySQL for local development, we will start with a "small" configuration file and make a few changes to increase the max_allowed_packet variable under [mysqld] and [mysqldump] (optionally, use the contents of this sample file in /etc/my.cnf). Later on, we can edit /etc/my.cnf to make other changes as desired:
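The described change might look like this; the 16M value is an illustrative assumption, not taken from the article:

```ini
# Assumed shape of the /etc/my.cnf tweaks described above.
[mysqld]
max_allowed_packet = 16M

[mysqldump]
max_allowed_packet = 16M
```

Raising max_allowed_packet in both sections matters because Drupal cache rows and mysqldump output can both exceed the small default.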

To load in these changes, go back to the MySQL System Preferences pane, and restart the server by pressing "Stop MySQL Server" followed by "Start MySQL Server." Or, you can do this on the command line (if you added this symlink as shown above):

$ sudo mysql.server restart

Shutting down MySQL
... SUCCESS!
Starting MySQL
... SUCCESS!

phpMyAdmin

At this point, the full "MAMP stack" (Macintosh OS X, Apache, MySQL, PHP) is ready, but adding phpMyAdmin will make administering your MySQL install easier.

Now that all of the components to run a website locally are in place, it only takes a few changes to ~/Sites/httpd-vhosts.conf to get a new local site up and running.

Your website code should be in a directory in ~/Sites. For the purposes of this example, the webroot will be at ~/Sites/thewebsite.

You need to choose a "domain name" for your local site that you will use to access it in your browser. For the example, we will use "myproject.local". The first thing you need to do is add a line to /etc/hosts to direct your "domain name" to your local system:

$ sudo sh -c "cat >> /etc/hosts <<'EOF'
127.0.0.1 myproject.local
EOF"

If you have been following the rest of this guide and used a copy of /etc/apache2/extra/httpd-vhosts.conf in ~/Sites/httpd-vhosts.conf, you can delete the first of the two example Virtual Hosts and edit the second one. You can also delete the ServerAdmin line as it is not necessary.

Change the ServerName to myproject.local (or whatever you entered into /etc/hosts) so Apache will respond when you visit http://myproject.local in a browser.

Change the DocumentRoot to the path of your webroot, which in this example is /Users/fname/Sites/thewebsite

It's highly recommended to set a CustomLog and an ErrorLog for each virtual host, otherwise all logged output will be in the same file for all sites. Change the line starting with CustomLog from "dummy-host2.example.com-access_log" to "myproject.local-access_log", and for ErrorLog change "dummy-host2.example.com-error_log" to "myproject.local-error_log"

Save the file and reload the Apache configuration as shown in Step 6 under the Apache section, and visit http://myproject.local in your browser.

Going forward, all you have to do to add a new website is duplicate and edit the <VirtualHost> ... </VirtualHost> section underneath your existing configuration, make your edits, edit /etc/hosts again, and reload Apache.

After having run the janaksingh.com website on Drupal 6 with Apache+PHP+MySQL, I wanted to move to Pressflow so I could harness the added advantages of Varnish.

This is not an in-depth installation guide or a discussion about Varnish or Pressflow, but quick setup commands for my own reference. Please see links at the end if you wish to explore Varnish or Pressflow setup in greater depth.

Setup Varnish VCL File for Pressflow

backend default {
  .host = "127.0.0.1";
  .port = "8080";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}

sub vcl_recv {
  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
    /* Non-RFC2616 or CONNECT which is weird. */
    return (pipe);
  }

  if (req.request != "GET" && req.request != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
  }

  // Remove has_js and Google Analytics cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+)=[^;]*", "");

  // To users: if you have additional cookies being set by your system (e.g.
  // from a javascript analytics file or similar) you will need to add VCL
  // at this point to strip these cookies from the req object, otherwise
  // Varnish will not cache the response. This is safe for cookies that your
  // backend (Drupal) doesn't process.
  //
  // Again, the common example is an analytics or other Javascript add-on.
  // You should do this here, before the other cookie stuff, or by adding
  // to the regular-expression above.

  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

  // Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }

  if (req.http.Authorization || req.http.Cookie) {
    /* Not cacheable by default */
    return (pass);
  }

  // Skip the Varnish cache for install, update, and cron
  if (req.url ~ "install\.php|update\.php|cron\.php") {
    return (pass);
  }

  // Normalize the Accept-Encoding header
  // as per: http://varnish-cache.org/wiki/FAQ/Compression
  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
      # No point in compressing these
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } else {
      # Unknown or deflate algorithm
      remove req.http.Accept-Encoding;
    }
  }

  // Let's have a little grace
  set req.grace = 30s;

  return (lookup);
}

sub vcl_hash {
  if (req.http.Cookie) {
    set req.hash += req.http.Cookie;
  }
}

// Strip any cookies before an image/js/css is inserted into cache.
sub vcl_fetch {
  if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
    // This is for Varnish 2.0; replace obj with beresp if you're running
    // Varnish 2.1 or later.
    unset obj.http.set-cookie;
  }
}

sub vcl_error {
  // Let's deliver a friendlier error page.
  // You can customize this as you wish.
  set obj.http.Content-Type = "text/html; charset=utf-8";
  synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
    <style type="text/css">
      #page {width: 400px; padding: 10px; margin: 20px auto; border: 1px solid black; background-color: #FFF;}
      p {margin-left: 20px;}
      body {background-color: #DDD; margin: auto;}
    </style>
  </head>
  <body>
    <div id="page">
      <h1>Page Could Not Be Loaded</h1>
      <p>We're very sorry, but the page could not be loaded properly. This should be fixed very soon, and we apologize for any inconvenience.</p>
      <hr /> <h4>Debug Info:</h4>
      <pre>
Status: "} obj.status {"
Response: "} obj.response {"
XID: "} req.xid {"
      </pre>
      <address><a href="http://www.varnish-cache.org/">Varnish</a></address>
    </div>
  </body>
</html>
"};
  deliver;
}

This is an Apache Directive that I've never had to use before, but it came in very handy for a very specific problem.

There was already an apache redirect (RewriteRule + RewriteCond) in place, but the destination URL was case sensitive! That's not normally a problem, but it was for an ad server, and the variables were coming in as uppercase, but needed to be lowercase after the redirect. Bad programming on the part of the ad server in my opinion, but we're not going to let that stop us! :)

RewriteMap to the rescue!

First off, the actual directive is a lot like a function definition, and it can only go in a config file or vhost, it's not allowed in a .htaccess file. Luckily the one we want to use is built in, so we just make it available with:

RewriteMap lc int:tolower

This makes the "lc" function available in our rewrite rules. We start off with the condition and basic rule ...

To use the RewriteMap function in your RewriteRule just change the $N replacement to ${lc:$N}, and you're all set. So for example $1 becomes ${lc:$1}. In our example above, the new RewriteRule looks like this:
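The article's actual final rule is not preserved in this copy, so as a purely hypothetical illustration of the ${lc:$N} substitution, a rule of this shape (invented names and paths) could look like:

```apache
# Hypothetical example: redirect ad requests, lowercasing the captured
# path segment via the "lc" map defined above.
RewriteMap lc int:tolower
RewriteCond %{REQUEST_URI} ^/ADS/(.*)$ [NC]
RewriteRule ^/[^/]+/(.*)$ /ads/${lc:$1} [R=301,L]
```

Remember that the RewriteMap directive itself must live in the server or virtual host config, not in .htaccess, even though the rules that use it can.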

Acquia Launches Cloud-based Solr Search Indexing

Acquia, the start-up company founded by Dries Buytaert, the lead developer and founder of Drupal, has announced that it is now providing paid search indexing for Drupal sites on a subscription basis aimed at enterprise sites. Similar to Mollom, Acquia's anti-spam software for CMS platforms, Acquia Search will also work for those running other open source software like WordPress, Joomla, TYPO3, etc., as well as sites with proprietary code. Acquia Search is based on the Apache Lucene and Solr projects, and essentially works by having Acquia index your site's content on their servers and then send it with encryption on demand to answer user queries via an integrated Acquia Search module. According to the announcement, Acquia is using Solr server farms on Amazon EC2 to power this cloud architecture.

Many people have complained about Drupal's core search functionality over the years, but the server requirements behind Solr and Lucene include a Java stack that most people are not equipped to manage on their existing IT architecture, staff, or budget. So Acquia is offering these search functionalities as SaaS, or Software as a Service, on a remote-hosted, pre-configured basis. If you want to do it yourself, see: http://drupal.org/project/apachesolr

“Acquia Search is included for no additional cost in every Acquia Network subscription. Basic and Professional subscribers have one ‘search slice’ and Enterprise subscribers have five ‘search slices’. A slice includes the processing power to index your site, to do index updates, to store your index, and to process your site visitors’ search queries. Each slice includes 10MB of indexing space – enough for a site with between 1,000 and 2,000 nodes. Customers who exceed the level included with their subscription may purchase additional slices. A ten-slice extension package costs an additional $1,000/year, and will cover an additional 10,000 – 20,000 nodes in an index of 100MB. For my personal blog, which has about 900 nodes at the time of this writing, a Basic Acquia Network subscription ($349 USD/year) would give me all the benefits of Acquia Search, plus all the other Acquia Network services.”1

Put in this perspective, most Drupal users likely won’t be switching to Acquia Search anytime soon. But, for the most part… they have little need to. For small sites or social networks, Drupal’s core search is going to be generally sufficient. Drupal will index your site automatically on cron runs, and keep this index of keywords and nodes in a table of your MySQL database. If you are working a lot with taxonomy and CCK fields, then Faceted Search is a recommended choice: http://drupal.org/project/faceted_search

I have used Faceted Search on a number of sites and it is excellent for building a custom search engine around your site’s own custom vocabularies, hierarchies, and site structures. Faceted Search is also important in a number of Semantic Web integrations working with RDF data and other micro-tags attached to data fields. Acquia Search is designed to work in this way as well as to facilitate the number crunching involved when high traffic sites with extremely large databases of content need to sift through search archives quickly to return results from user queries. Consider the example of Drupal.org in this context – Acquia Search is the solution to managing over 500,000 nodes and millions of search queries on an extremely active site.

“Reality is that for a certain class of websites — like intranets or e-commerce websites — search can be the most important feature of the entire site. Faceted search can really increase your conversions if you have an e-commerce website, or can really boost the productivity of your employees if you have a large intranet. For those organizations, Drupal’s built in search is simply not adequate. We invested in search because we believe that for many of these sites, enterprise-grade search is a requirement… The search module shipped with Drupal core has its purpose and target audience. It isn’t right for everyone, just as Acquia Search is not for everyone. Both are important, not just for the Drupal community at large, but also for many of Acquia’s own customers. Regardless, there is no question that we need to keep investing and improving Drupal’s built-in search.”2

In summary, Acquia Search is mostly targeted at enterprise level Drupal users with extremely large databases and high traffic, and is a cloud based solution that should not only speed up the rate of return on results, it should also improve the quality of the material returned based on faceted keywords & vocabularies. For those using Acquia’s personal or small business subscription accounts, the new search should appear as an additional “free bonus” with your monthly package of services. For users, even on a small site, the efficiency of faceted search may make information more accessible for visitors.


In honor of this week's Open Source Bridge conference, as well as in recognition of the role that open source software has played in the development of our business, we're pleased to announce that today, June 16, 2009, Code Sorcery Workshop is offering any open source contributor a free license to Meerkat, our SSH tunnel management application. We are also giving away a $250 gift certificate to the legendary Powell's Books. Read on for the details.

If you'd like a free copy of Meerkat, just leave a comment on this post linking to an open source project that you've worked on with a brief mention of what you did. It could be coding, but doesn't have to be -- it could also be documentation, helping new users, anything that contributes to the common good of the project. We'll collect all the info and send each contributor a full, unrestricted license to Meerkat, a $19.95 USD value.

However, if you'd like to instead try for the $250 USD gift certificate to Powell's Books, a purchase of Meerkat will make you eligible for this drawing. Just register Meerkat today and you will automatically be entered for the drawing. The winner will be announced in a followup post.

In both cases, you must take action by midnight Pacific Daylight Time tonight to qualify.

Meerkat is an application that adds a lot of Mac-specific value to SSH, an open source tool that ships with every Mac (as OpenSSH). And Macs themselves are built on a ton of open source software such as Apache, Postfix, CUPS, Perl, PHP, Python, Ruby, sudo, unzip, zlib, and many others. You can read more about Apple's commitment to open source as well as open source releases pertaining to Mac OS X.

I began knowingly using open source software in the mid-90s and started contributing by releasing my own projects on freshmeat in late 1999. I've always looked for ways to contribute to open source projects when I can, whether it's by bug fixes, new feature patches, documentation, or just community help. Most recently, I've been involved with the Drupal content management system.

Open source is the lifeblood of the internet. So many of the tools that we take for granted everyday have been developed in this way, by generous folks giving their time for the greater good. I am extremely thankful for the many ways that open source has enabled me to teach myself a lot of what I know today about technology, to provide economical solutions for clients who need it, and to make software better and better by degrees.

I like to develop against local virtual hosts when I work with Drupal. Here is the apache httpd.conf configuration that has served me the best so far. The first VirtualHost is the default... simply replicate the second VirtualHost entry and edit accordingly. A simple edit of /etc/hosts to enter the virtual host name that matches the virtual host entries in the httpd.conf, a quick reload by apache, and you are good to go. I particularly enjoy the independent logging for each site.

This is likely a no-brainer for the server admins out there but I remember a time when this type of information would have been useful to me ... so here it is. This is not a production level configuration. Have suggestions for improving it?
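A sketch of the kind of httpd.conf virtual host setup described above; the hostname drupal.local, the document roots, and the log file names are placeholder assumptions — adjust them to your own layout:

```apache
# default virtual host -- catches requests that match no other entry
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName localhost
    DocumentRoot /var/www/default
</VirtualHost>

# one entry like this per local Drupal site; replicate and edit as needed
<VirtualHost *:80>
    ServerName drupal.local
    DocumentRoot /var/www/drupal
    # independent logging for each site
    ErrorLog /var/log/apache2/drupal.local-error.log
    CustomLog /var/log/apache2/drupal.local-access.log combined
</VirtualHost>
```

The matching /etc/hosts entry would be `127.0.0.1 drupal.local`, followed by an apache reload.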

recently i posted some encouraging performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud. while the performance was encouraging, the suitability of this environment for running lamp stacks was not. ec2 had some fundamental issues including a lack of static ip addresses and no viable persistent storage mechanism.

amazon are quickly rectifying these problems, and recently announced elastic ip addresses: a "static" ip address that you own and can dynamically point at any of your instances.

today amazon indicated that persistent storage will soon be available. they claim that this storage will:

behave like raw, unformatted hard drives or block devices

be significantly more durable than the local disks within an amazon ec2 instance

Verifying that the Drupal installation being benchmarked recognizes the cookie and is serving authenticated pages

ApacheBench, also known as 'ab', is a command line program bundled with the Apache Web Server that measures the performance of web servers by making HTTP requests to a user-specified URL. ApacheBench displays statistical information, such as the number of requests served per second and the amount of time taken to serve those requests, that is useful for evaluating (benchmarking) and tuning the performance of a webserver. ApacheBench does a decent job of simulating different types and levels of load on a server.

Many Drupal websites make use of Drupal's core caching functionality, which typically improves performance for anonymous users (users who are not logged in) by a factor of 10 times, measured in requests served per second. The cache system achieves this performance improvement by storing the results of certain database queries in designated cache tables, allowing Drupal to avoid repeating expensive or frequently run database queries. Drupal does not cache page content for authenticated users because Drupal's permissions system (and node_access restrictions) can show different content depending on a user's role and because some elements of the page are user-specific. Since authenticated users are typically served non-cached pages, requests served to them are more resource intensive and therefore take longer to conduct than requests served to anonymous users.

By default, ApacheBench requests pages as one or multiple concurrent anonymous users. Benchmarks of performance for anonymous users can be useful for a website that most users will access anonymously, but these data do not accurately describe the performance of websites that serve a significant percentage of their total pages served to authenticated users (as most community sites tend to do). By providing the Drupal session cookie information, ApacheBench can perform benchmarking tests as an authenticated Drupal user.

Accessing the cookie containing the session ID:

Note: These instructions are for the Mozilla Firefox browser.

1) Log into your Drupal installation using the account that you'd like ApacheBench to use when conducting its tests.

2) Open the Firefox preferences menu and browse to the 'Privacy' tab.

3) Click the 'Show Cookies' button.

4) Browse the list of sites to find the site that you have just logged into.

5) Highlight the cookie associated with your site that has a 'Cookie Name' value beginning with 'SESS'. There should be only one of these cookies. If you have two cookies with names beginning with 'SESS' for the same Drupal website, you should clear both of these cookies (which will log you out of your Drupal site) and log back in to ensure that you are specifying the correct cookie in ApacheBench.

6) You'll see that your site's SESS cookie has both 'Name' and 'Content' values. You're going to copy each of these values individually and pass them inside of a command line parameter to ab. Note: Do not log out of your Drupal site before testing with ab, as doing so will invalidate the cookie information you have just copied!

Passing the session cookie to ApacheBench

The 'C' parameter is used with ApacheBench to specify a cookie. This parameter is case sensitive, and should not be confused with 'c' (lowercase) which specifies the number of concurrent requests ApacheBench will make. You'll describe your site's cookie using the following format:

-C 'NAME=CONTENT'

There are no spaces around the equals sign. Below is a completed example specifying that 400 requests will be made (-n 400) with 10 simultaneous requests (-c 10):
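A sketch of the completed command; the cookie name, cookie content, and URL below are placeholder values, not real session data:

```shell
# placeholders -- substitute the cookie 'Name' and 'Content' you copied
# from the browser, and your own site's url
NAME="SESS9f3a0c"
CONTENT="q1w2e3r4t5"

# 400 total requests (-n 400), 10 concurrent (-c 10), sent with the
# authenticated session cookie; drop the echo to actually run it
echo ab -n 400 -c 10 -C "${NAME}=${CONTENT}" http://example.com/
```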

Verifying that Drupal accepts the cookie and is serving pages to an authenticated user

It's important to verify that Drupal recognizes and accepts the cookie that corresponds to your session. Once, while performing a set of ab authenticated Drupal user benchmarks, I adjusted a Drupal cache setting that increased the number of requests served per second for authenticated users from 5 to 60. I thought, “Wow! I've made an enormous improvement!” What I had actually done was switch from Drupal core session handling to Memcache's session handling, which caused Drupal to not recognize my cookie. Drupal was actually serving anonymous pages, which accounted for the dramatic “performance improvement.” To avoid misunderstandings like this one, we'll use ApacheBench's verbose mode to confirm that Drupal is serving authenticated pages.

In our theme directory, we'll add a PHP snippet to page.tpl.php, just below the HEAD tag, that prints the username and uid (user id) of the user currently viewing the page inside an HTML comment tag.

Then, we'll run ApacheBench with a verbosity level of 4 (parameter -v4), allowing us to view the HTTP response code and html that the server is sending to ab.
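Concretely, the snippet and the verbose run might look like the following; the theme path, cookie values, and URL here are placeholders for illustration, not taken from a real site:

```shell
# placeholders throughout -- use your real theme path, cookie and site url
TPL="sites/all/themes/mytheme/page.tpl.php"
mkdir -p "$(dirname "$TPL")"

# html comment printing the viewer's username and uid (anonymous shows uid 0)
cat >> "$TPL" <<'EOF'
<!-- viewing as: <?php global $user; print $user->name .' (uid '. $user->uid .')'; ?> -->
EOF

# single verbose request; the response body should contain our username, not uid 0
ab -v 4 -n 1 -c 1 -C 'SESS9f3a0c=q1w2e3r4t5' http://example.com/ \
  | grep 'viewing as' || echo 'comment not found -- check ab and the site url'
```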

Viewing the content of pages served to ab is also useful in determining the nature of failed responses and investigating suspiciously high numbers of requests served per second. Apache identifies certain types of failed requests by serving them with an unsuccessful (non-200) HTTP response code. (For more about HTTP response codes/status codes, see the W3's "Status Code Definitions".) ApacheBench recognizes HTTP response codes and reports the number of responses that are marked as successful and unsuccessful by Apache. A reportedly high number of requests successfully served per second can indicate good server performance, but it's important to distinguish pages served with a 200 response code from those that are truly served successfully. In some situations, Apache is unaware of serious problems that occur in other parts of the server stack and will serve pages with an HTTP response code of 200 (successful) despite the fact that these pages contain only a PHP fatal error. In these cases, Apache is “successfully” serving the PHP fatal error message.

Drupal correctly specifies non-200 HTTP response codes when it is unable to connect to the database. Viewing the content of the benchmarked page allows the person performing the testing to view the specific error message that the site is displaying. This is useful in identifying that the problem is, for example, that the MySQL max_user_connections value has been reached, as opposed to other database connection errors, such as an Access Denied error.

It's worth noting that while ab provides useful information about a server's performance, it does not predict exactly how a server will perform under load from actual users.

amazon's elastic compute cloud, "ec2", provides a flexible and scalable hosting option for applications. while ec2 is not inherently suited for running application stacks with relational databases such as lamp, it does provide many advantages over traditional hosting solutions.

in this article we get a sense of lamp performance on ec2 by running a series of benchmarks on the drupal cms system. these benchmarks establish read throughput numbers for logged-in and logged-out users, for each of amazon's hardware classes.

we also look at op-code caching, and gauge its performance benefit in cpu-bound lamp deployments.

the elastic compute cloud

amazon uses xen based virtualization technology to implement ec2. the cloud makes provisioning a machine as easy as executing a simple script command. when you are through with the machine, you simply terminate it and pay only for the hours that you've used.

ec2 provides three types of virtual hardware that you can instantiate. these are summarized in the table below.

this test is a "maximum" throughput test. it creates enough load to fully utilize the critical server resource (cpu in this case). the throughput and response times are measured at that load. tests to measure performance under varying load conditions would also be very interesting, but are outside the scope of this article.

the tests are designed to benchmark the lamp stack, rather than weighting it towards apache. consequently they do not load external resources. that is, external images, css, javascript files etc. are not loaded, only the initial text/html page. this effectively simulates drupal running with an external content server or cdn.

the benchmark runs in apache jmeter. jmeter runs on a dedicated small-instance on ec2.

benchmarking is done with op-code caching on and off. since our tests are cpu bound, op-code caching makes a significant difference to php's cpu consumption.

our testing environment

the tests use a debian etch xen instance, running on ec2. this instance is installed with:

MySQL: 5.0.32

PHP: 5.2.0-8

Apache: 2.2.3

APC: 3.0.16

Debian Etch

Linux kernel: 2.6.16-xenU

the tests use a default drupal installation. drupal's caching mode is set to "normal". no performance tuning was done on apache, mysql or php.

the results

all the tests ran without error. each of the tests resulted in the server running at close to 100% cpu capacity. the tests typically reached steady state within 30s. throughputs gained via jmeter were sanity checked for accuracy against the http and mysql logs. the raw results of the tests are shown in the table below.

note: response times are in seconds, throughputs are in pages per minute

the results - throughput

the throughput of the system was significantly higher for the larger instance types. throughput for the logged-in threads was consistently 3x lower than the logged-out threads. this is almost certainly due to the drupal cache (set to "normal").

throughput was also increased by about 4x with the use of the apc op-code cache.

the results - response times

the average response times were good in all the tests. the slowest tests yielded average times of 1.5s. again, response times were significantly better on the better hardware and reduced further by the use of apc.

conclusions

drupal systems perform very well on amazon ec2, even with a simple single machine deployment. the larger hardware types perform significantly better, producing up to 12,500 pages per minute. this could be increased significantly by clustering as outlined here.

the apc op-code cache increases performance by a factor of roughly 4x.

these results are directly applicable to other cpu bound lamp application stacks. more consideration should be given to applications bound on other external resources, such as database queries. for example, in a database bound system, drupal's built-in cache would improve performance more significantly, creating a bigger divergence in logged-out vs logged-in throughput and response times.

although performance is good on ec2, i'm not recommending that you rush out and deploy your lamp application there. there are significant challenges in doing so and ec2 is still in beta at the time of writing (Jan 08). it's not for the faint-of-heart. i'll follow up in a later blog with more details on recommended configurations.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.

the previous article showed you how to set up jmeter and create a basic test. to produce a more realistic test you should simulate "real world" use of your site. this typically involves simulating logged-in and logged-out users browsing and creating content. jmeter has some great functionality to help you do this.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application. if you aren't already familiar with jmeter, i'd strongly recommend that you read my first post before this one.

an overview

the http protocol exchanges for realistic tests are quite complex, and painful to manually replicate. jmeter kindly includes http-proxy functionality that allows you to "record" browser-based actions, which can be used to form the basis of your test. after recording, you can manually edit these actions to sculpt your test precisely.

our test - browsers and creators

as an example, let's create a test with two test groups: creators and browsers. creators are users that arrive at the site, stay logged out, browse a few pages, create a page and then leave. browsers are less motivated individuals. they arrive at the site, log in, browse some content and then leave.

add a cookie manager to the thread group of type "compatibility". add an "http proxy server" to the workbench, as follows:

modify the "content-type filter" to "text/html". your jmeter-proxy should now look like:

navigate in your browser to the start of your test e.g. your home page. clear your cookies (using the clear private data setting). open up the "connection settings" option in firefox preferences and specify a manual proxy configuration of localhost, port 8080. this should look like:

note: you can also do this using internet explorer. in ie7 go to the "connections" tab of the internet options dialog. click the "lan settings" button, and setup your proxy.

start the jmeter-proxy. record your test by performing actions in your browser: (a) browse to two pages and (b) create a page. you should see your test "writing itself". that should feel good.

now stop the jmeter-proxy. your test should look similar to:

setting up the test - simulating browsers

create another thread group above the first. call it browsers. again, add an "http request defaults" object to the thread group. check the "retrieve all embedded resources from html files" box.

add a cookie manager to the thread group of type "compatibility". start the jmeter-proxy again. record your test by performing actions: (a) login and then (b) browse three pages. your test should look like:

stop the jmeter-proxy. undo the firefox proxy.

setting up the test - cleaning up

you can now clean up the test as you see fit. i'd recommend:

change the number of threads and iterations on both thread-groups to simulate the load that you care about.

modify the login to happen only once on a thread. see the diagram below.

and optionally:

rename items to be more meaningful.

insert sensible timers between requests.

insert assertions to verify results.

add listeners to each thread group. i recommend a "graph results" and a "view results tree" listener.

your final test should look like the one below. note that i didn't clutter the example with assertion and timers:

running your test

you should now be ready to run your test. as usual, click through to the detailed results in the tree to verify that your test is doing something sensible. ideally you should do this automatically with assertions. your results should look like:

notes

the test examples that i chose intentionally avoided logged-in users creating content. you'll probably want these users to create content, but you'll likely get tripped up by drupal's form token validation, designed to block spammers and increase security. modifying the test to work around this is beyond the scope of this article, and probably not the best way to solve the problem. if someone knows of a nice clean way to disable this in drupal temporarily, perhaps they could comment on this article.

resources

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.

when making scalability modifications to your system, it's important to quantify their effect, since some changes may have no effect or even decrease your scalability. the value of advertised scalability techniques often depends greatly on your particular application and network infrastructure, sometimes creating additional complexity with little benefit.

apache jmeter is a great tool to simulate load on your system and measure performance under that load. in this article, i demonstrate how to set up a testing environment, create a simple test and evaluate the results.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application.

the testing environment

you should install and run the jmeter code on a server that has good resources and high-bandwidth, low-latency network access to your application server or load balancer. the maximum load that you can simulate is clearly constrained by these parameters, as is the accuracy of your timing results. therefore, for very large deployments you may need to run multiple non-gui jmeter instances on several test machines, but for most of us a simple one-test-machine configuration will suffice; i recently simulated over 12K pageviews/minute from a modest single-core server that wasn't close to capacity.

jmeter has a great graphical interface that allows you to define, run and analyze your tests visually. a convenient way to run this is to ssh to the jmeter test machine using x forwarding, from a machine running an x server. this should be as simple as issuing the command:

$ ssh -X testmachine.example.com

note, you'll need a minimal x install on your server for this. you can get one with:

$ sudo apt-get install xserver-xorg-core xorg

and then running the jmeter gui from that ssh session. jmeter should now appear on your local display, but run on the test machine itself. if you are having problems with this, skip to troubleshooting at the end of this article. this setup is good for testing a remote deployment. you can also run the gui on windows.

x forwarding can become unbearably slow once your test is running, if the test saturates your test server's network connection. if so, you might consider defining the test using the gui and running it on the command line. read more about remote testing on the apache site, and on command line jmeter later in this article.

setting up the test server - download and install java

if you don't have java 1.4 or later, then you should start by installing it. to do so, make sure you've got a line in /etc/apt/sources.list like this:

deb http://ftp.debian.org/debian/ etch main contrib non-free

if you don't then add it, and do an apt-get update. once you've done this, do:

$ sudo apt-get install sun-java5-jre

installation on vista is as easy as downloading and installing the latest zip from http://jakarta.apache.org/site/downloads/downloads_jmeter.cgi, unzipping it and running jmeter.bat. please don't infer that i'm condoning or suggesting the use of windows vista ;)

if you are having problems running jmeter, see the troubleshooting section at the end of this article.

setting up a basic test

jmeter is a very full featured testing application. we'll scratch the surface of its functionality and set up a fairly simplistic load test. you may want to do something a bit more sophisticated, but this will at least get you started.

to create the basic test, run jmeter as described above. the first step is to create a "thread group" object. you'll use this object to define the simulated number of users (threads) and the duration of the test. right mouse click the test plan node and select: add -> thread group

specify the load that you'll exert on your system, for example, pick 10 users (threads) and a loop count (how many times each thread will execute your test). you can optionally modify the ramp up period e.g. a 10s ramp up in this example would create one new user every second.

now add a sampler by right mouse clicking the new thread group and choosing: add -> sampler -> http request. make sure to check the box "retrieve all embedded resources from html files", to properly simulate a full page load.

now add a listener to view the detailed results of your requests. the "results tree" is a good choice. add this to your thread group by selecting: add -> listener -> view results tree. note that after you run your test, you can select a particular request in the left panel and then select the "response data" tab on the right, to verify that you are getting a sensible response from your server, as shown below.

finally let's add another listener to graph our result data. choose: add -> listener -> graph results. this produces a graph similar to the graph on the right.

running your test

controlling your test is now a simple matter of choosing the menu items: run -> start, run -> stop, run -> clear all etc. it's very intuitive. while your test is running, you can select the results graph, and watch the throughput and performance statistics change as your test progresses.

if you'd like to run your test in non-gui mode, you can run jmeter on the command line as follows:

$ jmeter --nongui --testfile basicTest.jmx --logfile /tmp/results.jtl

this would run a test defined in file basicTest.jmx, and output the results of the test in a file called /tmp/results.jtl. once the test is complete, you could, for example, copy the results file locally and run jmeter to visually inspect and analyse the results, with:

$ jmeter --testfile basicTest.jmx

or just run jmeter as normal and then open your test.

you may then use the listener of choice (e.g. "graph results") to open your results file and display the results.

interpreting your drupal results

most production sites run with drupal's built-in caching turned on. you can look at your performance setting in the administration page at: http://www.example.com/admin/settings/performance. this caching makes a tremendous difference to throughput, but when users are logged in, they bypass this cache.

therefore, to get a realistic idea of your site performance, it's a good idea to calibrate your system with caching on and caching off, and linearly interpolate the results to get a true idea of your maximum throughput. for example, if your throughput is 1,000 views per minute with caching, and 100 without caching and at any given point in time 50% of your users are logged in, you could estimate your throughput at (1000 + 100) / 2 = 550, that is 550 views per minute.
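the back-of-envelope interpolation above, as a quick shell calculation (the numbers are the example values from the text):

```shell
cached=1000        # pages/minute with the cache (logged-out users)
uncached=100       # pages/minute without the cache (logged-in users)
logged_in_pct=50   # percentage of users logged in at a given moment

# weighted average of the two throughputs
estimate=$(( (cached * (100 - logged_in_pct) + uncached * logged_in_pct) / 100 ))
echo "${estimate} views per minute"
```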

alternatively, you could build a more sophisticated test that simulates close-to-realistic site access including logged-in sessions. clearly, the more work you put into your load tests, the more accurate your results will be. see the followup article for details on building a more sophisticated test.

an example test - would a static file server or cdn help your application?

simply un-checking the "retrieve all embedded resources from html files" on the http request allowed me to simulate all the static resources coming from another (infinitely fast) server.

for my (image intensive) application the results were significant, about a 3x increase in throughput. clearly the real number depends on many factors including the static resources (images, flash etc) used by your application and the ratio of first-time to repeat users of your site (repeat users have your content cached). it seems fair to say that this technique would significantly improve throughput for most sites and presumably page performance would be significantly improved too, especially if the static resources were cdn hosted.

troubleshooting your jmeter install

if you are having problems with your jmeter install, then:

make sure that the java version you are running is compatible i.e. 1.4 or later, by running:

$ java -version

make sure that you have all the dependencies installed. if you get the error "cannot load awt toolkit: gnu.java.awt.peer.gtk.gtktoolkit", you might have to install the gcj runtime library. do this as follows:

$ sudo apt-get install libgcj7-awt

if jmeter hangs or stalls, you probably don't have the right java version installed or on your path.

if you're still having problems, take a look in the application log file jmeter.log for clues. this gets created in the directory that you run jmeter in.

if you are having problems getting x forwarding to work, make sure that it is enabled in your sshd config file e.g. /etc/ssh/sshd_config. you should have a line like:

X11Forwarding yes

# guardian - a script to watch over application system dependencies, restarting things
# as necessary: http://www.johnandcailin.com/john
#
# this script assumes that at, logger, sed and wget are available on the path.
# it assumes that it has permissions to kill and restart daemons including
# mysql and apache.
#
# Version: 1.0: Created
#          1.1: Updated logfileCheck() not to assume that files are rotated
#               on restart.

checkInterval=10        # MINUTES to wait between checks

# some general settings
batchMode=false         # was this invoked by a batch job
terminateGuardian=false # should the guardian be terminated

# setting for logging (syslog)
loggerArgs=""           # what extra arguments to the logger to use
loggerTag="guardian"    # the tag for our log statements

# the at queue to use. use "g" for guardian. this queue must not be used by another
# application for this user.
atQueue="g"

# the name of the file containing the checks to run
checkFile="./checks"

# function to print a usage message and bail
usageAndBail()
{
   cat << EOT
Usage: guardian [OPTION]...
Run a guardian to watch over processes. Currently this supports apache and mysql. Other
processes can be added by simple modifications to the script. Invoking the guardian will run
an instance of this script every n minutes until the guardian is shut down with the -t option.
Attempting to re-invoke a running guardian has no effect.

All activity (debug, warning, critical) is logged to the local0 facility on syslog.

# are we to terminate the guardian?
if test ${terminateGuardian} = "true"
then
   deleteAllAtJobs

   ${logDebug} "TERMINATING on user request"
   exit 0
fi

# check to see if a guardian job is already scheduled, return 0 if they are, 1 if not.
isGuardianAlreadyRunning ()
{
   # if there are one or more jobs running in our 'at' queue, then we are running
   numJobs=`atq -q ${atQueue} | wc -l`
   if test ${numJobs} -ge 1
   then
      return 0
   else
      return 1
   fi
}

# make sure that there isn't already an instance of the guardian running.
# only do this for user initiated invocations.
if test ${batchMode} = "false"
then
   if isGuardianAlreadyRunning
   then
      ${logDebug} "guardian invoked but already running. doing nothing."
      exit 0
   fi
fi

   # see if we have a marker in the log file
   grep "${marker}" ${logFile} > /dev/null 2>&1
   if test $? -eq 1
   then
      # there is no marker, therefore we haven't seen this logfile before. add the
      # marker and consider this check passed
      echo ${mark} >> ${logFile}
      ${logDebug} "PASS: new logfile"
      return 0
   fi

   # pull out the "active" section of the logfile, i.e. the section between the
   # last run of the guardian and now i.e. between the marker and the end of the file
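one way to sketch that "active section" extraction: keep only the lines after the last occurrence of the marker. the marker text and the sample logfile below are made up for illustration:

```shell
marker="--guardian-mark--"
logFile=$(mktemp)
printf 'old error\n%s\nnew error 1\nnew error 2\n' "${marker}" > "${logFile}"

# reset the buffer each time the marker line is seen, so only lines after
# the last marker survive to the end
active=$(awk -v m="${marker}" \
  '$0 == m {buf = ""; next} {buf = buf $0 "\n"} END {printf "%s", buf}' "${logFile}")
echo "${active}"
```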

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

first steps first - optimizing your database and application

the first step to scaling your database tier should include identifying problem queries (those taking most of the resources), and optimizing them. optimizing may mean reducing the volume of the queries by modifying your application, or increasing their performance using standard database optimization techniques such as building appropriate indexes. the devel module is a great way to find problem queries and functions.

another important consideration is the optimization of the database itself, by enabling and optimizing the query cache, tuning database parameters such as the maximum number of connections etc. using appropriate hardware for your database is also a huge factor in database performance, especially the disk io system. a large raid 1+0 array for example, may do wonders for your throughput, especially combined with a generous amount of system memory available for disk caching. for more on mysql optimization, take a look at the great o'reilly book by jeremy zawodny and derek balling on high performance mysql.

when it's time to scale out rather than up

you can only (and should only) go so far scaling up. at some point you need to scale out. ideally, you want a database clustering solution that allows you to do exactly that. that is, add nodes to your database tier, completely transparently to your application, giving you linear scalability gains with each additional node. mysql cluster promises exactly this. it doesn't offer full transparency however, due to limitations introduced by the ndb storage engine required by mysql cluster. having said that, the technology looks extremely promising and i'd be interested to hear if anyone has got a drupal application running successfully on this platform. you can read more on mysql clustering on the mysql cluster website or in the mysql clustering book by alex davies and harrison fisk.

less glamorous alternatives to mysql cluster

without the magic of mysql cluster, we've still got some, admittedly less glamorous, alternatives. one is to use a traditional mysql database cluster, where all writes go to a single master and reads are distributed across several read-only-nodes. the master updates the read-only-nodes using replication.

an alternative is to segment read and write requests by role, thereby partitioning the data into segments, each one resident on a dedicated database.

these two approaches are illustrated below:

there are some significant pitfalls to both approaches:

the traditional clustering approach, introduces a replication lag i.e. it takes a non-trivial amount of time, especially under load, for writes to make it back to the read-only-nodes. this may not be problematic for very specific applications, but is problematic in the general case

the traditional clustering approach scales only reads, not writes, since each write has to be made to each node.

in traditional clustering the total effective size of your memory cache is the size of a single node (since the same data is cached on each node), whereas with segmentation it's the sum of the nodes.

in traditional clustering each node has the same hardware optimization pattern, whereas with segmentation, it can be customized according to the role it's playing.

the segmentation approach reduces the redundancy of the system, since theoretically a failure of any of the nodes takes your "database" off line. in practice, you may have segments that are non essential e.g. logging. you can, of course, cluster your segments, but this introduces the replication lag issue.

the segmentation approach relies on a thorough understanding of the application, and the relative projected load on each segment to do properly.

the segmentation approach is fundamentally very limited, since there are a limited number of segments for a typical application.

you can continue this approach on other areas of your database, dedicating several databases to different roles. for example, if one of the functions of your database is to serve as a log, why not segment all log activity onto a single database? clearly, it's important that your segments are distinct i.e. that applications don't need joins or transactions between segments. you may have auxiliary applications that do need complex joins between segments e.g. reporting. this can be easily solved by warehousing the data back into a single database to serve specifically this auxiliary application (warehousing in this case).

while i'm not suggesting that the next step in your scaling exercise should necessarily be segmentation (this clearly depends on your application and preferences), we're going to explore the idea anyway. it's my blog after all :)

what segmentation technologies to use?

there are several open source tools that you can use to build a segmentation infrastructure. sqlrelay is a popular database-agnostic proxying tool that can be used for this purpose. mysql proxy is, as the name suggests, a mysql specific proxying tool.

in this article i focus on mysql proxy. sqlrelay (partly due to its more general purpose nature) is somewhat difficult to configure, and inherently less flexible than mysql proxy. mysql proxy on the other hand is quick to set up and use. it has a simple, elegant and flexible architecture that allows for a full range of proxying applications, from trivial to uber-complex.

more on mysql proxy

jan kneschke's brainchild, mysql proxy is a lightweight daemon that sits between your client application (apache/modphp/drupal in our case) and the database. the proxy allows you to perform just about any transformation on the traffic, including segmentation. the proxy allows you to hook into 3 actions: connect, query and result. you can do whatever you want to in these steps, manipulating data and performing actions using lua scripts. lua is a fully featured scripting language designed for high performance, clearly a key consideration in this application. don't worry too much about having to pick up yet another scripting language. it's easy to learn, powerful and intuitive.

even if you don't intend to segment your databases, you might consider a proxy configuration for other reasons including logging, filtering, redundancy, timing and analysis and query modification. for example, using mysql proxy to implement a hot standby database (replicated) would be trivial.

the mysql site states clearly (as of 09Nov2007): "MySQL Proxy is currently an Alpha release and should not be used within production environments". Feeling lucky?

a word of warning

the techniques described below, including the overall method and the use of mysql proxy, are intended to stimulate discussion. they are not intended to represent a valid production configuration. i've explored this technique purely in an experimental manner. in my example below i segment cache queries to a specific database. i don't mean to imply that this is a better alternative to memcache. it isn't. anyway, i'd love to hear your thoughts on the general approach.

don't panic, you don't really need this many servers

before you get yourself into a panic over the number of boxes i've drawn in the diagram, please bear in mind that this is a canonical network. in reality you could use the same physical hardware for both loadbalancers, or, even better, you could use xen to create this canonical layout and, over time, deploy virtual servers on physical hardware as load necessitated.

now change your drupal install to point at the load balancer rather than at your database server directly, i.e. edit settings.php on your webserver(s):
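in drupal 5 that's the $db_url line in settings.php. something like the following, where the vip address is illustrative and the credentials are whatever you created for your database:

```php
<?php
// illustrative vip for the mysql load balancer; add a :port (mysql
// proxy defaults to 4040) if it isn't listening on the mysql default
$db_url = 'mysql://drupal:password@192.168.1.50/drupaldb';
```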

asking mysql proxy to segment your database traffic

the best way to segment a drupal database depends on many factors, including the modules you use and the custom extensions that you have. it's beyond the scope of this exercise to discuss segmentation specifics but, as an example, i've segmented the database into 3 segments: a cache server, a log server and a general server (everything else).

to get started segmenting, create two additional database instances (drupal-data-server2, drupal-data-server3), each with a copy of the data from drupal-data-server1. make sure that you GRANT the mysql load balancer permission to access each database as described above.

you'll now want to start up your proxy server, pointing to these instances. below, i give an example of a bash script that does this. it starts up the cluster and executes several sql statements, each one bound for a different member of the cluster, to ensure that the whole cluster has started properly. note that you'd also want to build something similar as a health check, ensuring the nodes keep functioning properly and stopping the proxy as soon as a problem is detected.

here's the source for runProxy.sh:

BASE_DIR=/home/john
BIN_DIR=${BASE_DIR}/mysql-proxy/sbin

# kill the server if it's running
pkill -f mysql-proxy

# make sure any old proxy instance is dead before firing up the new one
sleep 1

# run the proxy server in the background
${BIN_DIR}/mysql-proxy \
  --proxy-backend-addresses=192.168.1.26:3306 \
  --proxy-backend-addresses=192.168.1.27:3306 \
  --proxy-backend-addresses=192.168.1.28:3306 \
  --proxy-lua-script=${BASE_DIR}/databaseSegment.lua &

you'll notice that this script references databaseSegment.lua. this is the script that uses a little regex magic to map queries to servers. again, the actual queries being mapped serve as examples to illustrate the point, but you'll get the idea. jan has a nice r/w splitting example that can be easily modified to create databaseSegment.lua.

most of the complexity in jan's code is around load balancing (least connections) and connection pooling within the proxy itself. jan points out (and i agree) that this functionality should be made available in a generic load-balancing lua module. i really like the idea of having this in lua scripts to allow others to easily extend it, for example, by adding a round robin alternative. keep an eye on his blog for developments. anyway, for now, let's modify his example, add some defines and a method to do the mapping:

-- find a server registered for this query_text in the server_table
for i = 1, #server_table do
  for j = 1, #server_table[i] do
    if string.find(query_text, server_table[i][j]) then
      server_to_use = i
      break
    end
  end
end

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives examples of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

kris suggests rsync as a solution, and although luc stroobant points out the delete problem, i still think it's a good, simple solution. see the diagram above.

the delete problem is that you can't simply use the --delete flag on rsync, since in an x->y synchronization, a file deleted on node x looks just like a file added on node y.

i speculate that you can partly mitigate this issue with some careful scripting, using a source-of-truth file server to which you first pull only additions from the source nodes, and then do another run over the nodes with the delete flag (to remove any newly deleted files from your source-of-truth). unfortunately you can't do the delete run on a live site (due to timing problems if additions happen after your first pass and before your --delete pass), but you can do this as a regularly scheduled maintenance task when your directories are not in flux.

i include a bash script below to illustrate the point. i haven't tested this script, or the theory in general. so if you plan to use it, be careful.
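to illustrate, here's a rough, untested sketch of the two-pass idea. NODES (the rsync sources) and TRUTH (the source-of-truth directory) are illustrative placeholders, and the -d flag triggers the delete pass:

```shell
#!/bin/sh
# two-pass sync sketch. NODES is a space separated list of rsync
# sources (remote host:dir specs or local dirs); TRUTH is the
# source-of-truth directory. both are illustrative placeholders.

sync_files() {
    # pass 1: pull additions only (no --delete); safe on a live site,
    # since nothing is ever removed from the source-of-truth here
    for node in $NODES; do
        rsync -a "$node/" "$TRUTH/"
    done

    # pass 2 (-d flag, scheduled downtime only): pull again with
    # --delete so files removed on a node disappear from the
    # source-of-truth too. note that with several nodes holding
    # different files, a naive --delete pass would clobber them, so
    # this only makes sense per-node or per-segment.
    if [ "$1" = "-d" ]; then
        for node in $NODES; do
            rsync -a --delete "$node/" "$TRUTH/"
        done
    fi
}

sync_files "$@"
```

again, i haven't tested this; treat it as a starting point, not a finished tool.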

you could call this script from cron on your data server. you could do this, say, every 5 minutes for a smallish deployment. even though this causes a 5 minute delay in file propagation, the use of sticky sessions ensures that users will see files that they create immediately, even if there is a slight delay for others. additionally, you could schedule it with the -d flag during system downtime.

the viability of this approach depends on many factors including how quickly an uploaded file must be available for everyone and how many files you have to synchronize. this clearly depends on your application.

if you felt a waft of cold air when you read the recent highly critical drupal security announcement on arbitrary code execution using install.php, you were right. your bum was hanging squarely out of the window, and you should probably consider beefing up your security.

drupal's default exposure of files like install.php and cron.php presents inherent security risks, for both denial-of-service and intrusion. combine this with critical administrative functionality available to the world, protected only by user defined passwords, broadcast over the internet in clear-text, and you've got potential for some real problems.

fortunately, there are some easy and practical things you can do to tighten things up.

step one: block the outside world from your sensitive pages

one easy way to tighten up your security, is to simply block access to your sensitive pages from anyone outside your local network. this can be done by using apache's mod_rewrite. for example, you could block access to any administrative page by adding the following into your .htaccess file in your drupal directory (the one containing sites, scripts, modules etc.). the example only allows access from IPs in the range 192.*.*.* or 200.*.*.*:
<IfModule mod_rewrite.c>
  RewriteEngine on
  # illustrative rules: forbid the admin pages unless the client
  # address is in 192.*.*.* or 200.*.*.*
  RewriteCond %{REMOTE_ADDR} !^192\.
  RewriteCond %{REMOTE_ADDR} !^200\.
  RewriteRule ^admin - [F]
</IfModule>

step two: tunnel into your server for administrative access

now that you've locked yourself out of your server for remote administrative access, you'd better figure out how to get back in. SOCKS-proxy and ssh-tunneling to the rescue! assuming that your server is running an ssh server, set up an ssh tunnel (from the machine you are browsing on) to your server as follows:
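the tunnel command itself is a one-liner with openssh; the -D port matches the SOCKS settings below, and the user and host are of course yours:

```shell
ssh -D 9999 you@your-server.example.com
```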

now go to your favorite browser and proxy your traffic through a local ssh SOCKS proxy e.g. on firefox 2.0 on windoze do the following:

select the tools->options (edit->preferences on linux) menu

go to the "connections" section of the "network" tab, click "settings"

set the SOCKS host to localhost port 9999

now simply navigate to your site and administer, safe in the knowledge that not only is your site's soft-underbelly restricted to local users, but all your traffic (including your precious admin password) is encrypted in transit.

your bum should be feeling warmer already.

some more rules

some other rules that you might want to consider include (RewriteCond omitted for brevity):
# allow only internal access to node editing
RewriteRule ^node/.*/edit.* - [F]

the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands of a small website. my four year old pc in my garage running a full lamp install will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.

when the time comes for scalability: moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, considering drupal performance modules, optimizing your php (including op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".

once these steps are exhausted, inevitably you'll start looking at your hardware and network deployment.

a well designed deployment will not only increase your scalability, but will also enhance your redundancy by removing single points of failure. implemented properly, an unmodified drupal install can run on this new deployment, blissfully unaware of the clustering, routing and caching going on behind the scenes.

incremental steps towards scalability

in this article, i outline a step-by-step process for incrementally scaling your deployment, from a simple single-node drupal install running all components of the system, all the way to a load balanced, multi node system with database level optimization and clustering.

since you almost certainly don't want to jump straight from your single node system to the mother of all redundant clustered systems in one step, i've broken this down into 5 incremental steps, each one building on the last. each step along the way is a perfectly viable deployment.

tasty recipes

i give full step-by-step recipes for each deployment that, with a decent working knowledge of linux, should allow you to get a working system up and running. my examples are for apache2, mysql5 and drupal5 on debian etch, but may still be useful for other versions / flavors.

note that these aren't battle-hardened production configurations, but rather illustrative minimal configurations that you can take and iterate to serve your specific needs.

the 5 deployment configurations

the table below outlines the properties of each of the suggested configurations:

                                  step 0  step 1  step 2  step 3  step 4  step 5
separate web and db               no      yes     yes     yes     yes     yes
clustered web tier                no      no      yes     yes     yes     yes
redundant load balancer           no      no      no      yes     yes     yes
db optimization and segmentation  no      no      no      no      yes     yes
clustered db                      no      no      no      no      no      yes
scalability                       poor-   poor    fair    fair    good    great
redundancy                        poor-   poor-   fair    good    fair    great
setup ease                        great   good    good    fair    poor    poor-

in step 0, i outline how to install drupal, mysql and apache to get a basic drupal install up-and-running on a single node. i also go over some of the basic configuration steps that you'll probably want to follow, including cron scheduling, enabling clean urls, setting up a virtual host etc.
in step 1, i go over a good first step to scaling drupal; creating a dedicated data server. by "dedicated data server" i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment.
in step 2, i go over how to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment.
in step 3, i discuss clustering your load balancer. one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested doesn't increase the loadbalancer scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy.

in step 4, i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling.

the holy grail of drupal database scaling might very well be a drupal deployment on mysql cluster. if you've tried this, plan to try this or have opinions on the feasibility of an ndb "port" of drupal, i'd love to hear it.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.

one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the loadbalancer scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

configure /etc/ha.d/haresources. this must be identical on apache-balance-1 and apache-balance-2. really! this file should look like:

apache-balance-1 192.168.1.51 apache2

note:

apache-balance-1 here refers to the "preferred" host for the service

192.168.1.51 is the vip that your load balancer will appear to be on

apache2 here refers specifically to the name of the script in the directory /etc/init.d

configure /etc/ha.d/authkeys on both load balancers. if you're paranoid, see more secure options in "configuring authkeys" here. this file should look like:

auth 2
2 crc

set authkeys permissions on both load balancers:

# chmod 600 /etc/ha.d/authkeys

configure apache to listen on the vip. edit /etc/apache2/ports.conf on both load balancers:

Listen 192.168.1.51:80

note: after this change, apache won't start on the load balancer in question, unless it has the vip. relax. that's as it should be.

theoretically you should configure each load balancer to stop apache2 starting on boot. this allows the ha daemon to take full control of starting and stopping apache. in practice i didn't need to. you might want to.

restart the ha daemons and test

restart the ha daemon on both load balancers and test:

# /etc/init.d/heartbeat restart

keep an eye on the apache and heartbeat logfiles on both servers to see what is going on when you shut either loadbalancer down.

# tail -f /var/log/apache2/access.log
# tail -f /var/log/ha-log

final word

this is a fairly simplistic configuration. there is more you can do on detecting abnormal situations and failing over. for more information, visit http://www.linux-ha.org


if you've set up your drupal deployment with a separate database and web (drupal) server (see scaling drupal step one - a dedicated data server), a good next step is to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

one way to do this is to use a dedicated web server running apache2 and mod_proxy / mod_proxy_balancer to load balance your drupal servers.

load balancer setup: install and enable apache and proxy_balancer

enable mod_proxy in mods-available/proxy.conf. note that i'm leaving ProxyRequests off since we're only using the ProxyPass and ProxyPassReverse directives. this keeps the server secure from spammers trying to use your proxy to send email.

<IfModule mod_proxy.c>
    # set ProxyRequests off since we're only using the ProxyPass and
    # ProxyPassReverse directives. this keeps the server secure from
    # spammers trying to use your proxy to send email.
    ProxyRequests Off

    <Proxy *>
        AddDefaultCharset off
        Order deny,allow
        Allow from all
        #Allow from .example.com
    </Proxy>

    # Enable/disable the handling of HTTP/1.1 "Via:" headers.
    # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
    # Set to one of: Off | On | Full | Block
    ProxyVia On
</IfModule>

configure mod_proxy and mod_proxy_balancer

mod_proxy and mod_proxy_balancer serve as a very functional load balancer. however, mod_proxy_balancer makes slightly unfortunate assumptions about the format of the cookie that you'll use for sticky session handling. one way to work around this is to create your own session cookie (very easy with apache). the examples below describe how to do this.

first create a virtual host or use the default (/etc/apache2/sites-available/default) and add this configuration to it:
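a sketch of the approach, using mod_rewrite to set our own route cookie and stickysession to honour it. the backend addresses, routes, cluster name and cookie domain are all illustrative:

```apache
# create a sticky-session cookie of the form balancer.<route>
RewriteEngine On
RewriteRule .* - [CO=BALANCEID:balancer.%{ENV:BALANCER_WORKER_ROUTE}:.mydomain.com]

<Proxy balancer://drupalcluster>
    BalancerMember http://192.168.1.21:80 route=1
    BalancerMember http://192.168.1.22:80 route=2
</Proxy>

ProxyPass /drupal balancer://drupalcluster/drupal stickysession=BALANCEID
ProxyPassReverse /drupal http://192.168.1.21/drupal
ProxyPassReverse /drupal http://192.168.1.22/drupal
```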

if you plan to experiment with bringing servers up and down to test them being added and removed from the cluster you should consider setting the "connection pool worker retry timeout" to a value lower than the default 60s. you could set them to e.g. 10s by changing your configuration to the one below. a 10s timeout allows for quicker test cycles.

if you've already installed drupal on a single node (see easy-peasy-lemon-squeezy drupal installation on linux), a good first step to scaling a drupal install is to create a dedicated data server. by dedicated data server i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment. here's how you can do it. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

scalability: poor
redundancy: poor
ease of setup: good

servers

update

the recipe below uses nfs, you might want to consider using rsync as an alternative. see the discussion in step one B -- john, 11 nov 2007
if you plan to run with a single webserver for a while, you can skip the nfs / rsync malarky until step 2 -- john, 11 nov 2007

data server: setup mysql and prepare it for remote access

install mysql. i use mysql5. you'll need to enable this for remote access. edit /etc/mysql/my.cnf and change the bind address to your local server address e.g.

# bind-address = 127.0.0.1
bind-address = 192.168.1.26

now allow access to your database from your web (drupal) servers. run mysql and do:
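the exact statement depends on your setup, but it would look something like this. drupaldb, drupal and password match the names used in the basic install; 192.168.1.% is an illustrative web server subnet:

```sql
GRANT ALL ON drupaldb.* TO 'drupal'@'192.168.1.%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
```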

data server: setup a shared nfs partition

moving drupal's file data (node attachments etc) to an nfs server does two things. it allows you to manage all your important data on a single server, simplifying backups etc. it also paves the way for web server clustering, where clearly it doesn't make sense to write these files onto random web servers in the cluster.

it's a good idea to verify that drupal is happy with this new arrangement by visiting the status report, e.g. by hitting http://drupal-lb1.mydomain.com/drupal/?q=admin/logs/status and making sure that it sees your nfs area as writable. you can also just upload an attachment and see what happens.

installing drupal is pretty easy, but it's even easier if you have a step by step guide. i've written one that will produce a basic working configuration with drupal5 on debian etch with php5, mysql5 and apache2. it might be a help on other configurations too. see the scalability overview for related articles.

my examples assume that you are bold (or stupid) enough to do this as root. if you are a scaredy-cat or just plain sensible, make sure you use sudo. there. i said it.

this recipe assumes that you want to install everything on the same server. if you want to install your database on another server, skip the database install below and instead follow the instructions for a separate data server. i recommend just following the instructions below and then moving off the data server later, to keep things simple.

install the dependencies

i used php5, mysql5 and apache2. i don't know what you've got installed. here's a set of dependencies that worked for me:
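something like the following should pull in a workable set on debian etch. the package names are from memory of etch's repositories, so treat them as a starting point rather than gospel:

```shell
apt-get install apache2 libapache2-mod-php5 php5 php5-mysql mysql-server-5.0 mysql-client-5.0
```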

point drupal at your new database
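the gist of this step is to create the database and user that drupal will use; a minimal sketch, with names matching what gets typed into the installer below:

```sql
CREATE DATABASE drupaldb;
GRANT ALL ON drupaldb.* TO 'drupal'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
```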

finish the setup with the web tool

go to your browser and hit http://localhost/drupal; you will get redirected to http://localhost/drupal/install.php?profile=default, the install and db config page. follow the instructions. type in user=drupal, database=drupaldb, password=password. go on to create the initial (super admin) user and you are good to go.

creating a cron entry (optional)

you'll also want to add a cron entry (you'll need wget installed: apt-get install wget) like the following. it doesn't matter what user; you could put it in the root crontab (crontab -e as root):
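for example, running drupal's cron hourly (the url assumes the localhost install from this article):

```
0 * * * * wget -O - -q http://localhost/drupal/cron.php
```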

enabling clean urls in apache (optional)

it's nice to enable drupal's clean urls. first, allow apache to access the .htaccess in the drupal directory. add the following to your file in /etc/apache2/sites-available/ e.g. /etc/apache2/sites-available/default
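the key part is allowing .htaccess overrides for the drupal directory; something like this, assuming drupal lives in /var/www/drupal:

```apache
<Directory /var/www/drupal>
    AllowOverride All
</Directory>
```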

now go to the URL http://localhost/drupal/?q=admin/settings/clean-urls, take the test (hopefully you'll pass) and, when you do, turn on clean urls.

memory limits (optional, but might save you some pain)

i had a problem with drupal administration (and some other) pages intermittently showing as completely blank. this is often a php memory issue. check your apache error log (/var/log/apache2/error.log) to know for sure. adding the line:

ini_set('memory_limit', '12M');

to the settings file /var/www/drupal/sites/default/settings.php will fix the issue. you might want to add more than 12M.

setting up an apache virtual host (optional)

it's nice to setup an apache virtual host for your drupal site. this allows you to create custom logging, remove the /drupal/ from your urls and nicely encapsulate the directives for drupal. here's how you can do it.

create a virtual host specification in /etc/apache2/sites-available called myserver.mydomain.com that looks something like this:
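a minimal sketch of such a specification; the server name, paths and log file names are illustrative:

```apache
NameVirtualHost *:80
<VirtualHost *:80>
    ServerName myserver.mydomain.com
    DocumentRoot /var/www/drupal

    # per-site logging
    ErrorLog /var/log/apache2/myserver-error.log
    CustomLog /var/log/apache2/myserver-access.log combined
</VirtualHost>
```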

sounds like a lot of work?

if you can't be bothered doing all this configuration and you've got xen set up, ask me nicely and i'll send you a pre-configured xen machine with this base install. if you don't have a xen configuration, consider setting one up. i've got another quick and easy recipe for xen on linux.

One of the great features of Drupal is its ability to run any number of sites from one base installation, a feature generally referred to as multisites. Creating a new site is just a matter of creating a settings.php file and (optionally) a database to go with your new site. That's it. More importantly, there's no need to set up complicated Apache virtual hosts, which are a wonderful feature of Apache, but can be very tricky and tedious, especially if you're setting up a large number of subsites.

No worries, there is a solution.

Create a new LogFormat

Copy the LogFormat of your choice, prepend the HTTP host field, and give it a name:
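For example, taking the standard "combined" format, prepending %{Host}i and naming the result hostcombined (the name is arbitrary):

```apache
# "combined" with the http Host header prepended
LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" hostcombined
CustomLog /var/log/apache2/access.log hostcombined
```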