What's the performance/memory cost of having many (as in hundreds) VirtualHost directives in Apache? Generally, is it a good idea?

I'm currently working on improving my company's On-Demand infrastructure. What we have is a pretty classic architecture: each customer has a subdomain, customer1.service.com for example, that leads to one of our physical servers (they are actually VMs, but that's unimportant, so let's call them hosts). Those hosts can each serve several customers and, as our service is web-based, we have Apache forwarding requests to our web application. Basically, Apache is an HTTP proxy and does just that.

As our hosts are not that busy, I would like to run multiple flavours of our application on each of them and have Apache send each customer to the right one. AFAIK the only way to achieve that is to have one VirtualHost directive per subdomain/customer. As such, I'm basically asking whether a hundred of those will work well or present performance issues.
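For concreteness, here is roughly what I have in mind (all hostnames, ports and backend addresses below are made up for illustration, and mod_proxy/mod_proxy_http are assumed to be loaded):

    NameVirtualHost *:80

    <VirtualHost *:80>
        ServerName customer1.service.com
        # Hand everything over to the application instance serving this customer
        ProxyPass        / http://127.0.0.1:8001/
        ProxyPassReverse / http://127.0.0.1:8001/
    </VirtualHost>

    <VirtualHost *:80>
        ServerName customer2.service.com
        ProxyPass        / http://127.0.0.1:8002/
        ProxyPassReverse / http://127.0.0.1:8002/
    </VirtualHost>

    # ...and so on, one block per customer/subdomain.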

Also, having one Location directive per flavour is not an option because the app is poorly written and doesn't support the extra path segment (as in customer1.service.com/v1/).

Feel free to propose anything that might work just better in your opinion.

I ended up creating one VirtualHost per flavour and using ServerAlias instead of ServerName for the customers. I also wrote a Python script that updates the ServerAlias entries; the script manages a lock file before processing anything. A rough sketch of the resulting configuration is below.
– saalaa Apr 6 '11 at 16:45
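For illustration, the resulting configuration looks roughly like this (flavour names, customer lists and backend ports are made up; mod_proxy/mod_proxy_http assumed):

    <VirtualHost *:80>
        # One VirtualHost per flavour; customers are attached via ServerAlias,
        # which the Python script rewrites while holding its lock file.
        ServerName flavour-v1.service.com
        ServerAlias customer1.service.com customer2.service.com customer3.service.com
        ProxyPass        / http://127.0.0.1:8001/
        ProxyPassReverse / http://127.0.0.1:8001/
    </VirtualHost>

    <VirtualHost *:80>
        ServerName flavour-v2.service.com
        ServerAlias customer4.service.com customer5.service.com
        ProxyPass        / http://127.0.0.1:8002/
        ProxyPassReverse / http://127.0.0.1:8002/
    </VirtualHost>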

2 Answers

Unless you are running on a 386 with 64 MB of RAM, 100 vhosts won't be a performance or memory issue. In general, I wouldn't worry until you are in the 10,000+ vhost range.

Ignoring all the considerations raised by Richard, I wondered what the run-time performance cost was for each request. It looks like the performance penalty increases linearly with the number of name-based vhosts and (eventually) linearly with the number of IP-based vhosts.

There is a hash table (size 256 for Apache 2.2.17) keyed by IP address. Each bucket contains a linked list of vhosts that may be able to handle the request. The details are in vhost.c.

Without profiling the code it is hard to say what percentage of the execution time is spent matching virtual hosts.

I have just checked this file and my reading confirms your analysis. In a typical situation we would have three such VirtualHosts with, say, a hundred ServerAliases each, so it won't really be a performance issue. Honestly, I didn't think of checking there myself, so thanks for the pointer; I guess that's what I should have done in the first place. Use the source, as they say :)
– saalaa Apr 6 '11 at 19:41

Larger httpd.conf configuration files can cause additional delays during Apache restarts, rebuilds or updates, as Apache obviously needs to process them. For example, if you have an httpd.conf with ~2000 hosts, you could end up with a file in excess of 100K lines, which can take 30-60 seconds to process depending on how much information it contains.

There are also disaster-recovery considerations to review: if the file becomes corrupt, how many hosts will you be taking down? When updating Apache, how long will the rebuild take, and, given the size of the file, how much of a resource impact will the update have on the machine?

Simple problems such as a syntax error can cause huge issues with large files: one mistake, and hundreds of hosts are offline :)

Edit: If you're redirecting the subdomain to another physical server, why not use DNS zones instead?

Actually, the deployment process is automated, so there should be no problem. Also, if this is implemented, I will make the system write multiple configuration files to /etc/apache2/sites-enabled/ so that they are easy to deal with (see the sketch below). There's also the mighty apache -t to check the configuration. As for service interruption, there is apache -k restart, which doesn't drop existing connections (although I'm not sure how quickly it starts accepting new ones: after each directive or only after the configuration is fully loaded).
– saalaa Apr 6 '11 at 13:53
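For illustration, the layout mentioned above would look roughly like this (file names and the backend are made up; the Include line is more or less what Debian's stock apache2.conf already contains):

    # /etc/apache2/apache2.conf (Debian default, roughly):
    Include /etc/apache2/sites-enabled/

    # /etc/apache2/sites-enabled/flavour-v1: one small file per flavour,
    # regenerated by the deployment script and checked with `apache2ctl -t`
    # before Apache is asked to reload.
    <VirtualHost *:80>
        ServerName flavour-v1.service.com
        ServerAlias customer1.service.com customer2.service.com
        ProxyPass        / http://127.0.0.1:8001/
        ProxyPassReverse / http://127.0.0.1:8001/
    </VirtualHost>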

You will still find issues during day-to-day operations, unfortunately. It won't be frequent, but it does happen: during a restart, for example, your application might try to write to the file, it might hang, etc. It will work, don't get me wrong; it's just not very efficient and not 100% reliable. I'd personally consider splitting this across multiple machines where possible to reduce the risk.
– Richard Apr 6 '11 at 13:56

As for the DNS zones, I'm not sure I understand exactly what you mean. If you're advising a different zone for each flavour, that's something I'd rather avoid because it would require changing our existing naming scheme and would impact our customers.
– saalaa Apr 6 '11 at 14:02

Hmm, 100K lines is not that big. Are you sure it would be that slow? 30-60 seconds, really? (Are there any benchmarks?) A lot of web applications have more than 100K lines of code, and they're parsed and executed in milliseconds.
– rvs Apr 7 '11 at 6:34

Reloading the configuration with a hundred ServerAliases takes 200 ms (and, as said earlier, it doesn't drop existing connections). Since modifying these is not a very common operation, this poses no problem for our service. Thanks, everyone, for your help!
– saalaa Apr 7 '11 at 15:08