High Availability (Load Balancing) behind a firewall

My boss wants me to set up a load-balanced system with a firewall filtering the traffic, and to make the whole thing scalable so that new machines can be added to the cluster.

How would I go about this?

His proposed system has a pyramid-like structure: the firewall host would route incoming traffic to its internal servers (the load balancers), which would in turn distribute it among the web and file servers in the inner network.

Is this a good solution? I would think that having (at least) 2 load balancers directly connected to the Internet would be desirable. Otherwise the firewall is a single point of failure. I would even go further and run the load balancers on the 2 firewalls, which are directly connected to the Internet and share a common virtual WAN IP address.

So, I would have the following (simplest) setup:

2 firewall hosts that also run the load balancers and share 1 virtual IP
2 Web servers behind these firewalls which are to be load balanced by the firewalls

Does this sound like a better solution, or do you think I should go with the pyramid approach? Is load balancing even justified then? Isn't the connection speed (10-100 Mbit) the bottleneck rather than the server power (we have very new hardware)? Wouldn't the firewall, which needs to handle ALL connections, be the bottleneck when it comes to resources?
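To get a rough feel for the bandwidth question, here is a back-of-envelope sketch. It only computes the ceiling that the link itself imposes and ignores protocol overhead. The function name and the 50 KB average response size are assumptions chosen for illustration, not measured values.

```python
def link_limited_rps(link_mbit: float, avg_response_kb: float) -> float:
    """Upper bound on responses per second that the link can carry.

    Ignores TCP/HTTP overhead, so the real figure will be lower.
    """
    bytes_per_second = link_mbit * 1_000_000 / 8
    return bytes_per_second / (avg_response_kb * 1_000)


# Assumed average response size of 50 KB (hypothetical figure).
for mbit in (10, 100):
    print(mbit, "Mbit/s ->", link_limited_rps(mbit, 50), "responses/s at most")
```

If a single web server can already serve more responses per second than this ceiling, the link rather than the server is the limiting factor, and extra hosts mainly buy redundancy rather than throughput.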

Might we even install web servers on the firewalls/load balancers as well to make more efficient use of their resources, or does that defeat the purpose of a firewall?

With today's technology, virtualisation (e.g. Xen, VirtualLinux) could be used as well to make use of all the resources of the firewall hosts while still completely separating the firewall from the load balancer, and maybe even from a web server installed on the same system.

What would be the best solution? Is there a best solution? What does it depend on: Connection speed to the network/Internet of the various hosts, their processing power? How can one approximate the number of connections a host (firewall) can handle?

Is there a formula to calculate the optimal number of firewall, load balancing and web server hosts?

Can we measure how fast the hosts perform their various tasks in order to approximate an optimal solution?
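I doubt there is an exact formula, but a crude sizing rule can be sketched once a per-host throughput has been measured (for example with a load-testing tool such as ApacheBench). The function below is a hypothetical helper, and the idea of adding one spare host for failover is an assumption, not something taken from the howto.

```python
import math


def hosts_needed(target_rps: float, measured_rps_per_host: float, spares: int = 1) -> int:
    """Hosts required to serve target_rps, plus spare hosts for failover."""
    if measured_rps_per_host <= 0:
        raise ValueError("measured_rps_per_host must be positive")
    return math.ceil(target_rps / measured_rps_per_host) + spares


# Hypothetical numbers: 250 requests/s wanted, 100 requests/s measured per host.
print(hosts_needed(250, 100))
```

The same rule can be applied per tier (firewall, load balancer, web server) with that tier's own measured figure.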

Any ideas would be greatly appreciated.

By the way, as I work through the load balancing howto, I'm writing a script to automate it for loadb1 and loadb2, so that one can enter the necessary bits of information interactively. If I get some good feedback on this and the system goes into production with the script working, I think I will post it somewhere in this forum.

My question was more about firewalls. I know there are tutorials on how to set them up on this site as well. But I'm trying to figure out what kind of setup is optimal with what hardware. In theory I know all the possibilities because I've been browsing this forum. What I'm trying to figure out is the optimal solution. So, should we separate the firewall from the web server completely, on separate hardware hosts or just on separate virtual hosts, or is it OK to put web server and firewall on the same machine? Things like that, more general. You know what I mean? I'm sure this is of interest to many who are setting up networks.

To study this kind of thing, do you know of any good, preferably open source, tools for monitoring performance on Linux systems?
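Even without a dedicated tool, the kernel's interface counters give a first idea of how busy a link is. The sketch below parses text in the format of /proc/net/dev, taken at two points in time, and turns the byte counters into Mbit/s. The function names are made up for this example, and the interface name `eth0` is an assumption.

```python
def parse_net_dev(text: str) -> dict:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput_mbit(before: dict, after: dict, seconds: float, iface: str = "eth0"):
    """Return (rx, tx) in Mbit/s between two snapshots taken `seconds` apart."""
    rx = (after[iface][0] - before[iface][0]) * 8 / seconds / 1e6
    tx = (after[iface][1] - before[iface][1]) * 8 / seconds / 1e6
    return rx, tx
```

To use it, read /proc/net/dev, wait a known number of seconds, read it again, and pass both parsed snapshots to `throughput_mbit`.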

A problem that I came across:
When I followed the howto http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster
I made a mistake and put a different load balancer name into the file /etc/ha.d/haresources on each machine (i.e. loadb1's `uname -n` output on loadb1 and loadb2's `uname -n` output on loadb2).
I tried to undo this by putting loadb1's output into the file on loadb2 and restarting the services, but `ipvsadm -L -n` still gave the same output on loadb1 and loadb2:

geek.de.nz said:

My question was more about firewalls. I know there are tutorials on how to set them up on this site as well. But I'm trying to figure out what kind of setup is optimal with what hardware. In theory I know all the possibilities because I've been browsing this forum. What I'm trying to figure out is the optimal solution. So, should we separate the firewall from the web server completely, on separate hardware hosts or just on separate virtual hosts, or is it OK to put web server and firewall on the same machine? Things like that, more general. You know what I mean? I'm sure this is of interest to many who are setting up networks.

I'd most likely put the firewalls on the Apache nodes.

geek.de.nz said:

To study this kind of thing, do you know of any good, preferably open source, tools for monitoring performance on Linux systems?

A problem that I came across:
When I followed the howto http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster
I made a mistake and put a different load balancer name into the file /etc/ha.d/haresources on each machine (i.e. loadb1's `uname -n` output on loadb1 and loadb2's `uname -n` output on loadb2).
I tried to undo this by putting loadb1's output into the file on loadb2 and restarting the services, but `ipvsadm -L -n` still gave the same output on loadb1 and loadb2:

Falko, first thanks for all you do on these boards... I've used your advice many times.

Are there any up-to-date versions of the first link, the howto on High Availability and load-balanced clusters using redundant LBs and LAMP stacks? I am having trouble finding equivalents that use current-generation software stacks (e.g. Ubuntu Server 10.04, HAProxy, Linux-HA etc.)...