At the moment there are a lot of choices for setting up a Linux cluster.

For the cluster manager, you can use Red Hat Cluster Manager, Pacemaker, or Veritas Cluster Server.
The first one has the most momentum, the second one comes by default with RH subscriptions and the last one is very expensive and has a very good reputation ;-)

For storage:
- You can replicate LUNs using software RAID (md devices).
- You can replicate over the network using DRBD, which offers a bit more flexibility.
- You can use Veritas Storage Foundation to talk to your SAN's replication technology.

8 Answers

I'd go with GlusterFS. The latest version, 3.x, supports geo-replication (for long, high-latency links) as well as LAN replication. There are plenty of docs about how to replicate and spread data across the cluster.

I don't like DRBD, because there's a limit on the number of nodes you can use. I think GlusterFS on decent hardware, with a decent bit of network tuning, might be just what you're after. Definitely worth a test session.

You still have to replicate storage from one SAN to the other SAN ;-) So how to do this is the most important question... (SAN technology approached by scripts or Veritas Agent, DRBD, md...)
– PieterB, Sep 3 '10 at 10:48

No you do not have to replicate - that's one of the main reasons for having a SAN in the first place. But you need a filesystem which knows that other machines may be trying to access the same physical sectors/logical structure.
– symcbean, Sep 3 '10 at 11:35

Since the OP mentioned "stretch cluster" rather than regular HA clustering, it seems he wants to protect against his SAN array going down as well, in which case he does need replication. That replication can of course be DIY with DRBD or similar, or some $$$ solution from the SAN array vendor.
– janneb, Sep 3 '10 at 12:06

That's when you don't have 2 SANs. If you have a split-site (stretch) cluster, you have the same data on each side. So you need some method for the replication ;-)
– PieterB, Sep 3 '10 at 12:12

@janneb: indeed, we need to protect against the loss of a complete datacenter site.
– PieterB, Sep 3 '10 at 12:14

I am currently testing a "stretch cluster" using Red Hat Cluster Suite and DRBD. I am typing this at a hotel near Red Hat Summit in Boston, which just ended. I talked with the Red Hat Cluster Suite developers and they said stretch clusters were not supported at this time.

This won't stop me from working on it for fun though. My setup is four HP blades in a single cluster. Two blades are in one datacenter, about 15 miles from the other datacenter, which houses the other two blades. In order to get the cluster to even join together, I needed the network team to configure the routers between the sites to pass multicast traffic. In addition, since Red Hat hard-codes a TTL of "1" on the multicast heartbeat packets, I had to use iptables to mangle packets to that multicast address to a higher TTL.
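A minimal sketch of what such a TTL rule could look like, assuming the heartbeat packets are generated locally and matched by destination address. The multicast address `239.192.0.1` and the TTL of 8 are placeholders, not values from the setup above, and the helper only builds the command line without executing it.

```python
import shlex
from typing import List


def build_ttl_rule(mcast_addr: str, ttl: int) -> List[str]:
    """Build (but do not run) an iptables command that raises the TTL of
    outgoing packets sent to one multicast address."""
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    return [
        "iptables", "-t", "mangle",   # TTL rewriting is done in the mangle table
        "-A", "OUTPUT",               # match locally generated packets
        "-d", mcast_addr,             # only the cluster's multicast address
        "-j", "TTL", "--ttl-set", str(ttl),
    ]


# Hypothetical values: substitute the cluster's real multicast address
# and a TTL at least as large as the number of router hops between sites.
rule = build_ttl_rule("239.192.0.1", 8)
print(shlex.join(rule))
```

The TTL has to survive every router between the two sites, so count the hops before picking a value.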

After that was done, I was able to get a four-node cluster with my blades. For storage, I have a 3Par LUN at each site, shared between its two local nodes. These are the block devices I use for my DRBD devices. I should add here that I have a dedicated 1G WAN link for just my DRBD traffic. I was able to get DRBD running fairly easily between the sites and use that DRBD device as a PV in a clustered LV with GFS2 running on it. I do occasionally have split-brain conditions on my DRBD setup that I must manually recover from, and I am trying to isolate that problem.

The next step has been the hardest. I want to be able to fail over my GFS2 mount to the other node in case the primary fails. My GFS2 service consists of a floating IP -> DRBD -> LVM -> GFS2. The drbd.sh script that comes in the source code for clustering doesn't work at all, so I have been testing with the regular DRBD startup script in /etc/init.d. It only works "sometimes", so I will need to tweak that.
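The ordering problem in a stacked service like this can be sketched as follows. This is not how Red Hat Cluster Suite implements it; it is only an illustration of starting layers in order and stopping them in reverse when one fails. The names `start_stack` and `make_resource` are hypothetical, and the resources are stubs that only record what would happen.

```python
from typing import Callable, List, Tuple

# (name, start function returning success, stop function)
Resource = Tuple[str, Callable[[], bool], Callable[[], None]]

log: List[str] = []


def make_resource(name: str, ok: bool = True) -> Resource:
    """Stub resource that records start/stop calls instead of doing real work."""
    def start() -> bool:
        log.append(f"start {name}")
        return ok

    def stop() -> None:
        log.append(f"stop {name}")

    return (name, start, stop)


def start_stack(resources: List[Resource]) -> List[str]:
    """Start resources in order; on failure, stop the started ones in reverse."""
    started: List[Resource] = []
    for res in resources:
        name, start, _stop = res
        if start():
            started.append(res)
            continue
        # Roll back so a half-started stack never keeps the IP or device busy.
        for _name, _start, stop in reversed(started):
            stop()
        raise RuntimeError(f"failed to start {name}")
    return [name for name, _start, _stop in started]


# Same order as the service described above.
stack = [make_resource(n) for n in ("ip", "drbd", "lvm", "gfs2")]
print(start_stack(stack))
```

If the init script only succeeds "sometimes", a wrapper like this at least makes sure a failed layer does not leave the layers beneath it active on the wrong node.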

I was dismayed to discover that none of this is supported in Red Hat Cluster Suite, so any dream I had of moving this to production is dashed. And where else would you need this kind of setup? Pretty much only very important production stuff.

I did talk with Symantec here and they told me they absolutely support active-active stretch clusters with shared storage. I will believe that when I actually see it though.

DRBD is dead slow, as everybody knows. You can't use it for high-load enterprise purposes. It uses 128 KiB hashing functions, which limit I/O requests to at most 128 KiB instead of the 512 KiB a regular HDD can handle. Furthermore, the I/O request size detection is stupid: it only works while connected to the other host. If you lose the connection, the limit is reset to 4 KiB on your local HDDs. Versions 8.4.1 and 8.3.11 have the same issues.
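To put the request sizes above in proportion, here is a small arithmetic sketch. It only counts how many requests a transfer is split into at each size limit; it says nothing about measured DRBD throughput, and the 1 MiB transfer size is an arbitrary example.

```python
KIB = 1024


def requests_needed(transfer_bytes: int, max_request_bytes: int) -> int:
    """Number of I/O requests needed when each request is capped in size."""
    if transfer_bytes < 0 or max_request_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating point.
    return -(-transfer_bytes // max_request_bytes)


transfer = 1024 * KIB  # 1 MiB, chosen only for illustration
for limit_kib in (512, 128, 4):
    print(limit_kib, "KiB limit:", requests_needed(transfer, limit_kib * KIB), "requests")
```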

Regarding software RAID/md: while DRBD is superficially just RAID 1 over the network, in reality it is significantly more complicated, e.g. in order to deal with temporary network partitions without having to resync from scratch.

Also, consider that software RAID-1 typically tries to balance the load on the drives by distributing reads somewhat evenly over them. Needless to say, this isn't a very good idea if one drive is local and the other is somewhere behind a potentially low bandwidth/high latency network link.
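A back-of-the-envelope sketch of why balanced reads hurt when one mirror half is remote. The latencies (0.5 ms local, 5 ms remote) are made-up illustration values, not measurements.

```python
def mean_read_latency_ms(local_ms: float, remote_ms: float, remote_share: float) -> float:
    """Average read latency when a fraction of reads goes to the remote mirror."""
    if not 0.0 <= remote_share <= 1.0:
        raise ValueError("remote_share must be between 0 and 1")
    return (1.0 - remote_share) * local_ms + remote_share * remote_ms


# Hypothetical latencies: 0.5 ms to the local disk, 5 ms across the site link.
balanced = mean_read_latency_ms(0.5, 5.0, remote_share=0.5)
local_only = mean_read_latency_ms(0.5, 5.0, remote_share=0.0)
print(f"balanced reads: {balanced} ms, local-only reads: {local_only} ms")
```

With these assumed numbers, splitting reads evenly makes the average read several times slower than reading only from the local side.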

I really don't know. These servers are connected with Fibre Channel, so latency is negligible and all communication uses these channels when you use md. For the server these are just normal disks and the SAN doesn't care (the LUNs must be visible on both sites). Compare this with DRBD, where all traffic passes over the network...
– PieterB, Sep 3 '10 at 12:23

I have worked with Veritas Volume Manager, Cluster and Global cluster in a $$$ company - I really liked it.

I've worked with host-based mirroring of SAN devices.

I have a couple of Xen clusters running DRBD with local disks for replication between two data centers (not too far away from each other). I just ran into some trouble last Friday after short network disconnects there...

What I really loved about the Veritas solution is that you can fine-tune every aspect. So for a read-intensive DB application we tuned the volumes so that reads came from the primary data center, colocated with the clients. That gave an enormous performance boost.

So for storage-replication: If you can afford it - go for Veritas.

Now for the cluster software: I know Veritas, Sun, AIX/HACMP/HAGEO, HP Serviceguard and Linux-Heartbeat.

I liked Veritas best and especially like the way it prevents split-brains (jeopardy mode)...

But you can achieve the same on any other cluster-software, if you use independent lines for heartbeats - so invest in these lines - instead of the software.
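The idea behind independent heartbeat lines can be sketched like this. It is a simplified model and not the actual Veritas algorithm: a peer is treated as gone only when every link is silent, and a single surviving link is flagged as a warning state because its loss could not be told apart from a dead peer. The link names are hypothetical.

```python
from enum import Enum
from typing import Dict


class PeerState(Enum):
    HEALTHY = "healthy"          # at least two heartbeat links still answer
    JEOPARDY = "jeopardy"        # one link left: no redundancy remains
    UNREACHABLE = "unreachable"  # every link is silent


def classify_peer(links_up: Dict[str, bool]) -> PeerState:
    """Classify a peer from the up/down status of each heartbeat link."""
    alive = sum(1 for up in links_up.values() if up)
    if alive == 0:
        return PeerState.UNREACHABLE
    if alive == 1:
        return PeerState.JEOPARDY
    return PeerState.HEALTHY


# Hypothetical link names for two independent heartbeat paths.
print(classify_peer({"lan": True, "dedicated": True}))
print(classify_peer({"lan": False, "dedicated": True}))
print(classify_peer({"lan": False, "dedicated": False}))
```

With only one heartbeat path there is no state between "fine" and "gone", which is exactly when split-brains happen.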

I may cite Alan Robertson here: "A cluster is not a cluster unless you tested it."

And I have seen more downtime BECAUSE of a complex cluster setup than savings through such a setup. So keep it simple (Heartbeat v1 instead of v2).