
Tuesday, May 17, 2011

What is a cluster?

■ Requirement : Red Hat Cluster Suite
■ OS Environment : Linux [RHEL, CentOS]
■ Clustering software : Red Hat Cluster Suite
■ Storage : SCSI, SAN, NAS
■ Storage Protocols : iSCSI (pronounced "eye-scuzzy") / FCP
■ Resolution :

cluster : A cluster is two or more interconnected computers that together provide higher availability, higher scalability, or both. The advantage of clustering computers for high availability is that if one of these computers fails, another computer in the cluster can assume the workload of the failed computer, and users of the system see no interruption of access.

iSCSI => protocol for connecting a server to storage over an IP network. The iSCSI initiator utilities go on the source/server; the iSCSI target utilities go on the storage/target machine.
FCP => Fibre Channel Protocol, for connecting a server to storage over an optical channel. It requires HBA (host bus adapter, analogous to a NIC) cards. The driver accesses the HBA, and the HBA communicates with the SAN switch/storage controller. (Drivers include qla2xxx from QLogic, lpfc from Emulex, etc.)

Concepts: iSCSI is a protocol, whereas SCSI is the underlying storage/disk command set. An iSCSI setup consists of an initiator (software and/or hardware) and a target. The initiator sends packets to the HBA (or NIC), which carries them to the target. The target resides on storage such as an EqualLogic array, NetApp filer, EMC NS-series, or HDS HNAS appliance. The target presents LUNs that map to the drives; a LUN is a logical unit on the storage that the server treats as a device or drive.
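For example, connecting a RHEL/CentOS node to an iSCSI target might look like the commands below. This is a minimal sketch; the portal address 192.168.1.100 is a hypothetical placeholder for your storage array.

# Install and start the initiator utilities
yum install -y iscsi-initiator-utils
service iscsi start
# Discover the targets offered by the storage portal (hypothetical IP)
iscsiadm -m discovery -t sendtargets -p 192.168.1.100:3260
# Log in to the discovered target; the LUN then shows up as a new SCSI disk
iscsiadm -m node -p 192.168.1.100:3260 --login
fdisk -l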

Storage System Connection Types :

a)active/active : all paths are active all the time
b)active/passive : one path is active and the others are standby/backup
c)virtual port storage system : access is through a single virtual port; the storage system itself handles connection balancing and port failover

Multipathing and Path Failover : When transferring data between the host server and storage, the SAN uses a multipathing technique; the package "device-mapper-multipath" must be installed on the server/node, and the daemon "multipathd" periodically checks the connection paths to the storage. Multipathing lets you have more than one physical path from the server host to a LUN (treated as a device) on a storage system. If a path or any component along the path (HBA or NIC, cable, switch or switch port, or storage processor) fails, the server selects another of the available paths. The process of detecting a failed path and switching to another is called path failover.
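On RHEL 5 the basic setup looks roughly like the commands below. This is a hedged sketch: the exact /etc/multipath.conf edits (blacklist, defaults) depend on your array, and device names will differ.

# Install the multipath tools
yum install -y device-mapper-multipath
# Edit /etc/multipath.conf (comment out the default blacklist stanza), then enable the daemon
chkconfig multipathd on
service multipathd start
# Show the multipath maps and the state of each path
multipath -ll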

Installation of Red Hat Cluster Suite on RHEL 5 :

1. Register the system to RHN (needs a subscription with Red Hat). Make sure it is subscribed to the clustering and cluster-storage channels. The packages also come on the DVD. Ignore this step if the system is already registered :
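Once the channels are available, the suite can be pulled in with yum. A sketch, assuming the RHEL 5 package groups "Clustering" and "Cluster Storage" are visible to the node:

# Register with RHN (skip if already registered)
rhn_register
# Install the cluster and cluster-storage package groups
yum groupinstall -y "Clustering" "Cluster Storage"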

5. Configuration files (the same on every node):
/etc/cluster/cluster.conf, /etc/sysconfig/cluster. When you configure the cluster through the web interface, the file is automatically copied to each node. Make sure you have opened all the required ports in the firewall, or disabled the firewall, on all nodes as well as on the luci node.
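For a quick lab setup the firewall is often simply disabled; otherwise open the main cluster ports. The list below is a partial, hedged sketch for RHEL 5 Cluster Suite:

# Lab shortcut (not recommended in production)
service iptables stop
chkconfig iptables off
# Or open the key ports on every node
iptables -I INPUT -p udp --dport 5404:5405 -j ACCEPT   # cman / openais
iptables -I INPUT -p tcp --dport 11111 -j ACCEPT       # ricci
iptables -I INPUT -p tcp --dport 21064 -j ACCEPT       # dlm
iptables -I INPUT -p tcp --dport 8084 -j ACCEPT        # luci web interface (on the luci node)
service iptables save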

6. Now log in to the luci web interface, create a new cluster, and give it a name. Then add each node to this cluster one by one. In this cluster add one failover domain and a service such as httpd. (Make sure you have installed httpd on each node and that the configuration files are the same everywhere.) I shall describe it later and show you the result of real failover testing.
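Before luci can manage the nodes, the luci service must be initialized on the management host and the ricci agent must be running on every cluster node. A sketch of those steps on RHEL 5:

# On the management node
luci_admin init        # sets the luci admin password
service luci start
chkconfig luci on
# On every cluster node
service ricci start
chkconfig ricci on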

AA) The shared partitions are used to hold cluster state information, including cluster lock states, service states, and configuration information. The shared disk may be attached to any node or live on a storage array (connected through an HBA or a RAID controller, ideally RAID 1, i.e. mirrored). Two raw devices must be created on the shared disk storage: one for the primary shared partition and one for the shadow shared partition. Each shared partition must have a minimum size of 10 MB. The amount of data in a shared partition is constant; it does not increase or decrease over time. Periodically, each member writes the state of its services to shared storage. In addition, the shared partitions contain a version of the cluster configuration file; this ensures that each member has a common view of the cluster configuration. If the primary shared partition is corrupted, the cluster members read the information from the shadow (or backup) shared partition and simultaneously repair the primary partition. Data consistency is maintained through checksums, and any inconsistencies between the partitions are automatically corrected. If a member is unable to write to both shared partitions at start-up time, it is not allowed to join the cluster. In addition, if an active member can no longer write to both shared partitions, the member removes itself from the cluster by rebooting (and may be remotely power cycled by a healthy member).

BB) The following are shared partition requirements:

a)Both partitions must have a minimum size of 10 MB.
b)Shared partitions must be raw devices, so that writes bypass the operating system's file cache; they cannot contain file systems (a sketch of creating them follows the guidelines below).
c)Shared partitions can be used only for cluster state and configuration information.

CC) Following are recommended guidelines for configuring the shared partitions (by Red Hat):

a)It is strongly recommended to set up a RAID subsystem for shared storage, and use RAID 1 (mirroring) to make the logical unit that contains the shared partitions highly available. Optionally, parity RAID can be used for high availability. Do not use RAID 0 (striping) alone for shared partitions.
b)Place both shared partitions on the same RAID set, or on the same disk if RAID is not employed, because both shared partitions must be available for the cluster to run.
c)Do not put the shared partitions on a disk that contains heavily-accessed service data. If possible, locate the shared partitions on disks that contain service data that is rarely accessed.
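Creating the two shared partitions might look like the commands below. This is a hedged sketch: /dev/sdb1 and /dev/sdb2 are hypothetical partition names on the mirrored shared LUN, each at least 10 MB.

# Bind the two partitions to character raw devices so writes bypass the file cache
raw /dev/raw/raw1 /dev/sdb1
raw /dev/raw/raw2 /dev/sdb2
# For persistence across reboots, add matching rules to /etc/udev/rules.d/60-raw.rules
# (older releases used the rawdevices service and /etc/sysconfig/rawdevices instead)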

ii)Add the quorum disk to the cluster at the backend. (This can be done in the web interface: log in to the luci interface, go to the cluster, click the "Quorum Partition" tab, and proceed to configure it.) :

Note : If you edit the file by hand, you need to copy it manually to each node. If you make the change through the web interface, the copy is done automatically.
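From the command line, the quorum partition itself is initialized with mkqdisk. A sketch; /dev/sdc1 and the label "qdisk1" are hypothetical:

# Initialize the quorum disk on the shared LUN (run once, from any node)
mkqdisk -c /dev/sdc1 -l qdisk1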

b)Please increase the config_version by 1 and run ccs_tool update /etc/cluster/cluster.conf.
c) Verify that the quorum disk has been initialized correctly with "mkqdisk -L", and use "clustat" to check its availability.
d)Please note that total votes = quorum votes = 5 = 2 + 3; if the quorum disk's votes were fewer than (node votes + 1), the cluster would not have survived.
e) Typically, the heuristics should be snippets of shell code or commands which help determine a node's usefulness to the cluster or its clients. Ideally, you want to add checks for all of your network paths (e.g. check links, or ping routers) and methods to detect the availability of shared storage. Only one master is present at any one time in the cluster, regardless of how many partitions exist within the cluster itself. The master is elected by a simple voting scheme in which the lowest node that believes it is capable of running (i.e. scores high enough) bids for master status; if the other nodes agree, it becomes the master. This algorithm is run whenever no master is present. Here the heuristic is "ping -c1 -t1 10.65.211.86"; the IP may be the SAN IP, another node's IP, a router, etc.
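After the quorumd/heuristic changes are in cluster.conf (with config_version bumped), a typical propagate-and-verify sequence might look like this sketch; the steps assume RHEL 5:

# Push the updated configuration to all nodes
ccs_tool update /etc/cluster/cluster.conf
# Start the quorum disk daemon on every node
service qdiskd start
chkconfig qdiskd on
# Verify
mkqdisk -L      # lists quorum disks and their labels
clustat         # the quorum disk should appear with its votes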

8. Configuring Storage : (either SAN/NAS using multipath, or NFS)
In the luci interface click on "add a system", then go to the storage tab and assign the storage to the cluster.
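If the shared LUN is to carry a cluster file system, it is usually put under clustered LVM and formatted with GFS. A hedged sketch; the volume group, logical volume, and cluster name below are hypothetical, and the journal count (-j) should match the number of nodes:

# On one node, with clvmd running on all nodes
pvcreate /dev/mapper/mpath0
vgcreate -c y vg_cluster /dev/mapper/mpath0
lvcreate -L 10G -n lv_data vg_cluster
# lock_dlm locking, cluster name "mycluster", 2 journals for a 2-node cluster
gfs_mkfs -p lock_dlm -t mycluster:data -j 2 /dev/vg_cluster/lv_data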

To start the cluster software on a member, type the following commands in this order:

1. service ccsd start
2. service lock_gulmd start or service cman start according to the type of lock manager used
3. service fenced start
4. service clvmd start
5. service gfs start, if you are using Red Hat GFS
6. service rgmanager start
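To have the stack come up in this order at boot, the same services can simply be enabled with chkconfig on every node; the init script priorities take care of the sequence. A sketch (drop gfs, or swap cman for lock_gulmd, to match your setup):

for svc in ccsd cman fenced clvmd gfs rgmanager; do
    chkconfig $svc on
done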

To stop the cluster software on a member, type the following commands in this order:

1. service rgmanager stop
2. service gfs stop, if you are using Red Hat GFS
3. service clvmd stop
4. service fenced stop
5. service lock_gulmd stop or service cman stop according to the type of lock manager used
6. service ccsd stop

Stopping the cluster services on a member causes its services to fail over to an active member.
=================