Clustering
- Achieve reliability and scalability by interconnecting multiple independent systems
- Cluster: a group of standard, autonomous servers configured so they appear on the network as a single machine
  - Single system image

Ideally
- Bunch of off-the-shelf machines
- Interconnected on a high-speed LAN
- Appear as one system to users
- Processes are load-balanced across the cluster
  - May migrate
  - May run on different systems
  - All IPC mechanisms and file access available
- Fault tolerant
  - Components may fail
  - Machines may be taken down

We don't get all that (yet), at least not in one general-purpose package

Cluster membership
- Software to manage cluster membership
  - What are the nodes in the cluster?
  - Which nodes in the cluster are currently alive (active)?
- Quorum
  - Number of elements that must be online for the cluster to function
  - Voting algorithm to determine whether the set of nodes has quorum (a majority of nodes to keep running)
- Keeping track of quorum
  - Count cluster nodes running the cluster manager
  - If over ½ are active, the cluster has quorum
  - Forcing a majority avoids split-brain
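The majority rule above can be sketched in a few lines. This is a minimal illustration, not any particular cluster manager's algorithm; the function name `has_quorum` and the node names are made up for the example. It shows why a strict majority avoids split-brain: two disjoint partitions cannot both contain more than half of the nodes.

```python
def has_quorum(active_nodes, cluster_nodes):
    """Return True if strictly more than half of the configured nodes are active.

    active_nodes:  nodes currently running the cluster manager (hypothetical input)
    cluster_nodes: all nodes configured as cluster members
    """
    cluster = set(cluster_nodes)
    alive = set(active_nodes) & cluster  # ignore nodes that are not members
    return 2 * len(alive) > len(cluster)


cluster = ["n1", "n2", "n3", "n4"]
# A network partition splits the cluster in two halves: neither side may continue.
print(has_quorum(["n1", "n2"], cluster))        # False
print(has_quorum(["n3", "n4"], cluster))        # False
# A 3/1 split leaves exactly one side with quorum.
print(has_quorum(["n1", "n2", "n3"], cluster))  # True
```

Note that an even split leaves the whole cluster without quorum, which is one reason odd node counts (or an extra tie-breaking vote) are commonly preferred.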

Cluster configuration & service management
- Cluster configuration system
  - Manages configuration of systems and software in a cluster
  - Runs in each cluster node
  - Changes propagate to all nodes
  - Administrator has a single point of control
- Service management
  - Identify which applications run where
  - Specify how failover occurs
November 27, Paul Krzyzanowski

Cluster interconnect
- Sometimes you want a cluster interconnect that is separate from the LAN
  - Extra LAN dedicated for cluster activities & cluster data movement
  - Sometimes known as a System Area Network (SAN)
- For storage: Storage Area Network (SAN)

Shared storage access
- If an application can run on any machine, how does it access file data?
- If an application fails over from one machine to another, how does it access its file data?
- Can applications on different machines share files?

Shared nothing
- Shared nothing
  - No shared devices
  - Each system has its own storage resources
  - No need to deal with DLMs
  - If machine A needs resources on B, A sends a message to B
  - If B fails, storage requests have to be switched over to a live node
- Exclusive access to shared storage
  - Multiple nodes may have access to shared storage
  - Only one node is granted exclusive access
  - Exclusive access changed on failover
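The ownership idea behind shared nothing can be sketched as a routing table. This is an illustrative model only: the class `SharedNothingCluster`, its methods, and the volume and node names are hypothetical, and a real system would send network messages rather than look up a dictionary. Each resource has exactly one owner, requests are routed to that owner, and on failure ownership is switched to a live node.

```python
class SharedNothingCluster:
    """Each resource (e.g. a disk volume) is owned by exactly one node."""

    def __init__(self, nodes, owners):
        self.alive = set(nodes)
        self.owners = dict(owners)  # resource name -> owning node

    def route(self, resource):
        """Return the node that a request for `resource` must be sent to."""
        return self.owners[resource]

    def fail_over(self, failed, takeover):
        """Mark `failed` as down and hand its resources to the live node `takeover`."""
        self.alive.discard(failed)
        if takeover not in self.alive:
            raise ValueError(f"{takeover} is not a live node")
        for resource, owner in list(self.owners.items()):
            if owner == failed:
                self.owners[resource] = takeover


cluster = SharedNothingCluster(["A", "B", "C"], {"vol1": "A", "vol2": "B"})
print(cluster.route("vol2"))  # B: node A must send a message to B to reach vol2
cluster.fail_over("B", "C")
print(cluster.route("vol2"))  # C: requests are switched over to a live node
```

Because only the owner ever touches a resource, no distributed lock manager is needed; the cost is that every remote access becomes a message to the owner.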

HA issues
- How do you detect failover?
- How long does it take to detect?
- How does a dead application move/restart?
- Where does it move to?

Heartbeat network
- Machines need to detect faulty systems
  - Heartbeat: ping mechanism
- Need to distinguish system faults from network faults
  - Useful to maintain redundant networks
  - Send a periodic heartbeat to test a machine's liveness
  - Watch out for split-brain!
- Synchronous networks make it easier
  - They give us a bounded response time
  - Microsoft Cluster Server supports a dedicated private network
    - Two network cards connected with a pass-through cable or hub
  - Can also use SAN interconnect for heartbeats
- IP & Ethernet are asynchronous
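A timeout-based heartbeat monitor over redundant networks might look like the sketch below. The class `HeartbeatMonitor`, the status strings, and the network names are assumptions made for illustration, not part of any product mentioned above. The point it illustrates: a node heard on one network but not another suggests a network fault, while silence on every network only lets us suspect a dead node, since on an asynchronous network a late heartbeat is indistinguishable from a lost one.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat seen from each node on each network."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence tolerated (a tuning assumption)
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_seen = {}     # (node, network) -> time of last heartbeat

    def record(self, node, network):
        """Call whenever a heartbeat from `node` arrives on `network`."""
        self.last_seen[(node, network)] = self.clock()

    def status(self, node, networks):
        """Classify `node` from heartbeats seen on the redundant `networks`."""
        now = self.clock()
        heard = [
            net for net in networks
            if now - self.last_seen.get((node, net), float("-inf")) <= self.timeout
        ]
        if len(heard) == len(networks):
            return "alive"
        if heard:
            return "network-fault"   # node is up, but one path to it is not
        return "suspected-dead"      # only a suspicion on an asynchronous network


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("n2", "lan")  # heartbeat arrived on the LAN only
print(monitor.status("n2", ["lan", "private"]))  # network-fault
```

With a single network the monitor could never return "network-fault", which is the practical argument for a dedicated heartbeat link.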

Design options for failover
- With either type of failover:
  - Multi-directional failover
    - Failed applications migrate to / restart on available systems
  - Cascading failover
    - If the backup system fails, the application can be restarted on another surviving system
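The two options differ mainly in how the restart target is chosen. The sketch below is one possible reading, with hypothetical function names: cascading failover walks an ordered list of backups and takes the first survivor, while multi-directional failover may pick any available system. Choosing the least-loaded node is an assumption added here for illustration; the slides do not prescribe a selection policy.

```python
def cascading_target(backups, alive):
    """Return the first surviving node in the ordered backup list, or None."""
    return next((node for node in backups if node in alive), None)


def multidirectional_target(alive, load):
    """Return the least-loaded surviving node, or None if nothing is alive.

    `load` maps node -> number of running applications (assumed metric);
    ties are broken by node name so the choice is deterministic.
    """
    return min(alive, key=lambda node: (load.get(node, 0), node), default=None)


alive = {"n3", "n4"}  # n1 (primary) and n2 (first backup) have both failed
print(cascading_target(["n2", "n3", "n4"], alive))           # n3
print(multidirectional_target(alive, {"n3": 5, "n4": 2}))    # n4
```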

IP Address Takeover (IPAT)
Depending on the deployment:
- Ignore
  - IP addresses of services don't matter. A load balancer, name server, or coordinator will identify the correct machine
- Take over IP address
  - A node in an active/passive configuration may need to take over the IP address of a failed node
- Take over MAC address
  - MAC address takeover may be needed if we cannot guarantee that other nodes will flush their ARP cache
- Listen on multiple addresses
  - A node in an active/active configuration may need to listen on multiple IP addresses

Fencing
- Fencing: method of isolating a node from a cluster
  - Applies to a failed node
  - Disconnect I/O to ensure data integrity
  - Avoids problems with Byzantine failures
  - Avoids problems with fail-restart
    - Restarted node has not kept up to date with state changes
- Types of fencing
  - Power fencing: shut off power to a node
  - SAN fencing: disable a Fibre Channel port to a node
  - Disable access to a global network block device (GNBD) server
  - Software fencing: remove server processes from the group
    - E.g., virtual synchrony

Clustering for performance
- Example: early effort on Linux: Beowulf
  - Initially built to address problems associated with large data sets in Earth and Space Science applications
  - From the Center of Excellence in Space Data & Information Sciences (CESDIS), a division of the Universities Space Research Association at the Goddard Space Flight Center
- This isn't one fixed package
  - Just an example of putting tools together to create a supercomputer from commodity hardware

What can you run?
- Programs that do not require fine-grain communication
- Nodes are dedicated to the cluster
  - Performance of nodes not subject to external factors
- Interconnect network isolated from external network
  - Network load is determined only by application
- Global process ID provided
  - Global signaling mechanism

Redirection
- Trivial to implement
- Successive requests automatically go to the same web server
  - Important for sessions
- Visible to customer
  - Don't like the changing URL
  - Bookmarks will usually tag a specific site
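To show how little redirection requires, here is a minimal sketch of the decision a front-end server would make. The class `Redirector` and the host names are hypothetical, and round-robin selection is an assumption; the slides do not say how the target is chosen. The front end answers each new request with an HTTP 302 whose Location header names one specific back-end server.

```python
import itertools


class Redirector:
    """Pick a back-end server per new request and build an HTTP redirect."""

    def __init__(self, servers):
        self._next_server = itertools.cycle(servers)  # round-robin (assumed policy)

    def redirect(self, path):
        """Return (status, headers) for a 302 redirect to the chosen server."""
        server = next(self._next_server)
        return 302, {"Location": f"http://{server}{path}"}


front_end = Redirector(["www1.example.com", "www2.example.com"])
print(front_end.redirect("/cart"))
# (302, {'Location': 'http://www1.example.com/cart'})
print(front_end.redirect("/cart"))
# (302, {'Location': 'http://www2.example.com/cart'})
```

After the redirect, the browser talks to `www1` or `www2` directly, which is why session requests stay on one server and also why the changed URL is visible to the customer and ends up in bookmarks.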

Load balancing router
- As routers got smarter
  - Not just simple packet forwarding
  - Most support packet filtering
  - Add load balancing to the mix
  - This includes most IOS-based Cisco routers, Alteon, F5 BIG-IP

Load balancing router
- Assign one or more virtual addresses to physical address
  - Incoming request gets mapped to physical address
- Special assignments can be made per port
  - e.g., all FTP traffic goes to one machine
- Balancing decisions:
  - Pick machine with least # TCP connections
  - Factor in weights when selecting machines
  - Pick machines round-robin
  - Pick fastest connecting machine (SYN/ACK time)
- Persistence
  - Send all requests from one user session to the same system
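Several of the balancing decisions above can be sketched together. This is an illustrative model, not how any listed router implements them: the class `Balancer`, its method names, and the choice to combine weights with connection counts as a ratio are all assumptions. It covers weighted least-connections, round-robin, and persistence; SYN/ACK timing is left out because it needs real network measurements.

```python
import itertools


class Balancer:
    """Map requests for a virtual address onto physical servers."""

    def __init__(self, weights):
        self.weights = dict(weights)                 # server -> relative capacity
        self.connections = {s: 0 for s in weights}   # server -> open connections
        self._rr = itertools.cycle(list(weights))
        self.sessions = {}                           # client -> pinned server

    def least_connections(self):
        """Fewest connections relative to weight (ties go to the first server)."""
        return min(self.connections,
                   key=lambda s: self.connections[s] / self.weights[s])

    def round_robin(self):
        """Next server in fixed rotation, ignoring load."""
        return next(self._rr)

    def pick(self, client):
        """Persistence: a client's first pick is reused for its later requests."""
        if client not in self.sessions:
            self.sessions[client] = self.least_connections()
        server = self.sessions[client]
        self.connections[server] += 1
        return server


lb = Balancer({"a": 1, "b": 2})  # b is assumed to have twice a's capacity
print([lb.pick(c) for c in ["u1", "u2", "u3", "u1"]])  # ['a', 'b', 'b', 'a']
```

In the usage line, `u1` returns to `a` on its second request even though `b` is then the less loaded choice by weight: persistence overrides the balancing decision once a session exists.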

