
Agenda
Basic algorithms such as leader election
Consensus in distributed systems
Replication and fault tolerance in distributed systems
GFS as an example of a distributed system

Network Algorithms
A distributed system is a collection of entities where:
a) each entity is autonomous, asynchronous, and failure-prone
b) entities communicate through unreliable channels
c) entities cooperate to perform some common function
Network algorithms enable such distributed systems to perform these "common functions" effectively.

Global State in Distributed Systems
We want to estimate a "consistent" state of a distributed system.
Required for determining whether the system is deadlocked or has terminated, and for debugging.
Two approaches:
1. Centralized – all processes and channels report to a central process
2. Distributed – the Chandy-Lamport algorithm

Chandy-Lamport Algorithm
Based on marker messages M.
On receiving M over channel c:
If own state is not yet recorded:
a) Record own state
b) Start recording the state of incoming channels
c) Send marker messages on all outgoing channels
Else:
a) Record the state of c
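The marker rule above can be sketched as follows. This is a minimal illustration for a single process, assuming a hypothetical framework that delivers messages per channel and supplies `send` and `local_state` callbacks; a real snapshot also needs termination detection and state collection.

```python
# Sketch of the Chandy-Lamport marker rule for one process.
# The surrounding framework (send, local_state, channel ids) is assumed.

class Process:
    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.incoming = incoming      # ids of channels we receive on
        self.outgoing = outgoing      # ids of channels we send on
        self.state_recorded = False
        self.own_state = None
        self.channel_state = {}       # channel id -> messages recorded
        self.recording = set()        # channels currently being recorded

    def on_marker(self, channel, send, local_state):
        """Handle a marker M arriving on `channel`.
        send(ch, msg) transmits msg on outgoing channel ch;
        local_state() returns this process's current state."""
        if not self.state_recorded:
            self.state_recorded = True
            self.own_state = local_state()          # a) record own state
            for c in self.incoming:                 # b) start recording channels
                if c != channel:
                    self.channel_state[c] = []
                    self.recording.add(c)
            # the channel the marker arrived on is recorded as empty
            self.channel_state[channel] = []
            for c in self.outgoing:                 # c) propagate markers
                send(c, "MARKER")
        else:
            # a later marker closes the recording of this channel
            self.recording.discard(channel)

    def on_message(self, channel, msg):
        # application messages arriving on a channel still being recorded
        # belong to that channel's state in the snapshot
        if channel in self.recording:
            self.channel_state[channel].append(msg)
```

The snapshot of a channel is exactly the messages that were in flight when the marker was sent, which is why recording stops when the channel's own marker arrives.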

Ring Election
Processes are organized in a logical ring.
Each process sends an election message clockwise to the next process in the ring, carrying its id and its own attribute value.
The next process checks the election message:
a) If its own attribute value is greater, it replaces the id in the message with its own and forwards it.
b) If its own attribute value is less, it simply passes the message on.
c) If the attribute value equals its own, the message has come full circle: it declares itself the leader and passes on an "elected" message.
What happens when a node fails?
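The three rules above can be simulated in a few lines. This is a sketch under simplifying assumptions (synchronous, failure-free ring, distinct attribute values); the function name and setup are illustrative, not from the slides.

```python
# Minimal simulation of ring election on attribute values.
# Assumes distinct attribute values and no failures.

def ring_election(attrs, start=0):
    """attrs[i] is process i's attribute value; indices double as ids.
    Returns the id of the elected leader (the maximum attribute wins)."""
    n = len(attrs)
    candidate = start                # id currently carried by the message
    pos = (start + 1) % n
    while True:
        if attrs[pos] > attrs[candidate]:
            candidate = pos          # a) replace the id with its own
        elif pos == candidate:       # c) message returned to its originator
            return pos               #    -> declare leader, send "elected"
        pos = (pos + 1) % n          # b) pass the message clockwise

print(ring_election([3, 7, 2, 9, 5]))  # → 3 (process 3 holds the max value)
```

The slide's closing question is the catch: if the node holding the message fails, the message is lost and the ring must be repaired (or a timeout must restart the election).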

Consensus
A set of n processes/systems attempts to "agree" on some information.
Each process P_i begins in the undecided state and proposes a value v_i ∈ D.
The P_i communicate by exchanging values.
Each P_i sets its decision value d_i and enters the decided state.
Requirements:
1. Termination: eventually all correct processes decide, i.e., each correct process sets its decision variable.
2. Agreement: the decision value of all correct processes is the same.
3. Integrity: if all correct processes proposed v, then any correct decided process has d_i = v.

2 Phase Commit Protocol
Useful in distributed transactions to perform an atomic commit.
Atomic commit: a set of distinct changes applied as a single, indivisible operation.
Suppose A transfers $300 from A's account to B's account:
A = A - 300
B = B + 300
These operations must be applied together to guarantee consistency.
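The bank-transfer example maps directly onto the two phases: a voting phase (canCommit) and a completion phase (doCommit/doAbort). The sketch below shows only the decision logic, with hypothetical participant interfaces; a real 2PC also writes a durable log at each step and handles timeouts.

```python
# Sketch of the two-phase commit decision logic (participant interface
# names are illustrative; logging and recovery are omitted).

def two_phase_commit(participants):
    """Each participant exposes can_commit() -> bool, do_commit(), do_abort().
    Returns True iff the transaction committed."""
    # Phase 1 (voting): ask every participant whether it can commit
    votes = [p.can_commit() for p in participants]
    # Phase 2 (completion): commit only if everyone voted yes
    if all(votes):
        for p in participants:
            p.do_commit()
        return True
    for p in participants:
        p.do_abort()
    return False

class Account:
    def __init__(self, balance, delta):
        self.balance, self.delta = balance, delta
    def can_commit(self):
        return self.balance + self.delta >= 0   # e.g. refuse an overdraft
    def do_commit(self):
        self.balance += self.delta
    def do_abort(self):
        pass                                    # discard the tentative change

a, b = Account(500, -300), Account(100, +300)
two_phase_commit([a, b])                        # both sides change, or neither
```

If either account votes no (say A has only $200), both sides abort and neither balance changes, which is exactly the atomicity the slide asks for.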

2 Phase Commit Protocol
What happens if the coordinator and a participant fail after doCommit?

3PC Cont…
Why is this better?
2PC: execute the transaction when everyone is willing to COMMIT it.
3PC: execute the transaction when everyone knows it will COMMIT.
(http://www.coralcdn.org/07wi-cs244b/notes/l4d.txt)
But 3PC is expensive: timeouts are triggered by slow machines.

Paxos Protocol
A consensus algorithm.
Important safety conditions:
a) Only one value is chosen.
b) Only a proposed value is chosen.
Important liveness conditions:
a) Some proposed value is eventually chosen.
b) Once a value is chosen, a process can eventually learn the value.
Nodes behave as proposers, acceptors, and learners.

Paxos Protocol – Phase 1
The proposer selects a number n for its proposal of value v and sends a prepare message to the acceptors.
What if an acceptor is down or unreachable? A majority of acceptors is enough.
Each acceptor responds with an acknowledgement carrying the highest n it has seen.
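The prepare/promise exchange can be sketched as below. This is a simplified Phase 1 only, with assumed message shapes; the promised value carried back (any previously accepted proposal) is what Phase 2 uses to stay safe.

```python
# Sketch of Paxos Phase 1 (prepare/promise); message shapes are assumed.

class Acceptor:
    def __init__(self):
        self.promised_n = -1     # highest prepare number seen so far
        self.accepted = None     # (n, v) of the last accepted proposal, if any

    def on_prepare(self, n):
        """Promise not to accept proposals numbered below n, if n is new."""
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted)
        return ("reject", self.promised_n, None)

def phase1(acceptors, n):
    """Proposer side: prepare(n) succeeds with a majority of promises.
    Unreachable acceptors simply don't count toward the majority."""
    promises = [a.on_prepare(n) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    return len(granted) > len(acceptors) // 2, granted

acceptors = [Acceptor() for _ in range(5)]
ok, promises = phase1(acceptors, n=1)   # majority of 5 promises for n=1
```

Because an acceptor rejects any n no higher than one it already promised, two proposers cannot both get a majority for the same number, which is the first step toward the "only one value is chosen" safety condition.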

Paxos Protocol Contd…
Some issues:
a) How do we choose the proposer?
b) How do we ensure a unique n?
c) It is an expensive protocol.
d) No primary if a distinguished proposer is used.
Originally used by the Paxons to run their part-time parliament.
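Issue b) has a standard answer not spelled out on the slide: interleave a local round counter with the process id, so that numbers from different proposers can never collide. A sketch:

```python
# One common scheme for unique, totally ordered proposal numbers
# (a standard technique, shown here as an illustrative helper).

def next_proposal_number(round_counter, pid, num_processes):
    """n mod num_processes recovers the proposer's id, so two distinct
    proposers can never generate the same n."""
    return round_counter * num_processes + pid

# Three processes, two rounds each: all six numbers are distinct
nums = {next_proposal_number(r, p, 3) for r in range(2) for p in range(3)}
```

To outbid an observed proposal number m, a proposer picks the smallest round whose number exceeds m, so numbers also grow monotonically per proposer.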

Failure in Distributed Systems
An important consideration in every design decision.
Fault detectors should be:
a) Complete – able to detect a fault when it occurs
b) Accurate – does not raise false positives
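A heartbeat detector makes the tension between these two properties concrete: the timeout is the knob. This sketch is an assumed design, not from the slides.

```python
# Sketch of a heartbeat-based failure detector. The timeout trades
# completeness against accuracy.

class HeartbeatDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}          # pid -> time of last heartbeat

    def heartbeat(self, pid, now):
        self.last_seen[pid] = now

    def suspected(self, pid, now):
        """Suspect a process if no heartbeat arrived within `timeout`.
        A short timeout notices real faults quickly (completeness) but
        falsely suspects slow-but-alive processes (hurts accuracy)."""
        return now - self.last_seen.get(pid, float("-inf")) > self.timeout

d = HeartbeatDetector(timeout=3)
d.heartbeat("p1", now=0)
d.suspected("p1", now=2)    # recent heartbeat: not suspected
d.suspected("p1", now=10)   # silence past the timeout: suspected
```

In an asynchronous system no timeout value achieves both properties at once, which is why practical detectors settle for eventual accuracy.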

Byzantine Faults
Arbitrary messages and transitions.
Causes: e.g., software bugs, malicious attacks.
Byzantine agreement problem: "Can a set of concurrent processes achieve coordination in spite of the faulty behavior of some of them?"
The concurrent processes could be replicas in a distributed system.

PBFT Cont…
The algorithm provides:
-> Safety, by guaranteeing linearizability; the pre-prepare and prepare phases ensure a total order on messages.
-> Liveness, by providing for a view change when the primary replica fails. Here, synchrony is assumed.
How do we know the value of f a priori?
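The question about f has a fixed arithmetic behind it: PBFT tolerates f Byzantine replicas out of n = 3f + 1, with quorums of 2f + 1 matching messages. A small helper makes the relationship explicit (the helper itself is illustrative; the arithmetic is standard PBFT):

```python
# Standard PBFT sizing: n = 3f + 1 replicas tolerate f Byzantine faults,
# and 2f + 1 matching messages are needed to make progress.

def pbft_parameters(n):
    """Return (f, quorum) for a cluster of n replicas: the maximum number
    of Byzantine replicas tolerated, and the quorum size 2f + 1."""
    f = (n - 1) // 3
    return f, 2 * f + 1

pbft_parameters(4)   # the classic minimal deployment: f = 1, quorum = 3
pbft_parameters(7)   # f = 2, quorum = 5
```

So f is not detected at runtime; it is a deployment-time assumption baked into the replica count, which is exactly why the slide's question matters.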

GFS Design Issues
Replication of chunks:
a) Replication across racks – the default replication factor is 3.
b) Allowing concurrent changes to the same file – in retrospect, the designers would rather have had a single writer.
c) The primary replica serializes mutations to chunks – GFS does not use any consensus protocol before applying mutations to chunks.
Ref: http://queue.acm.org/detail.cfm?id=1594206