Recall that Paxos is a distributed system algorithm with the goal that the processes participating in its protocol will reach consensus on one of the valid values.

I was trying to better understand the revoting mechanism to avoid deadlocks in Paxos. The revoting mechanism is explained as follows in the article:

The revoting mechanism now works like this: before taking a vote, a
proposer tests the waters by sending a prepare(n) message to all
accepters where n is the proposal number. An accepter responds to this
with a promise never to accept any proposal with a number less than n
(so that old proposals don't suddenly get ratified) together with
the highest-numbered proposal that the accepter has accepted (so
that the proposer can substitute this value for its own, in case the
previous value was in fact ratified).

The bold section is the one that I was trying to understand better. The author tries to justify it with:

"...so that the proposer can substitute this value for its own, in case the
previous value was in fact ratified..."

But I didn't really understand why the proposer would want to ratify the previous value. By doing this, what crucial safety property is he guaranteeing? Why is he responding with that and not something else? Responding with the highest could be problem, right, since the current proposal would get lost?

2 Answers
2

As I was writing my question I realized what I believe to be the correct answer and wanted to share it just in case someone else was wondering this step in Paxos, or maybe correct it if its not 100% correct (or clear).

Recall that, in Paxos, the actual goal is to achieve some consensus (agreement) on some value. The actual value is not very important as long as its from the valid set of values. However, we only care that our protocol somehow achieves some agreement. Thus, if we are a proposer and want to propose a value, what we really care is that some value is agreed amongst the processes participating in the protocol (since that the goal of a consensus algorithm, that some valid value is accepted). If for some reason, we were desperate for our value to be accepted at some point, we could eventually re-run a different instance of Paxos later (or more instances of Paxos until that value is chosen), so its ok to try to suggest the value later (to a different paxos instance) if we really need to (but that will be a different set of rounds of Paxos, starting with round zero and doing Paxos all over again). Its crucial to understand that suggesting the value to the same paxos instance is pointless since the point is that once paxos reaches consensus, that it sticks to it. We have to run paxos for a different set of values if we want our value to still be decided at some point.

But, the point of the step that "acceptors respond with the highest-numbered proposal that the current acceptor has seen", is to guarantee that if a value in the past (with lower sequence number) had been ratified, then why complicate agreement by considering a different value? It's easier to eventually agree at some value if we just keep that ratified value instead. Furthermore, and the more important reason is, imagine if we decided to discard the value we had ratified previously and instead ratify the new incoming one. Well, if we did that and update our state it could be very bad, because, one important property we wish our algorithm to have is, "once an agreement by majority has been reached, we wish it to be impossible for our system to change its mind" (because consensus has occurred already, so a change in mind is completely unnecessary). If we take this new (wrong) approach I have suggested, then that property could not hold. However, if instead, we always just respond with the highest ratified value, then the proposer on the other side is bound to learn the chosen value, if one has been chosen (if he queries some majority). Thus, the safe thing to do is, just send the proposer the current ratified value with the highest sequence number, that way, if a value was chosen, then the system is not able to change its mind if a consensus occurred (since its not necessary for it to change its mind since some value has already been accepted by some majority!). The whole point of consensus means that we agree on something, so changing the value can be viewed as breaking consensus.

$\begingroup$This is more or less correct (you should accept your answer) - the key is that this will improve the algorithm's progress since multiple proposers requesting accepts of the same value will usually achieve majority consensus faster than several unique contending ones.$\endgroup$
– VF1May 28 '14 at 22:42

Consider three nodes A, B and C playing every role (proposer, acceptor, learner) each in a separate rack with an interconnected network switch in each rack. Node A is leading in steady state. The link goes dark between nodes A and B. After a short while of no messages node B times-out and proposes. Node B issues a prepare to itself and node C, accepts its own proposal, and awaits the response from C to get a majority. If C reports a value then it was issued by either A or C, if A then it has accepted its own value as well as C so that value is fixed and that fact may have been communicated as such to the client. Node B must propose that value to make itself consistent with the other two nodes and the knowledge of the client.

Extend the example to five nodes by adding node D into the same rack as A, and node E into the the same rack as B. Again A is leading and the link between the racks of A,D and B,E goes dark. Once again node C is the single node in overlapping majorities. Node B times-out, proposes getting back a promise from itself, E and C with C once again returning a value. Whether the value is fixed on the other side of the partition depends on whether nodes A,D accepted the message. This cannot be determined be asking C what's going on with A,D as it may be that the switch in the rack A,D is dead and the client who issued the value is also in that rack so it may or may not have been told that the value is fixed. The only safe decision is to assume a majority was obtained and to complete the work and propagate the value reported by C across all reachable nodes.