Tuesday, August 7, 2012

How NFS4 improved RPC

Since its very beginning, Network File System (NFS) has relied on a lower-level protocol to perform remote procedure calls. NFS deals with files, while Open Network Computing Remote Procedure Call (ONC RPC) takes care of sending requests and replies over the network.

However, more than 20 years have passed since the first NFS and RPC specifications, and the former has evolved considerably while the latter has remained virtually unchanged. NFS has changed from a stateless protocol to a stateful one, which made helper protocols like Network Lock Manager (NLM) and Network Status Monitor (NSM) unnecessary. Consequently, NFS started to require stronger guarantees about how client requests are processed, some of which RPC could not provide.

Since version 4, NFS supports file locks on its own, without relying on an external protocol as the earlier versions did. There are also share reservations, which are another kind of file lock. Acquiring and releasing both file locks and share reservations are operations that have to be ordered and executed at most once.

Unfortunately, RPC guarantees neither: it uses at-least-once semantics, and messages are not ordered. The RPC transaction identifier (XID) does not help either, since the specification forbids the server from treating it as a sequence number. In addition, since RPC is independent of transport-layer protocols, NFS cannot take advantage of any TCP or SCTP guarantees.

In NFS4, ordering and at-most-once semantics are achieved by adding sequence numbers and state owners to the requests that require them. For each state owner, the server stores the last received sequence number L and the response that was returned to that request. Then, when the server receives another request with sequence number r, one of the following happens:

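r < L - the request is rejected

r == L - the request is a duplicate and the server returns the cached response

r == L + 1 - the request is the next one in order, so the server executes it and stores r and the response

r > L + 1 - the request is out of order and is rejected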
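The check can be sketched in a few lines of Python. This is only an illustration, not server code: the names OwnerState and handle are made up for this post, the sketch assumes that the next expected number is L + 1 and that anything else is rejected with NFS4ERR_BAD_SEQID, and it ignores sequence number wraparound.

```python
from dataclasses import dataclass
from typing import Callable, Optional

BAD_SEQID = "NFS4ERR_BAD_SEQID"  # error returned for an unexpected sequence number


@dataclass
class OwnerState:
    """Per state owner bookkeeping (hypothetical structure for this sketch)."""
    last_seqid: int                     # L: last sequence number received
    cached_reply: Optional[str] = None  # response returned to that request


def handle(state: OwnerState, seqid: int, execute: Callable[[], str]) -> str:
    """Process a request carrying sequence number `seqid` (r in the text)."""
    if seqid == state.last_seqid:
        # Duplicate: do not execute again, replay the cached response.
        return state.cached_reply
    if seqid == state.last_seqid + 1:
        # Next request in order: execute once and remember the outcome.
        reply = execute()
        state.last_seqid = seqid
        state.cached_reply = reply
        return reply
    # Older than L or skipping ahead: reject.
    return BAD_SEQID
```

With L = 4, a request with r = 5 is executed, a retransmission of r = 5 gets the cached reply without running the operation again, and r = 4 or r = 7 is rejected.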

Following this behavior ensures that requests are performed in the correct order and at most once, provided L was correctly initialized. The server also needs a way to deal with the first use of a state owner and its corresponding sequence number. The NFS4 specification states that:

The first request issued for any given lock_owner is issued with a sequence number of zero.

This guarantee is too weak, though. The server is allowed to dispose of any state owner that has not been used for a prolonged period of time. Hence, the server may receive a valid request with a non-zero sequence number from a state owner that, from the server's point of view, is new.

To deal with such situations, the first use of an open owner needs to be additionally confirmed. A correct confirm request has a sequence number one greater than the request it is confirming. Once the request is confirmed, proper state is established. However, if the client fails to confirm the request in a timely manner, or sends another request whose sequence number is incorrect for the one pending confirmation, the server disposes of the unconfirmed state.
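The confirmation rule can be illustrated with a small Python sketch. PendingOpen, confirm and the 30 second timeout are assumptions made for this example only; the post does not say how long a server waits, and a real server tracks far more state.

```python
from dataclasses import dataclass

CONFIRM_TIMEOUT = 30.0  # seconds; arbitrary value chosen for this sketch


@dataclass
class PendingOpen:
    """An open owner whose first request still awaits confirmation."""
    open_seqid: int     # sequence number of the request being confirmed
    received_at: float  # time at which that request arrived


def confirm(pending: PendingOpen, seqid: int, now: float) -> str:
    """Return "confirmed" or "disposed" for a confirm request with `seqid`."""
    in_time = now - pending.received_at <= CONFIRM_TIMEOUT
    if in_time and seqid == pending.open_seqid + 1:
        # Correct confirm: one greater than the request it confirms.
        return "confirmed"
    # Too late or wrong sequence number: drop the unconfirmed state.
    return "disposed"
```

For an open sent with sequence number 7, a confirm carrying 8 that arrives in time establishes the state, while a confirm carrying 9, or one arriving after the timeout, causes the state to be disposed of.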

Lock owners are dealt with in a somewhat simpler way. Since it is impossible to lock a file that is not already open, it can be safely assumed that a confirmed open owner already exists whenever a new lock owner is used. Each time a new lock owner is used, the client sends the open owner sequence number in the same request. Thus the first use of a lock owner sequence number is also sequenced and does not need to be confirmed.
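A rough Python sketch of that idea follows. The OpenOwner structure and the first_lock function are hypothetical names for this post, the sketch assumes the open owner sequence number must be exactly one greater than the last one seen, and replay handling is left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OpenOwner:
    """A confirmed open owner and the lock owners introduced through it."""
    last_seqid: int
    confirmed: bool = True
    # lock owner name -> last lock sequence number seen for it
    lock_owners: Dict[str, int] = field(default_factory=dict)


def first_lock(owner: OpenOwner, open_seqid: int,
               lock_owner: str, lock_seqid: int) -> bool:
    """Accept the first request of a new lock owner if the accompanying
    open owner sequence number is the next one in order."""
    if not owner.confirmed or open_seqid != owner.last_seqid + 1:
        return False
    # The open owner sequence orders this request, so the initial
    # lock owner sequence number can be trusted without confirmation.
    owner.last_seqid = open_seqid
    owner.lock_owners[lock_owner] = lock_seqid
    return True
```

The design point is that the already established open owner sequence vouches for the new lock owner, so no separate confirm round trip is needed.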

Earlier versions

It is worth mentioning that the earlier versions also had non-idempotent operations, namely create, rename and remove. Nevertheless, they did not require the special treatment that locking does. In the case of a duplicated rename or remove request, the client received an error and assumed that someone else had already removed the file.

The exclusive create operation in version 3 of the protocol uses a verifier to ensure at-most-once semantics. The verifier is a random value provided by the client. When a file is created, the server stores the verifier. When another exclusive create request arrives for the same file, the server compares the verifiers: if they are the same, the request is a duplicate and a success reply is still returned. Otherwise, the server informs the client that the file already exists.
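The verifier comparison can be sketched as follows. The Directory class is a made-up, in-memory stand-in for the server's storage, and only the verifier logic is modelled; the status strings mirror the NFS3_OK and NFS3ERR_EXIST codes of version 3.

```python
import os
from typing import Dict

OK = "NFS3_OK"
EXIST = "NFS3ERR_EXIST"


class Directory:
    """Toy directory that remembers the verifier used to create each file."""

    def __init__(self) -> None:
        self._verifiers: Dict[str, bytes] = {}  # file name -> stored verifier

    def create_exclusive(self, name: str, verifier: bytes) -> str:
        stored = self._verifiers.get(name)
        if stored is None:
            # File does not exist yet: create it and keep the verifier.
            self._verifiers[name] = verifier
            return OK
        if stored == verifier:
            # Same verifier: a retransmission of the original request.
            return OK
        # Different verifier: someone else created the file.
        return EXIST


if __name__ == "__main__":
    d = Directory()
    v = os.urandom(8)  # client-chosen verifier
    print(d.create_exclusive("data.txt", v))   # first attempt
    print(d.create_exclusive("data.txt", v))   # retransmission
    print(d.create_exclusive("data.txt", b"\x00" * 8 if v != b"\x00" * 8 else b"\x01" * 8))
```

The first two calls succeed because they carry the same verifier, and the third one, which uses a different verifier, is told that the file already exists.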