Cmd_nfs3d

nfs3d - daemon for the NFS bridge

Synopsis

nfs3d -conf file [-fg] [-pid file]

Description

This is the daemon acting as an NFS server and forwarding requests to
the PlasmaFS cluster (namenodes and datanodes). The daemon implements
the nfs and mountd programs of NFS version 3. There is no support for
the nlockmgr protocol yet.

For security reasons, this daemon should only be bound to the local
loopback network (127.0.0.1). NFSv3 is inherently insecure, as there
are no authentication verifiers. It is possible and recommended to run
the daemon on every machine that wants to mount the filesystem. This
avoids the security problems, because the unprotected data exchange
then never leaves the local machine.

An instance of the NFS bridge can only connect to a single PlasmaFS
cluster.

The NFS bridge can only be contacted over TCP. UDP is not supported,
and there are no plans to add it. NFS runs well over TCP.

In the mount command, <host> is to be replaced by the machine running
the NFS bridge (normally localhost), and <clustername> by the name of
the cluster. The port numbers might need adjustment; we assume the same
numbers are used as in the examples.
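
As an illustration of how these placeholders fit together, the
following sketch builds (but does not run) a Linux mount command. The
options vers, proto, mountproto, port, mountport and nolock are
standard Linux NFS mount options; nolock is used because the bridge
does not implement nlockmgr. The port numbers 2801 and 2800 and the
export path /<clustername> are assumptions made for this example only,
not values taken from this manual, and the function name is
hypothetical.

```shell
# Print a mount command for the NFS bridge without executing it.
# ASSUMPTIONS: default ports 2801 (nfs) and 2800 (mountd) are
# placeholders, and the export path is assumed to be /<clustername>.
nfs3d_mount_cmd() {
    host="$1"; cluster="$2"; mountpoint="$3"
    nfs_port="${4:-2801}"; mountd_port="${5:-2800}"
    printf 'mount -t nfs -o vers=3,proto=tcp,mountproto=tcp,port=%s,mountport=%s,nolock %s:/%s %s\n' \
        "$nfs_port" "$mountd_port" "$host" "$cluster" "$mountpoint"
}

nfs3d_mount_cmd localhost mycluster /mnt/plasma
```

The printed command should be checked against the ports in the actual
configuration file before it is run as root.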

NFS (version 3) only implements weak cache consistency: An NFS client
usually caches data as long as nothing is known about a possible
modification, and modifications can only be recognized by changed
metadata (e.g. the mtime in the inode changes after a write). Although
NFS clients typically query metadata often, it is possible that data
modifications remain unnoticed. This is a problem in the NFS protocol,
not in the bridge.

The PlasmaFS protocol has better cache consistency semantics; in
particular, it ensures that a change of data is also represented as an
update of the metadata. However, the different semantics may
nevertheless cause incompatibilities. For example, a PlasmaFS client is
allowed to change data without changing the mtime in the inode. Within
the PlasmaFS system this is not a big problem, because there are other
means to reliably detect the change. An NFS client connected via this
bridge might not see the update, though, and may continue to assume
that its cached version is up to date. All in all, these problems are
expected to be mostly theoretical and will usually not occur in
practice.

NFS version 3 can deal with large blocks in the protocol, and some
client implementations also support that. For example, the Linux
client supports block sizes up to 1M automatically, i.e. this is the
maximum transmission unit for reads and writes. Independently of the
client support, the NFS bridge translates the sizes of the data blocks
used in the NFS protocol to what the PlasmaFS protocol requires. This
means that the NFS bridge can also handle clients that use data sizes
smaller than the PlasmaFS block size, although at a loss of
performance.

Especially for write accesses, the PlasmaFS blocksize should not be
larger than the maximum blocksize the NFS client can support.
Otherwise there might be an extreme performance loss. (Actually, this
is a problem in NFS clients, and cannot be worked around in the
server.)
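
The comparison described above can be sketched as a small shell check.
It takes an NFS mount option string and a PlasmaFS blocksize in bytes,
both supplied by the caller, and reports whether the blocksize exceeds
the client's wsize. The function names are hypothetical, and nothing
is queried from PlasmaFS itself.

```shell
# Print the value of one option (e.g. wsize) from a comma-separated
# NFS mount option string.
mount_opt() {
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# Compare the client's wsize with a blocksize given in bytes.
# Returns 0 if the blocksize fits, 1 if it is too large, and 2 if
# the option string contains no wsize.
check_wsize() {
    opts="$1"; blocksize="$2"
    wsize=$(mount_opt wsize "$opts")
    [ -n "$wsize" ] || { echo "no wsize in options" >&2; return 2; }
    if [ "$blocksize" -gt "$wsize" ]; then
        echo "blocksize $blocksize exceeds client wsize $wsize"
        return 1
    fi
    echo "ok"
}

check_wsize "rw,vers=3,rsize=1048576,wsize=1048576,proto=tcp" 65536
```

On Linux, the option string of a live mount can be taken from the
fourth field of the corresponding line in /proc/mounts.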

PlasmaFS does not keep the count of the hard links a file has.
Because of this, the NFS bridge always reports this count as 1.

Mapping principals

NFS uses numeric UIDs and GIDs to identify users and groups while
PlasmaFS prefers names. Because of this, the numeric identifiers
need to be mapped to names (and vice versa).

The daemon just consults the local /etc/passwd and /etc/group
files to do the mapping. This is correct if the filesystem is mounted
via the local loopback network (i.e. for the recommended
configuration), and it is the best available approximation if the
filesystem is mounted over a real network.
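
To see which names the bridge machine would associate with a given
numeric ID, the same files can be queried by hand. The following is
only a sketch of such a lookup with awk; it is not the daemon's own
code, and the function names are hypothetical. The optional second
argument allows pointing the lookup at another file in passwd or group
format.

```shell
# Name for a numeric UID, looked up in a passwd-format file.
uid_to_name() {
    awk -F: -v id="$1" '$3 == id { print $1; exit }' "${2:-/etc/passwd}"
}

# Name for a numeric GID, looked up in a group-format file.
gid_to_name() {
    awk -F: -v id="$1" '$3 == id { print $1; exit }' "${2:-/etc/group}"
}

# Reverse direction: numeric UID for a user name.
name_to_uid() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "${2:-/etc/passwd}"
}
```

If a lookup prints nothing, the ID or name has no entry in the file on
that machine, which is a hint that files created with it may show up
with an unexpected owner.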

For simplicity, the daemon just believes the group memberships the NFS
client claims, i.e. the memberships are not verified with the PlasmaFS
namenode. Because of this, it is possible to have different group
memberships via NFS than when using the PlasmaFS protocol directly.
(This might be fixed in a future release.)

Persistent mounts

The NFS bridge stores mounts in a PlasmaFS file under
/.plasma/var/lib/nfs3. Because of this, the mounts survive restarts
of the bridge (or other PlasmaFS components).

Options

-conf file: Reads the configuration from this file. See below for
details.

-fg: Prevents the daemon from detaching from the terminal and putting
itself into the background.

-pid file: Writes this pid file once the service process is forked.

Configuration

The configuration file is in Netplex syntax, and also uses many features
from this framework. See the documentation for Netplex which is available
as part of the Ocamlnet library package. There are also some explanations
here: Cmd_plasmad.

node_list is a text file containing the names of the namenodes, one
hostname per line.
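
A file in this format can be written and checked with a few lines of
shell. This is a minimal sketch; the function names are hypothetical,
and the hostnames are chosen by the caller.

```shell
# Write a node_list file: one namenode hostname per line.
write_node_list() {
    file="$1"; shift
    printf '%s\n' "$@" > "$file"
}

# Succeed only if every line holds exactly one token, i.e. there are
# no blank lines and no lines with several names.
check_node_list() {
    awk 'NF != 1 { bad = 1 } END { exit bad }' "$1"
}
```

For example, write_node_list namenode.hosts followed by the hostnames
of the namenodes creates the file, and check_node_list namenode.hosts
returns a non-zero status if a line is malformed.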

buffer_memory configures how large the internal buffer is that
the NFS bridge uses. Bigger buffers improve performance.

It is not advisable to use the official NFS ports, or to register
the NFS ports with portmapper.

How to shut down the daemon

First, the filesystem should be unmounted on all NFS clients. There is
no way for an NFS server to enforce unmounts (i.e. to make clients
write all unsaved data).

The orderly way of shutting down the daemon is the command

netplex-admin -sockdir <socket_directory> -shutdown

netplex-admin is part of the Ocamlnet distribution. The
socket directory must be the configured socket directory.

It is also allowed to do a hard shutdown by sending SIGTERM signals to
the process group whose ID is written to the pid file. There is no
risk of data loss in the server because of the transactional
design. However, clients may well be confused when their connections
are dropped abruptly.
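
The hard shutdown can be sketched as follows, assuming the pid file
contains nothing but the numeric process group ID. A negative target
makes kill address the whole process group. The function name and the
KILL variable are hypothetical; KILL exists only so that the command
can be inspected in a dry run (KILL=echo) before it is used for real.

```shell
# Send SIGTERM to the process group whose ID is stored in a pid file.
# ASSUMPTION: the pid file holds a single number and nothing else.
nfs3d_hard_shutdown() {
    pidfile="$1"
    pgid=$(tr -d '[:space:]' < "$pidfile") || return 1
    case "$pgid" in
        ''|*[!0-9]*) echo "invalid pid file: $pidfile" >&2; return 1 ;;
    esac
    # The leading "-" selects the process group instead of one process.
    ${KILL:-kill} -s TERM -- "-$pgid"
}
```

Called as nfs3d_hard_shutdown followed by the path given to the -pid
option, this sends the signal; with KILL=echo set, it only prints the
arguments that would be passed to kill.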