multiple processes appending to a single file - NFS

multiple processes appending to a single file

Consider an empty file to which two processes, A and B, each
concurrently write N bytes via NFS. Process A writes at offset 0, and
process B calls lseek() to move to offset N and writes there.
Additionally, each process write-locks the range it is accessing before
actually doing so.
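
As a rough illustration of the access pattern described above, here is a
minimal Python sketch. The helper names write_locked() and demo() and the
value chosen for N are assumptions made for this example, not something
from the original post. It runs both writers one after the other on a
local temporary file, so it only shows the intended result and does not
reproduce the NFS behaviour.

```python
import fcntl
import os
import tempfile

N = 4096  # hypothetical size; the post only calls it N


def write_locked(path, offset, data):
    """Write-lock the byte range, seek to it, write, then unlock."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        # POSIX advisory lock on [offset, offset + len(data))
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.lseek(fd, offset, os.SEEK_SET)
            remaining = data
            while remaining:  # write() may write fewer bytes than asked
                written = os.write(fd, remaining)
                remaining = remaining[written:]
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)


def demo():
    """Run both writers sequentially on a local temp file (not NFS)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_locked(path, N, b"B" * N)  # process B: offset N
        write_locked(path, 0, b"A" * N)  # process A: offset 0
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

On a local filesystem demo() should return N bytes of "A" followed by N
bytes of "B". In the scenario from the post, the two write_locked() calls
would instead be issued by separate processes through an NFS mount.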

Please note that this is different from the "append-to-file-via-NFS"
problem that has been discussed to some degree already. Here, each
process seeks to a known (non-conflicting) position to write its data.

We have observed that for the described access pattern, it can happen
that after the write operations, the file contains all zeros for offsets
0..N-1, and the data of process B between N..2N-1. This means the data
of process A is lost. Obviously, to comply with POSIX, the write
operation of process B implied filling up the "gap" between 0 and N-1
with zeros, overwriting the data of process A (although it locked its
range!). This showed up with different NFS server implementations and a
single NFS client implementation, which makes us think that this might be
a problem of the client.

Has anyone experienced this problem? Is the expected behaviour defined
anywhere? Shouldn't the locking prevent the observed behaviour (meaning
there's a bug somewhere)? I found nothing related on the web, and
neither in the NFSv3 RFC.

--
Joachim - reply to joachim at domain ccrl-nece dot de

Opinion expressed is personal and does not constitute
an opinion or statement of NEC Laboratories.

Re: multiple processes appending to a single file

Joachim Worringen wrote:
> We have observed that for the described access pattern, it can happen
> that after the write operations, the file contains all zeros for offsets
> 0..N-1, and the data of process B between N..2N-1. This means the data
> of process A is lost. Obviously, to comply with POSIX, the write
> operation of process B implied filling up the "gap" between 0 and N-1
> with zeros, overwriting the data of process A (although it locked its
> range!). This showed up with different NFS server implementations and a
> single NFS client implementation, which makes us think that this might be
> a problem of the client.
>
Not "Obviously". POSIX says that "holes in files" (areas not yet written)
are read as all zeros, but that doesn't mean they should be written with
zeros. What you describe should work and should not require any locks
(which for NFSv2 and v3 are not a part of the NFS protocol, but
"supported" by another protocol that often doesn't work:-).
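
To make the point about holes concrete, here is a small Python sketch.
The function name read_after_hole_write() and the value of N are
assumptions for illustration only. It seeks past the end of an empty
local file, writes there, and reads back the unwritten range.

```python
import os
import tempfile

N = 4096  # hypothetical size; the thread only calls it N


def read_after_hole_write():
    """Write N bytes at offset N of an empty file, then read offsets 0..N-1."""
    fd, path = tempfile.mkstemp()  # opened read/write, initially empty
    try:
        os.lseek(fd, N, os.SEEK_SET)   # seek past end of file
        os.write(fd, b"B" * N)         # file now spans 0..2N-1
        size = os.fstat(fd).st_size
        os.lseek(fd, 0, os.SEEK_SET)
        hole = os.read(fd, N)          # the never-written range
        return size, hole
    finally:
        os.close(fd)
        os.unlink(path)
```

The unwritten range reads back as zeros and the file size is 2N. Nothing
in this sequence asks for zeros to be written at offsets 0..N-1, so a
later write by another process to that range should simply replace what
is read there.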

Since writes can be delayed at the client, it might be prudent to
add fsync() calls after the write() calls, to try and ensure the
writes are pushed to the server.
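
A minimal Python sketch of that suggestion follows. The helper names
write_and_sync() and example() are assumptions for this illustration, and
the thread does not establish whether adding fsync() actually cures the
observed data loss.

```python
import os
import tempfile


def write_and_sync(path, offset, data):
    """Seek to offset, write data, then fsync() before closing."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        remaining = data
        while remaining:  # write() may write fewer bytes than asked
            written = os.write(fd, remaining)
            remaining = remaining[written:]
        os.fsync(fd)  # request that buffered writes be flushed now
    finally:
        os.close(fd)


def example():
    """Use write_and_sync() on a local temp file and read the result."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_and_sync(path, 0, b"hello")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)
```

If range locks are kept, calling fsync() before releasing the lock seems
the natural order, so the data has been flushed by the time another
process can acquire the range.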