
The client is implemented as a [http://fuse.sourceforge.net/ FUSE] filesystem that a user mounts locally. It subscribes to a metadata server, from which it downloads the metadata it uses to construct the filesystem hierarchy. When an application opens and reads a file whose data is stored remotely, the client pulls the requested data into the CDN and streams it back to the application via the read() call.

When an application creates a file and writes to it, the data is written to underlying local storage. When the file is closed, the client uploads the file's metadata to the metadata server, which adds the file to the filesystem. Subsequent read() operations on this file are served from local storage. When an application opens a file whose data is hosted remotely and writes to it, the client downloads the file from the CDN to local storage and performs the write() operation there; when the file is closed, it informs the metadata server that the file is now hosted locally.
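The read, write, and close behavior described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the actual FUSE client: the names `ClientModel`, `FileEntry`, `fetch_from_cdn`, and `upload_metadata` are hypothetical, and a dictionary of byte buffers stands in for underlying storage.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    url: str             # where the file's data is currently hosted
    local: bool          # True once the data lives on local storage
    dirty: bool = False  # True if written since the last close


class ClientModel:
    """Hypothetical model of the client's per-file read/write/close logic."""

    def __init__(self, fetch_from_cdn, upload_metadata):
        self.entries = {}   # path -> FileEntry (the hierarchy)
        self.storage = {}   # path -> bytearray (stand-in for local storage)
        self.fetch_from_cdn = fetch_from_cdn    # assumed callable: url -> bytes
        self.upload_metadata = upload_metadata  # assumed callable: path -> None

    def read(self, path):
        entry = self.entries[path]
        if entry.local:
            # Locally hosted files are read from local storage.
            return bytes(self.storage[path])
        # Remotely hosted files are pulled through the CDN.
        return self.fetch_from_cdn(entry.url)

    def write(self, path, data, offset=0):
        entry = self.entries.get(path)
        if entry is None:
            # New file: its data goes straight to local storage.
            entry = self.entries[path] = FileEntry(url="", local=True)
            self.storage[path] = bytearray()
        elif not entry.local:
            # Remote file: download it first, then write locally.
            self.storage[path] = bytearray(self.fetch_from_cdn(entry.url))
            entry.local = True
        buf = self.storage[path]
        if offset > len(buf):
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        entry.dirty = True

    def close(self, path):
        entry = self.entries[path]
        if entry.dirty:
            # Metadata is uploaded only when a modified file is closed.
            self.upload_metadata(path)
            entry.dirty = False
```

Note that the model contacts the metadata server only in `close()`, which mirrors the text: writes stay local until the file is closed.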




Creating a directory is very similar to creating a file: the client first creates the directory on underlying storage and then uploads the directory's metadata to the metadata server.




Any file or directory created on the client is preserved on underlying storage until it is removed by a local call to unlink() or rmdir(). In the event of a write conflict in which a remote file replaces a local file, or a remote directory replaces a local directory, the underlying local data is preserved, but the filesystem hierarchy the client presents reflects the metadata server's hierarchy. This ensures that a remote writer can never destroy local data.


TODO: directories


Periodically, the client polls the metadata server for metadata updates, which it then merges into its directory hierarchy. New files and directories discovered via the metadata server become visible to the client. Files and directories that have been removed disappear from the hierarchy (unless there are local, uncommitted changes), but those that were locally created are not deleted from underlying storage. If two clients both upload new metadata for the same file or directory, the metadata record with the latest last-modified time is committed. We therefore require the Syndicate client hosts and the metadata server to have loosely synchronized clocks (NTPv4 daemons work well), and the metadata server discards new metadata whose last-modified time is too far ahead of or behind its own host's clock.
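The commit rule on the metadata server can be sketched as follows. This is a minimal illustration, not the server's actual code: `MetadataServer`, `Metadata`, and the 60-second `MAX_SKEW_SECONDS` tolerance are hypothetical, and the text does not say how ties between equal last-modified times are resolved (the sketch keeps the existing record).

```python
import time
from dataclasses import dataclass

MAX_SKEW_SECONDS = 60.0  # hypothetical tolerance; the real value is not stated


@dataclass
class Metadata:
    path: str
    url: str
    mtime: float  # last-modified time reported by the uploading client


class MetadataServer:
    """Hypothetical sketch of last-modified-wins commits with a skew window."""

    def __init__(self, max_skew=MAX_SKEW_SECONDS, clock=time.time):
        self.records = {}  # path -> committed Metadata
        self.max_skew = max_skew
        self.clock = clock

    def commit(self, md):
        """Return True if md was committed, False if it was discarded."""
        now = self.clock()
        # Discard metadata too far ahead of or behind the server's clock.
        if abs(md.mtime - now) > self.max_skew:
            return False
        current = self.records.get(md.path)
        # The record with the latest last-modified time wins.
        # Assumption: on a tie, the already-committed record is kept.
        if current is not None and current.mtime >= md.mtime:
            return False
        self.records[md.path] = md
        return True
```

Injecting `clock` makes the skew check easy to exercise with a fixed time, for example `MetadataServer(clock=lambda: 1000.0)`.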


The client runs an embedded HTTP server to serve file data to the CDN. Each time a file is locally modified and closed, the client generates a new URL for that file and uploads it, along with the rest of the file's metadata, to the metadata server. The URL is generated by creating a symbolic link from the file's data on underlying storage to the client's HTTP document root, and appending a version number to the basename. Once the metadata server has successfully received the new metadata for the file (including the new URL), the old symbolic links are removed. This process is called ''republishing'' a file. If the CDN requests a file that has been locally modified but not yet republished, the client replies with the data but adds a no-cache header to its response. This allows a remote reader that has not yet seen the file's new metadata to fetch the data without polluting the CDN with soon-to-be-stale data.
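A rough sketch of republishing, assuming a POSIX host. The function names, the `<basename>.<version>` link naming, and the use of `Cache-Control: no-cache` as the no-cache header are assumptions made for illustration; the text only states that a version number is appended to the basename and that a no-cache header is added.

```python
import os


def republish(data_path, docroot, name, version):
    """Expose data_path under docroot as '<name>.<version>' via a symlink.

    Returns the versioned basename that would go into the file's new URL.
    """
    link_name = f"{name}.{version}"
    os.symlink(os.path.abspath(data_path), os.path.join(docroot, link_name))
    return link_name


def remove_old_links(docroot, name, current_version):
    """Remove older versioned links for name.

    Intended to run only after the metadata server has acknowledged the
    new metadata. Returns the basenames that were removed.
    """
    prefix = name + "."
    removed = []
    for entry in sorted(os.listdir(docroot)):
        path = os.path.join(docroot, entry)
        if entry.startswith(prefix) and os.path.islink(path):
            suffix = entry[len(prefix):]
            if suffix.isdigit() and int(suffix) != current_version:
                os.unlink(path)
                removed.append(entry)
    return removed


def response_headers(republished):
    """Headers for a CDN request: mark not-yet-republished data uncacheable."""
    return {} if republished else {"Cache-Control": "no-cache"}
```

Keeping the old links until the server acknowledges the new URL means readers holding the previous metadata can still resolve the previous URL during the handover.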