Each machine has about 200 [GB] of data that I'd like to share with all of the other 19 machines, for READ ONLY PURPOSES. The reading should be done in the fastest possible way.

A friend told me to look into setting up HTTP / FTP. Is that indeed the optimal way to share data between the machines (better than NFS)? If so, how do I go about it?

Is there a python module that would help in accessing/reading the data?

UPDATE: Just to clarify, all I want is to be able (from within machine X) to access one of machine Y's files and LOAD IT INTO MEMORY. All of the files are of uniform size (500 [KB]). Which method is fastest (SAMBA / NFS / HTTP / FTP)?
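If the files end up being served over HTTP (e.g. by a simple web server running on each machine), loading one into memory from Python needs nothing beyond the standard library. A minimal sketch; the host name and file path below are placeholders, not anything from your setup:

```python
import urllib.request

def load_remote_file(host, path):
    """Fetch a file over HTTP and return its contents as bytes (in memory)."""
    url = "http://%s/%s" % (host, path)
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# e.g. data = load_remote_file("machine-y", "data/chunk0001.bin")
```

For many small (~500 KB) files from the same host, reusing one connection (e.g. via `http.client.HTTPConnection` or a third-party library with keep-alive) avoids paying TCP setup cost per file.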

This question came from our site for professional and enthusiast programmers.

@user540009, "Each machine has about 200[GB] of data that I'd like to share with all other 19 machines for READ ONLY PURPOSES." sounds like you have 200GB/machine. Perhaps you can change your question to reflect only 500KB.
– Greg Apr 4 '11 at 4:19

2 Answers

There are hundreds of ways to solve this problem. You can mount an FTP or HTTP file system over FUSE, or even use NFS (why not?). Look into httpfs2 or curlftpfs (or even sshfs, though that one should be avoided if you are looking for performance).
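The nice property of the FUSE approach is that once the remote tree is mounted (for example with something like `curlftpfs ftp://machine-y/ /mnt/machine-y`), the kernel presents it as an ordinary directory, so the Python side is just a plain local read, with no special client module at all. The mount point and file name here are hypothetical:

```python
def load_mounted_file(path):
    """Read a file from a (FUSE- or NFS-)mounted path into memory as bytes.

    To Python this is indistinguishable from a local file; the network
    transfer happens transparently inside the file system layer.
    """
    with open(path, "rb") as f:
        return f.read()

# e.g. data = load_mounted_file("/mnt/machine-y/data/chunk0001.bin")
```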

But the problem I see is, you have a single point of failure of the one and only master machine. Why not distribute the storage?

I usually use GlusterFS [1], which is fast and can be used in different modes.