Best practices

Many people ask us about the technical aspects of setting up MooseFS instances. To answer these questions, we are publishing a list of best practices and hardware recommendations. Follow these to achieve the best reliability of your MooseFS installation.

1. Minimum goal set to 2

In order to keep your data safe, we recommend setting the minimum goal to 2 for the whole MooseFS instance.

The goal is the number of copies of a file's chunks distributed among Chunkservers. It is one of the most crucial aspects of keeping data safe.

If the goal is set to 2 and a drive or a Chunkserver fails, each missing chunk copy is replicated from a remaining copy to another Chunkserver to fulfill the goal, and your data stays safe.

If the goal is set to 1 and such a failure occurs, the chunks that existed on the broken disk are missing, and consequently the files these chunks belonged to are lost.
Having the goal set to 1 will eventually lead to data loss.

To set the goal to 2 for the whole instance, run the following command on a server where MooseFS is mounted (e.g. in /mnt/mfs):

# mfssetgoal -r 2 /mnt/mfs

You should also prevent users from setting a goal lower than 2.
To do so, edit the /etc/mfs/mfsexports.cfg file on every Master Server and set mingoal appropriately in each export:
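An export entry with mingoal set could look like the following sketch (the network range and the other options are illustrative; only mingoal=2 is the point here):

```
# /etc/mfs/mfsexports.cfg - example entry; adjust the network and options to your setup
192.168.1.0/24  /  rw,alldirs,mingoal=2
```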

2. Proper amount of space for metadata on Master Server(s)

You need to reserve disk space for metadata on your Master Server(s), in the directory where metadata dumps and changelogs are stored (/var/lib/mfs by default). The minimum amount of space (in GiB) can be estimated with the following formula:

RAM * (BACK_META_KEEP_PREVIOUS + 2) + 1 * (BACK_LOGS + 1)

where RAM is the amount of the Master Server's RAM in GiB, and BACK_LOGS and BACK_META_KEEP_PREVIOUS are options set in /etc/mfs/mfsmaster.cfg (their default values are 50 and 1, respectively).

The value 1 (before multiplying by BACK_LOGS + 1) is an estimation of the size used by one changelog.[number].mfs file. On a highly loaded instance each of these files uses a bit less than 1 GiB.

Example: If you have 128 GiB of RAM on your Master Server, then using the given formula you should reserve for /var/lib/mfs on the Master Server(s):

128*3 + 51 = 384 + 51 = 435 GiB minimum.
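The calculation above can be sketched as a small helper, using the document's formula with the default values of 50 for BACK_LOGS and 1 GiB per changelog file (the function name is ours, purely illustrative):

```python
# Rough estimate of the minimum /var/lib/mfs space (GiB) on a Master Server,
# following the formula in this document: RAM * 3 + 1 * (BACK_LOGS + 1),
# where RAM is in GiB, BACK_LOGS defaults to 50, and each
# changelog.[number].mfs file is assumed to use about 1 GiB.

def min_metadata_space_gib(ram_gib: int, back_logs: int = 50,
                           changelog_gib: int = 1) -> int:
    """Minimum space to reserve for metadata dumps and changelogs, in GiB."""
    return ram_gib * 3 + changelog_gib * (back_logs + 1)

print(min_metadata_space_gib(128))  # 128*3 + 51 = 435 GiB, as in the example
```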

3. RAID 1 or RAID 1+0 for storing metadata

We recommend setting up a dedicated RAID 1 or RAID 1+0 array for storing metadata dumps and changelogs. The array should be mounted at the /var/lib/mfs directory and should not be smaller than the value calculated in the previous point.

We do not recommend storing metadata on network storage (e.g. SAN or NFS).

4. Virtual Machines and MooseFS

We do not recommend running MooseFS components (especially Master Servers) on Virtual Machines.

5. JBOD and XFS for Chunkservers

We recommend connecting drives to Chunkservers as JBOD. Simply format each drive as XFS, mount it at e.g. /mnt/chunk01, /mnt/chunk02, ..., and put these paths into /etc/mfs/mfshdd.cfg. That's all.
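For two drives mounted as above, the configuration file would simply list their mount points (paths are examples):

```
# /etc/mfs/mfshdd.cfg - one mounted XFS drive per line
/mnt/chunk01
/mnt/chunk02
```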

We recommend this configuration mainly for two reasons:

MooseFS has a mechanism for checking whether a hard disk is in good condition. MooseFS can discover broken disks, replicate their data and mark such disks as damaged.
The situation is different with RAID: MooseFS's algorithms do not work on RAID arrays, so a corrupted RAID array may be falsely reported as healthy.

The other aspect is replication time. Assume the goal is set to 2 for the whole MooseFS instance. If one 2 TiB drive breaks, the
replication (from the remaining copies) takes about 40-60 minutes. If one big RAID array (e.g. 36 TiB) becomes corrupted, replication can take 12-18 hours.
Until the replication process finishes, some of your data is in danger, because only one valid copy exists. If another disk or RAID array
fails during that time, some of your data may be irrevocably lost. The longer the replication period, the greater the danger to your data.
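A back-of-the-envelope calculation, assuming replication time grows roughly linearly with the amount of lost data and taking ~50 minutes per 2 TiB from the figures above (both the function and the rate are illustrative, not measured values):

```python
# Rough replication-time estimate based on the figures in this document:
# a 2 TiB drive re-replicates in roughly 40-60 minutes (~50 min assumed here),
# and replication time is assumed to scale linearly with lost capacity.

def replication_hours(capacity_tib: float, minutes_per_2tib: float = 50.0) -> float:
    """Estimated hours to re-replicate capacity_tib of lost chunk copies."""
    return capacity_tib / 2.0 * minutes_per_2tib / 60.0

print(round(replication_hours(2), 1))   # single 2 TiB drive: ~0.8 h
print(round(replication_hours(36), 1))  # 36 TiB RAID array: ~15.0 h
```

This is consistent with the 12-18 hour range quoted above for a 36 TiB array.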

6. Network

We recommend at least a 1 Gbps network. Of course, MooseFS will perform better on a 10 Gbps network
(in our tests we were able to saturate a 10 Gbps network).

For network redundancy, we recommend setting up LACP between two switches and connecting each machine to both of them.

7. overcommit_memory on Master Servers (Linux only)

If an entry similar to the following appears in /var/log/syslog or /var/log/messages:

fork error (store data in foreground - it will block master for a while)

you may encounter (or are already encountering) problems with your Master Server, such as timeouts and dropped connections from clients.
This happens because your system does not allow the MFS Master process to fork and store its metadata in the background.

Linux systems use several different algorithms to estimate how much memory a single process will need when it is created. One of these algorithms assumes that a forked process needs exactly as much memory as its parent. With a Master process using 24 GB of memory and a total of 40 GB available (32 GB physical plus 8 GB virtual), this algorithm would make the fork fail every time.

In reality, however, fork does not copy the entire memory; fragments are copied only as they are modified (copy-on-write). Since the child process of the MFS Master only reads this memory and dumps it into a file, it is safe to assume that not much of the memory content will change.

Therefore such a "careful" estimating algorithm is not needed. The solution is to switch the estimating algorithm the system uses. This can be done once, as root:

# echo 1 > /proc/sys/vm/overcommit_memory

To switch it permanently, so the setting survives a system restart, put the following line into your /etc/sysctl.conf file:

vm.overcommit_memory=1

8. Disabled updateDB feature (Linux only)

Updatedb, part of mlocate, is simply an indexing system that keeps a database listing all the files on your server.
This database is used by the locate command to perform searches. Indexing a MooseFS mount with updatedb generates a lot of unnecessary load on the whole instance, so we recommend excluding MooseFS mounts from indexing.
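One way to exclude MooseFS mounts from indexing, assuming your distribution's mlocate reads /etc/updatedb.conf, is to add the fuse.mfs filesystem type to the PRUNEFS list (the other entries shown here are illustrative; append fuse.mfs to whatever list your system already has):

```
# /etc/updatedb.conf - append fuse.mfs to your existing PRUNEFS list
PRUNEFS="NFS nfs nfs4 fuse.mfs"
```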

9. Up-to-date operating system

We recommend using an up-to-date operating system. It doesn't matter whether your OS is Linux, FreeBSD or Mac OS X; it needs to be up to date.
For example, some features added in MooseFS 3.0 do not work with old FUSE versions (such as the one shipped with Debian 5).

10. Hardware recommendation

Since the MooseFS Master Server is a single-threaded process, we recommend modern processors with a high clock speed and a low number of cores for Master Servers.

We also recommend disabling the Hyper-Threading CPU feature on Master Servers.

For Chunkservers we recommend modern multi-core processors.

The minimum recommended and supported HA configuration for MooseFS Pro is 2 Master Servers and 3 Chunkservers.
If you have 3 Chunkservers and one of them goes down, your data is still accessible, it is being re-replicated, and the system keeps working.
If you have only 2 Chunkservers and one of them goes down, MooseFS waits for the missing server and is not able to perform operations.

The minimum number of Chunkservers required to run MooseFS Pro properly is therefore 3.