PVFS: A Parallel Virtual File System for Linux Clusters

An introduction to the Parallel Virtual File System and a look at how one company installed and tested it.

Using networked file systems is a common
method for sharing disk space on UNIX-like systems, including
Linux. Sun was the first to embrace this technology by introducing
the Network File System (NFS), which provides file sharing via the
network. NFS is a client/server system that allows users to view,
store and update files on remote computers as though they were on
the user's own computer. NFS has since become the standard for file
sharing in the UNIX community. The NFS protocol uses Remote
Procedure Calls (RPC) to communicate between computers.

Using NFS, a user or a system administrator can mount all
or a portion of a remote file system. The mounted portion can
then be accessed with whatever privileges accompany the user's
access to each file (read-only or read-write).
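
For example, an administrator typically mounts an NFS export on a
client with a single command. A minimal sketch (the server name
fileserver and the paths are hypothetical):

    # Mount the remote directory /export/home from the NFS server
    # "fileserver" onto the local mount point /mnt/home.
    mount -t nfs fileserver:/export/home /mnt/home

    # The same export, mounted read-only.
    mount -t nfs -o ro fileserver:/export/home /mnt/home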

As the popularity and utility of this type of system have
grown, more networked file systems have appeared. These new systems
include advances in reliability, security, scalability and
speed.

As part of my responsibilities in the Systems Research
Department at Ericsson Research Canada, I evaluated networked
file systems for Linux to decide which one(s) to adopt for
our Linux clusters. At this stage, we are experimenting with Linux
and clustering technologies and trying to build a Linux cluster
that provides extremely high scalability and high
availability.

An important factor in building such a system is the choice
of the networked file system(s) with which it will be used. Among
the tested file systems were Coda, InterMezzo, Global File System
(GFS), MOSIX File System (MFS) and the Parallel Virtual File System
(PVFS). After considering these and other options, the decision was
made to adopt PVFS as the networked file system for our test Linux
cluster. We are also using the MOSIX file system as part of the
MOSIX package (see Resources) that enhances the Linux kernel with
cluster-computing capabilities.

In this article, we cover our initial experiences with the
PVFS system. We first discuss the design of the PVFS system in
order to help familiarize readers with the terminology and
components of PVFS. Next, we cover installation and configuration
on the seven-CPU Linux cluster at the Ericsson Systems Research Lab in
Montréal. Finally, we discuss the strengths and weaknesses
of the PVFS system in order to help others decide if PVFS is right
for them.

PVFS Overview and Goals

Linux cluster technology has matured and undergone many
improvements in the last few years. Commodity hardware speed has
increased, and parallel software has become more advanced.
Input/Output (I/O) support has traditionally lagged behind
computational advances, however. This limits the performance of
applications that process large amounts of data or rely on
out-of-core computation.

Figure 1. PVFS System Architecture

PVFS was constructed with two main objectives. The foremost
is to provide a platform for further research into parallel file
systems on Linux clusters. The second objective is to meet the
growing need for a high-performance parallel file system for such
clusters. PVFS goals are to:

Provide high bandwidth for concurrent read/write
operations from multiple processes to a common file

Support multiple APIs, including a native PVFS API,
the UNIX/POSIX I/O API and MPI-IO (through ROMIO)

Support common UNIX utilities, such as ls, cp and rm,
for PVFS files (see the sketch after this list)

Provide a mechanism for applications developed for
the UNIX I/O API to work with PVFS without recompiling

Offer robustness and scalability

Be easy to install and use
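
As a quick illustration of the transparency goals above, once a
PVFS file system is mounted, ordinary shell commands work on PVFS
files as they do on any local file system. A minimal sketch,
assuming PVFS is mounted at the hypothetical mount point /pvfs:

    # List, copy and remove files on a mounted PVFS volume
    # exactly as on a local file system.
    ls -l /pvfs
    cp /tmp/results.dat /pvfs/results.dat
    rm /pvfs/results.dat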

PVFS Node Types

One machine, or node, in a cluster can play a number of
roles in the PVFS system. A node can act as one or
more of three types: compute, I/O or management.
Typically, a single node will serve as a management node, while a
subset of the nodes will be compute nodes and another subset will
serve as I/O nodes. It is also possible to use all nodes as both
I/O and compute nodes.

PVFS exists as a set of dæmons and a library of calls
to access the file system. There are two types of dæmons,
management and I/O. Typically, a single management dæmon runs
on the management node and a number of I/O dæmons run on the
I/O nodes. The library of calls is used by applications running on
compute nodes, or client nodes, in order to communicate with both
the management dæmon and the I/O dæmons.
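
To make this concrete: in the PVFS release we worked with, the
management dæmon is named mgr and the I/O dæmon iod, and client
nodes run a small dæmon (pvfsd) plus a kernel module so that
unmodified applications can use the file system. A minimal sketch
of starting a PVFS system by hand (the install paths, the node
name mgrnode and the metadata directory /pvfs-meta are
assumptions; PVFS also ships startup scripts for this):

    # On the management node: start the metadata manager dæmon.
    /usr/local/sbin/mgr

    # On each I/O node: start the I/O dæmon.
    /usr/local/sbin/iod

    # On each compute (client) node: load the PVFS kernel module,
    # start the client dæmon, then mount the file system.
    insmod pvfs.o
    /usr/local/sbin/pvfsd
    mount.pvfs mgrnode:/pvfs-meta /pvfs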
