PVFS: A Parallel Virtual File System for Linux Clusters

An introduction to the Parallel Virtual File System and a look at how one company installed and tested it.

Management and I/O Dæmons

Management dæmons, or managers, have two
responsibilities: validating permission to access files and
maintaining metadata on PVFS files. Both of these tasks revolve around accessing metadata files. Only one management dæmon is needed to perform these operations for a given file system, and a single management dæmon can manage multiple file systems. The manager is also responsible for maintaining the file
system directory hierarchy. Applications running on compute nodes
communicate with the manager when performing activities such as
listing directory contents, opening files and removing
files.

On the other hand, I/O dæmons serve the single purpose of accessing PVFS file data and coordinating data transfer between themselves and applications. Direct connections are established between applications and I/O servers so that data is exchanged directly during read and write operations.

Providing PVFS Access to Client Nodes

There are several options for providing PVFS access to the
client nodes. First, there is a shared, or static, library that can
be used to interact with the file system using its native
interface. However, this requires writing applications specifically to use functions such as pvfs_open. As an alternative, there are two access methods that
provide transparent access. The preferred method is to use the PVFS
kernel module, which allows full access through the Linux VFS
mechanism. This loadable module allows the user to mount PVFS just
like any other traditional file system. Another option is to use a
set of C library wrappers that are provided with PVFS. These
wrappers directly trap calls to functions such as open and close
before they reach the kernel level. This approach offers higher performance, but it has drawbacks: compatibility is incomplete, and the wrappers work only with certain supported versions of glibc.
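
As a rough illustration of the kernel-module approach, mounting a PVFS volume on a client might look like the following. The module name, the mount.pvfs helper and the pc1:/pvfs source directory are assumptions based on a default PVFS installation rather than commands taken from our setup:

[root@client /root]# insmod pvfs
[root@client /root]# mkdir /mnt/pvfs
[root@client /root]# /usr/pvfs/bin/mount.pvfs pc1:/pvfs /mnt/pvfs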

A final option is to use the MPI-IO interface, which is part
of the MPI-2 standard for message passing in parallel applications.
The MPI-IO interface for PVFS is provided through the ROMIO MPI-IO
implementation (see Resources) and allows MPI applications to take
advantage of the features of MPI-IO when accessing PVFS. It also
ensures that the MPI code will be compatible with other
ROMIO-supported parallel file systems.
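
If MPICH is built with ROMIO, PVFS support has to be compiled in when ROMIO is configured. The flag below is an assumption based on ROMIO's documented build options, not a step from our installation:

./configure --with-romio="--with-file-system=pvfs+ufs+nfs"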

Installation Environment

The test system at Ericsson Montréal started as a
cluster of seven diskless Pentium-grade CPUs with 256MB of RAM each. These CPUs first boot using a minimal kernel written to flash with a tool provided by the manufacturer. They then get their
IP address and download a RAM disk from a Linux box acting as both
a DHCP and a TFTP server. This same machine also acts as an NFS
server for the CPUs, providing a shared disk space.

When we decided to experiment with PVFS, we needed some PCs
with disks to act as I/O nodes and one PC to be the management
node. We added one machine, PC1, to be the management node and
three machines, PC2, PC3 and PC4, with a total disk space of 35GB,
to be the I/O nodes. The new map of the cluster became:

Seven Diskless Client CPUs

One Management Node

Three I/O Nodes

Installation Steps

While PVFS developers provide RPMs for all types of nodes, we
chose to recompile the source in order to optimize installation on
the diskless clients. This went over without a hitch using the PVFS
tarball package. For the manager and I/O nodes, we used the
relevant RPM packages. The manager and I/O nodes run the Red Hat 6.2 distribution with the 2.2.14-5.0 kernel, while the diskless CPUs run a customized, minimal version of the same kernel.
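
For reference, the two installation paths look roughly like the following; the package and tarball file names are placeholders, and the tarball steps assume a standard GNU-style ./configure build:

[root@pc1 /root]# rpm -ivh pvfs-<version>.i386.rpm

[root@pc1 /tmp]# tar xzf pvfs-<version>.tgz
[root@pc1 /tmp]# cd pvfs-<version>
[root@pc1 pvfs-<version>]# ./configure
[root@pc1 pvfs-<version>]# make
[root@pc1 pvfs-<version>]# make install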

Setting up the Manager

The first step towards setting up the PVFS manager is to
download the PVFS manager RPM package and install it. PVFS will be
installed by default under /usr/pvfs. Once the automatic
installation is done, it is necessary to create the configuration
files. PVFS requires two configuration files in order to operate: “.pvfsdir”, which describes the directory to PVFS, and “.iodtab”, which describes the location of the I/O dæmons. Both files are
created by running the mkiodtab
script (as root):

[root@pc1 /root]# /usr/pvfs/bin/mkiodtab

See Listing 1 for the iodtab setup for the Parallel Virtual File System. The script also creates the .pvfsdir file in the root directory.

When we ran mkiodtab on the manager, PC1, it complained that it could not find the I/O nodes. It turned out that we had forgotten to include entries for our I/O nodes in /etc/hosts. We updated the /etc/hosts file, reran mkiodtab and everything went fine. mkiodtab created a file called “.iodtab” under /pvfs containing the list of our I/O nodes.
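
The listing itself is not reproduced here, but based on the description above, the file simply names each I/O node, one per line. For our three I/O nodes it would look something like this (an optional port number may follow each entry):

pc2
pc3
pc4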

Running enablemgr on the
management node ensures that the next time the machine is booted, the dæmons will be started automatically, so that they don't need to be started manually after rebooting. The enablemgr command
only needs to be run once to set up the appropriate links.
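
Assuming the default installation prefix, running it looks like this:

[root@pc1 /root]# /usr/pvfs/bin/enablemgr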
