When you are ready to compile, there are several Makefile choices found in "$SOURCE/lammps-30Mar10/src/MAKE". I chose Makefile.openmpi. By default you do not need to edit Makefile.openmpi, but if you are a guru and want to edit the file, feel free to do so.

I compiled using "make linux". Quite soon, I encountered the following error: fft3d.h(164): catastrophic error: could not open source file "fftw.h"

I had compiled my FFTW3 and my Intel Math Kernel Library properly and was able to locate the header in my Intel Math Kernel Library. I had correctly "sourced" the paths in LD_LIBRARY_PATH and /etc/ld.so.conf.d, but LAMMPS was still not able to locate the library.

I realised the crux of the problem was that LAMMPS requires FFTW 2.1.x. After configuring and compiling FFTW 2.x, the problem went away.
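As a rough sketch of what that looks like (the exact version number and install prefix here are just illustrative assumptions):

tar -xzf fftw-2.1.5.tar.gz
cd fftw-2.1.5
./configure --prefix=/usr/local/fftw-2.1.5
make && make install

Then point the FFT_INC, FFT_PATH and FFT_LIB lines in src/MAKE/Makefile.openmpi at the include and lib directories under that prefix before running make again.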

Problem eliminated. My compilation is not done yet, but this initial FFTW problem is settled for now :)

Monday, April 26, 2010

Taken and modified from the README.BIN for my environment. This deserves highlighting so administrators can set it up quickly.

Check that you have the correct versions of the OS and libraries for your machine, as listed in the G09 platform list on the website.

Select or create a group (e.g. g09) in /etc/group which will own the Gaussian files. Users who will run Gaussian should either already be in this group, or should have it added to their list of groups.
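A minimal sketch of the group setup (the group name g09 comes from the example above; the username is hypothetical):

groupadd g09
usermod -a -G g09 someuser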

Create a directory to place g09 and gv (for example, gaussian). You can do it by using a command such as

mkdir gaussian

Mount the Gaussian CD using commands like this one

mount /mnt/cdrom

Within the CD, you can copy the gaussian binary contents (E64_930N.TGZ) out into your newly created gaussian directory.
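As a rough example (the CD mount point and target directory are assumptions for illustration):

cp /mnt/cdrom/E64_930N.TGZ /usr/local/gaussian/
cd /usr/local/gaussian
tar -xzvf E64_930N.TGZ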

Manual setup of TCP LINDA for Gaussian
To configure TCP Linda so that Gaussian runs in parallel across nodes, all you need to do is tweak the ntsnet and LindaLauncher files found in the g09 directory. For TCP Linda to work in Gaussian, just make sure the LINDA_PATH is correct.
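A quick way to verify this (the install path below is a hypothetical example):

cd /usr/local/gaussian/g09
grep -n LINDA_PATH ntsnet LindaLauncher

Check that the path each file reports actually points to the Linda installation shipped inside your g09 directory.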

GPFS Tuning Parameters is a good wiki resource written by IBM on GPFS tuning. I am just parroting some of the useful tips I have learned.

To view the configuration parameters that have been changed from the defaults:

mmlsconfig

To view the active value of any of these parameters, you can run

mmfsadm dump config

To change any of these parameters, use mmchconfig. For example, to change the pagepool setting on all nodes:

mmchconfig pagepool=256M

1. Considerations for modifying the pagepool

A. Sequential I/O
The default pagepool size may be sufficient for sequential I/O workloads; however, a recommended value of 256MB is known to work well in many cases. To change the pagepool size:

mmchconfig pagepool=256M [-i]

If the file system blocksize is larger than the default (256K), the pagepool size should be scaled accordingly. For example, if a 1M blocksize is used, the default 64M pagepool should be increased by 4 times to 256M. This allows the same number of buffers to be cached.

B. Random I/O
The default pagepool size will likely not be sufficient for random I/O or workloads involving a large number of small files. In some cases allocating 4GB, 8GB or more of memory can improve workload performance.

mmchconfig pagepool=4000M

C. Random Direct IO
For database applications that use Direct I/O, the pagepool is not used for any user data. Its main purpose in this case is for system metadata and caching the indirect blocks of the database files.

D. NSD Server
Assuming no applications or Filesystem Manager services are running on the NSD servers, the pagepool is only used transiently by the NSD worker threads to gather data from client nodes and write the data to disk. The NSD server does not cache any of the data. Each NSD worker just needs one pagepool buffer per operation, and the buffer can potentially be as large as the largest blocksize of any filesystem the disks belong to. With the default NSD configuration, there will be 3 NSD worker threads per LUN (nsdThreadsPerDisk) that the node services. So the amount of memory needed in the pagepool will be 3*#LUNS*maxBlockSize. The target amount of space in the pagepool for NSD workers is controlled by nsdBufSpace, which defaults to 30%. So the pagepool should be large enough that 30% of it provides enough buffers.
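As a rough worked example (the LUN count is hypothetical): an NSD server serving 12 LUNs with a maximum filesystem blocksize of 1M needs about 3 * 12 * 1M = 36M of buffer space for the NSD workers; since nsdBufSpace defaults to 30%, the pagepool on that server should be at least around 120M.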

Wednesday, April 21, 2010

If you are using NFS as the shared file system, you may encounter an issue where changes made to an NFS share on one Linux client are not immediately visible to other NFS clients. This is due to attribute-caching parameters on the NFS client side which you must take note of. These are:

acregmin=n. The minimum time (in seconds) that the NFS client caches attributes of a regular file before it requests fresh attribute information from a server. The default is 3 seconds.

acregmax=n. The maximum time (in seconds) that the NFS client caches attributes of a regular file before it requests fresh attribute information from a server. The default is 60.

acdirmin=n. The minimum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. The default is 30.

acdirmax=n. The maximum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. The default is 60

actimeo=n. Sets acregmin, acregmax, acdirmin, and acdirmax all to the same value.
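If immediate visibility matters more than caching performance, you can shorten or disable the attribute cache on the client mount. A minimal sketch (the server name and export path are hypothetical):

mount -t nfs -o actimeo=3 nfsserver:/export /mnt/share

Use the noac option instead of actimeo if you want to turn attribute caching off entirely, at the cost of extra NFS traffic.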

Thursday, April 15, 2010

OpenMP* is a high level, pragma-based approach to parallel application programming. Cluster OpenMP is a simple means of extending OpenMP parallelism to 64-bit Intel® architecture-based clusters. It allows OpenMP code to run on clusters of Intel® Itanium® or Intel® 64 processors, with only slight modifications.

Prerequisite

Cluster OpenMP use requires that you already have the latest version of the Intel® C++ Compiler for Linux* and/or the Intel® Fortran Compiler for Linux*.

Benefits of Cluster OpenMP

Simplifies porting of serial or OpenMP code to clusters.

Requires few source code modifications, which eases debugging.

Allows slightly modified OpenMP code to run on more processors without requiring investment in expensive Symmetric Multiprocessing (SMP) hardware.

Offers an alternative to MPI that is easier to learn and faster to implement.
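As a hedged illustration of how little changes on the build side (the -cluster-openmp option is described in the Cluster OpenMP documentation for the Intel compilers; the source file name here is an assumption):

icc -cluster-openmp -o omp_prog omp_prog.c
./omp_prog

An existing OpenMP source file is simply recompiled with the Cluster OpenMP option instead of the plain OpenMP switch and launched as usual; the list of participating nodes is supplied through the Cluster OpenMP configuration file rather than through mpirun.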

Tuesday, April 13, 2010

The Scenario:
I encountered this error while executing an mpirun. I did a "pbsnodes -l" and everything seemed to be online. I thought my $LD_LIBRARY_PATH was giving the issues, but after some exhaustive checks, I realised that communication with one of our nodes was having issues. Here are the steps I took to solve the issue.

--------------------------------------------------------------------------
A daemon (pid 16704) died unexpectedly with status 127 while attempting to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------

The error looks as if it is due to LD_LIBRARY_PATH, but it may or may not be.

Step 1: Check whether it is an LD_LIBRARY_PATH issue for your head and compute nodes
First things first, you should check whether your LD_LIBRARY_PATH is blank or filled with the correct information on both your head node and your compute nodes.
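A quick sanity check (n01 is an example compute-node hostname, as used in the hostfile below):

echo $LD_LIBRARY_PATH
ssh n01 'echo $LD_LIBRARY_PATH'

Remember that a non-interactive ssh session may not source the same startup files as a login shell, so the two outputs can legitimately differ.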

Step 2: If the error still remains.....
Modify the hostfile, inserting one compute node at a time, and rerun the mpirun. You should be able to quickly identify whether the problem is $LD_LIBRARY_PATH or a problematic compute node.

n01
n02
...

In my situation, my problem was due to a broken ssh-generated key, despite Torque showing all nodes as healthy.
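In case it helps, a minimal sketch of rebuilding the passwordless SSH setup for the affected node (n02 here is just a placeholder hostname from the hostfile above):

ssh-keygen -t rsa
ssh-copy-id n02
ssh n02 hostname

The last command should return the node's hostname without prompting for a password before you retry the mpirun.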

Sunday, April 11, 2010

The PCI Utilities are a collection of programs for inspecting and manipulating configuration of PCI devices, all based on a common portable library libpci which offers access to the PCI configuration space on a variety of operating systems.

The utilities include:

lspci

setpci

These utilities are usually installed by default on most distributions, and are definitely a must-install for CentOS. They cross-check the PCI ID database and provide us with a more useful device name...
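For instance (the bus address in the second command is just a placeholder for a device on your own system):

lspci
lspci -v -s 00:1f.2
update-pciids

lspci lists every PCI device with its human-readable name, the -v -s form shows verbose detail for a single device, and update-pciids refreshes the local PCI ID database if your distribution ships it.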

ThinApp
VMware ThinApp virtualizes applications by encapsulating application files and registry into a single ThinApp package that can be deployed, managed and updated independently from the underlying OS.

Some of the key benefits according to VMware:

Simplify Windows 7 migration

Eliminate application conflicts

Consolidate application streaming servers

Reduce desktop storage costs

Increase mobility for end users

SpringSource tc Server
SpringSource tc Server provides enterprise users with the lightweight server they want, paired with the operational management, advanced diagnostics, and mission-critical support capabilities businesses need. It is designed to be a drop-in replacement for Apache Tomcat 6, ensuring a seamless migration path for existing custom-built and commercial software applications already certified for Tomcat. One interesting feature is that the SpringSource Tool Suite download is free.

Wednesday, April 7, 2010

pbs_mom;Svr;pbs_mom;LOG_ERROR:: Address already in use (98) in scan_for_exiting, cannot bind to port 464 in client_to_svr - too many retries

One cause of this is very high traffic on the network not allowing the mom and the server to communicate properly. One common case is job scripts that incessantly run qstat. You would be surprised how often users put such qstat calls into their scripts and cause this error.

Thursday, April 1, 2010

If you are installing GROMACS using the installation instructions from GROMACS and encounter "can't find fftw3f library", this is probably due to the wrong precision being used. Try reconfiguring FFTW with the "--enable-float" setting.
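A minimal sketch of rebuilding FFTW 3 in single precision (the source directory name and install prefix are placeholders):

cd fftw-3.x.x
./configure --enable-float --prefix=/usr/local/fftw3-single
make && make install

Afterwards, reconfigure GROMACS with CPPFLAGS and LDFLAGS pointing at the include and lib directories under that prefix so it can find libfftw3f.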