!---------------------------------------------------------------!
! LSDMap v1.0 - Aug 01 2012 - Initial Release !
! !
! Developed by !
! Wenwei Zheng wz7@rice.edu !
! Mary Rohrdanz mar3@rice.edu !
! Cecilia Clementi cecilia@rice.edu !
! !
! Please reference the papers below if you use LSDMap: !
! 1. Rohrdanz, M.A., Zheng, W., Maggioni, M., and Clementi, C., !
! J. Chem. Phys., 134, 124116, 2011 !
! 2. Zheng, W., Rohrdanz, M.A., Caflisch, A., Dinner, A.R., !
! and Clementi, C., J. Phys. Chem. B, 115, 13065-13074, 2011 !
!---------------------------------------------------------------!
** File description
[LSDMap/]
------------------
LSDMap.inc:
server-specific variables for the makefile
configure.sh:
script to configure the programs on the servers at Rice University
prepare_rmsd_neighbor.sh:
script to preprocess the input files for RMSD and nearest neighbor calculation
prepare_localscale.sh:
script to preprocess the input files for picking local scales
prepare_embed.sh:
script to preprocess the input files for embedding new points into the diffusion map space
[LSDMap/src/]
----------------------
source code and the input example for each program
p_rmsd_neighbor.f90:
parallel program to calculate the RMSD and nearest neighbors
s_rmsd_neighbor.f90:
serial version
p_local_mds.f90:
parallel program to pick the local scales
p_wlsdmap.f90:
parallel weighted LSDMap
s_wlsdmap.f90:
serial version
p_wlsdmap_embed.f90:
parallel program to embed new points into the diffusion map space
split_rmsd.f90:
the program to preprocess RMSD files for program p_wlsdmap_embed.f90
[LSDMap/example/]
--------------------------
aladip.gro.tar.gz:
An example data set of alanine dipeptide from GROMACS.
** Compile the program
-If you are using servers at Rice University, such as
DAVinCi with openmpi/1.4.4-intel,
BIOU with openmpi/1.4.3-ibm,
and STIC with openmpi/1.3.3-pgi,
follow the steps below:
1. Copy the libraries from toscana:~wz7/git.program/lib to $(HOME)/lib.
2. Load openmpi module with correct version according to the server listed above.
3. Run 'configure.sh'.
-If you are not using the servers listed above, please edit LSDMap.inc to match
your compilers and the paths of the required libraries: PARPACK, ARPACK, LAPACK and BLAS.
PARPACK and ARPACK can be downloaded from www.caam.rice.edu/software/ARPACK/.
LAPACK and BLAS are included in many packages, such as MKL for Intel compilers,
ESSL for IBM compilers and ACML for AMD platforms, or you can always download and
compile your own version from www.netlib.org/lapack and www.netlib.org/blas.
** How to run Weighted LSDMap
We will show how to run LSDMap on the example data set in directory 'example/'.
1. Convert the data set to the format accepted by LSDMap.
-Decompress the example data 'example/aladip.gro.tar.gz' into the program directory.
> tar -xvf example/aladip.gro.tar.gz
-Run script gro2xyz.sh to convert the GROMACS .gro file to the format used by the
LSDMap programs, and put the data file in the working directory:
> ./gro2xyz.sh aladip.gro 1
The .xyz file format is used in LSDMap programs. The format is:
| first line: <number of points> <number of dimensions> |
| starting from the second line: <the atomic (xyz) coordinates for the first point> |
| <the atomic (xyz) coordinates for the second point> |
| ... |
| <the atomic (xyz) coordinates for the last point> |
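The layout above can be sketched in Python as a small writer and reader. This is
only an illustration, not part of LSDMap: the function names are hypothetical, and
it assumes that each point occupies one line, that values are separated by
whitespace, and that <number of dimensions> is 3 times the number of atoms.

```python
import io

def write_xyz(points, stream):
    """Write points (a list of equal-length coordinate lists) in the layout above."""
    n_points = len(points)
    n_dims = len(points[0]) if points else 0
    if any(len(p) != n_dims for p in points):
        raise ValueError("all points must have the same number of coordinates")
    # Header line: <number of points> <number of dimensions>
    stream.write(f"{n_points} {n_dims}\n")
    # Assumption: one point per line, coordinates separated by spaces.
    for p in points:
        stream.write(" ".join(f"{x:.6f}" for x in p) + "\n")

def read_xyz(stream):
    """Read the header, then one point per line, and check counts against the header."""
    n_points, n_dims = (int(tok) for tok in stream.readline().split())
    points = [[float(tok) for tok in line.split()] for line in stream if line.strip()]
    if len(points) != n_points or any(len(p) != n_dims for p in points):
        raise ValueError("file contents do not match the header")
    return points

# Two made-up points, each with two atoms (2 atoms * 3 coordinates = 6 dimensions).
demo = [[0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
        [0.5, 0.6, 0.7, 1.5, 1.6, 1.7]]
buf = io.StringIO()
write_xyz(demo, buf)
buf.seek(0)
roundtrip = read_xyz(buf)
```

If gro2xyz.sh writes a different layout (for example, one atom per line), the
reader above would need to be adapted accordingly.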
2. Set the parameters in file 'run_parameters' according to the comments.
-For the example data, leave all parameters at their defaults EXCEPT:
number of nodes: nodes=
number of cores per node: ppn=
queue name: queue=
The example data contains 10,000 points, and therefore needs about
10,000^2*12 bytes = 1.2G of memory. Please make sure the total memory of the CPUs
you request is enough for the data set. On most clusters, each CPU has at least
1.5G of memory, so the example data set can be processed on a single CPU.
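The memory estimate above can be reproduced with a short Python sketch. The
function names are hypothetical, and the sketch assumes that "G" means 10^9 bytes
and that the memory of all requested CPUs can be pooled for the data set.

```python
BYTES_PER_PAIR = 12  # factor taken from the estimate in this README

def estimate_memory_bytes(n_points):
    """Approximate memory needed for n_points, following n^2 * 12."""
    return n_points * n_points * BYTES_PER_PAIR

def fits_in_memory(n_points, n_cpus, bytes_per_cpu):
    """True if the estimate fits in the pooled memory of the requested CPUs.

    Assumption: memory is pooled evenly across CPUs.
    """
    return estimate_memory_bytes(n_points) <= n_cpus * bytes_per_cpu

needed = estimate_memory_bytes(10_000)      # estimate for the example data set
ok = fits_in_memory(10_000, 1, 1.5e9)       # one CPU with 1.5G of memory
```

Because the estimate grows with the square of the number of points, doubling the
data set size roughly quadruples the memory requirement.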
3. Calculate the RMSD and nearest neighbor map.
-Run 'prepare_rmsd_neighbor.sh' to prepare the input files.
> ./prepare_rmsd_neighbor.sh
-Submit the pbs script 'rmsd_neighbor.pbs'.
> qsub rmsd_neighbor.pbs
-The nearest neighbor map file is under directory "neighbor/".
-The RMSD files are under directory "rmsd/". The number of RMSD
files is the same as the number of CPUs set in 'run_parameters'.
4. Calculate the local scales.
-Run 'prepare_localscale.sh' to prepare the input files.
> ./prepare_localscale.sh
-Submit the pbs script:
> ./submitlist
-Merge the local scale files by running 'merge_localscale_results.sh':
> ./merge_localscale_results.sh
5. Run LSDMap.
-Submit the pbs script 'wlsdmap.pbs' for LSDMap.
> qsub wlsdmap.pbs
6. Results:
<trajectory file name>_dif.eg
(eigenvalues in reverse order, that is, 9th, 8th, 7th, ..., 1st, 0th)
<trajectory file name>_dif.ev
(eigenvectors, the first column corresponds to the 0th eigenvalue,
the second corresponds to the 1st eigenvalue, ...)
<trajectory file name>_eps
(local scale from different cutoffs, columns are in the order of
<point ID>, <neighbor ID at the local scale with cutoff1>,
<local scale with cutoff1>,
<gap between the first and second eigenvalues with cutoff1>,
<intrinsic dimension from the spectral gap with cutoff1>,
<intrinsic dimension from checking the noise spectra with cutoff1>,
<neighbor ID at the local scale with cutoff2>,
<local scale with cutoff2>, ...)
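The column order of the _eps file can be illustrated with a small Python parser
for one line. This is a sketch under assumptions: the function and field names are
hypothetical, columns are assumed to be whitespace-separated, each cutoff is
assumed to contribute the same five columns in the order listed above, and the
sample line contains made-up values rather than real LSDMap output.

```python
FIELDS_PER_CUTOFF = 5  # neighbor ID, local scale, gap, and two intrinsic dimensions

def parse_eps_line(line):
    """Split one _eps line into a point ID and one record per cutoff."""
    tokens = line.split()
    n_rest = len(tokens) - 1
    if n_rest < FIELDS_PER_CUTOFF or n_rest % FIELDS_PER_CUTOFF != 0:
        raise ValueError("expected 1 + 5*k columns")
    point_id = int(tokens[0])
    cutoffs = []
    for start in range(1, len(tokens), FIELDS_PER_CUTOFF):
        nb, scale, gap, dim_gap, dim_noise = tokens[start:start + FIELDS_PER_CUTOFF]
        cutoffs.append({
            "neighbor_id": int(nb),
            "local_scale": float(scale),
            "eigenvalue_gap": float(gap),
            "dim_from_gap": float(dim_gap),
            "dim_from_noise": float(dim_noise),
        })
    return point_id, cutoffs

# Made-up line with two cutoffs, for illustration only.
sample = "17 243 0.35 0.42 2 3 512 0.61 0.18 3 3"
pid, records = parse_eps_line(sample)
```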
-If everything works fine, you can view aladip.xyz_dif.eg and get the
following list of eigenvalues:
0.167848 0.178829 0.233711 0.267388 0.341161
0.452993 0.528263 0.697209 0.979835 1.000000
Note: Small differences in the eigenvalue spectrum are possible because random
projection multidimensional scaling is used.
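Since the .eg file lists the eigenvalues in reverse order, it can be convenient to
flip them so that index k holds the kth eigenvalue. The Python sketch below does
this for the reference values listed above; the function name is hypothetical, and
the sketch assumes the file holds only whitespace-separated numbers.

```python
def read_eigenvalues(text):
    """Parse whitespace-separated eigenvalues and return them as 0th, 1st, 2nd, ..."""
    values = [float(tok) for tok in text.split()]
    values.reverse()  # the file stores them as 9th, 8th, ..., 1st, 0th
    return values

# Reference eigenvalues for the alanine dipeptide example, as listed above.
example = """0.167848 0.178829 0.233711 0.267388 0.341161
0.452993 0.528263 0.697209 0.979835 1.000000"""

eigenvalues = read_eigenvalues(example)
# eigenvalues[0] is the 0th eigenvalue and eigenvalues[1] the 1st.
```

To use it on real output, pass the contents of aladip.xyz_dif.eg instead of the
example string.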
** How to embed new points into the diffusion map space
1. Set the parameters in file 'run_parameters' according to the comments.
2. Calculate the embedding coordinates.
-Run 'prepare_embed.sh' to prepare the input files.
-Run './split_rmsd < split_rmsd.input' to split the RMSD files so that
their number matches the number of CPUs to use.
-Submit the pbs script 'wlsdmap_embed.pbs'.
3. Results:
<trajectory file name>_dif_embed.ev (eigenvectors for the new points)
embed/<trajectory file name>_dif_<90000+<loop id>>.eg
embed/<trajectory file name>_dif_<90000+<loop id>>.ev
(the eigenvalues and eigenvectors in each loop)
Note:
The number of points added per loop affects the errors
or distortions introduced into the diffusion map space. Usually, the number of
points added per loop should be smaller than 0.01 times the
number of points in the old data set. The total number of points
to add should be divisible by the number of points added per loop.
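The two rules in the note can be checked before submitting a job. The Python
sketch below is a hypothetical helper, not part of LSDMap; it treats the 0.01 rule
as a recommendation and the divisibility rule as a requirement.

```python
def check_embedding_plan(n_old, n_new_total, n_per_loop):
    """Return a list of problems with the chosen loop size (empty if none)."""
    if n_per_loop <= 0:
        return ["the number of points per loop must be positive"]
    problems = []
    # Recommendation: fewer than 0.01 * (points in the old data set) per loop.
    if n_per_loop >= 0.01 * n_old:
        problems.append("points per loop is not smaller than 0.01 * old data set size")
    # Requirement: the total number of new points splits evenly into loops.
    if n_new_total % n_per_loop != 0:
        problems.append("total number of new points is not divisible by points per loop")
    return problems

def number_of_loops(n_new_total, n_per_loop):
    """Number of loops needed when the total divides evenly."""
    return n_new_total // n_per_loop

good = check_embedding_plan(10_000, 500, 50)    # 50 < 100 and 500 % 50 == 0
too_big = check_embedding_plan(10_000, 500, 100)
uneven = check_embedding_plan(10_000, 510, 50)
loops = number_of_loops(500, 50)
```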
** Why is there a serial version
-In most cases, you can use the parallel version serially.
-If you want to run LSDMap on a large number of small data sets, the
serial version is faster because it does not initialize MPI.