My scripts for crystallographic data manipulation

Except for the setup script, these scripts are found locally
in /software/misc/scripts, and some of them run
FORTRAN programs that are found in /software/misc/.
The FORTRAN programs are also freely available. As far as I know, all of these
scripts work as intended. There are no guarantees, but if you find bugs or problems,
want a new program, or have suggestions for a better way, please let me know!

These programs are "free." You may do with them as you please, but please
let me know if you find bugs or have questions about using them.

Setup script for all crystallographic software

These scripts set environment variables (e.g. $PATH) and
aliases for various software packages. They are designed to be user friendly: your PATH will
not grow in length if you set up the environment repeatedly.

to remove all instances of the "progname" directory from the PATH environment variable.
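
The setup scripts themselves are not reproduced here. As a minimal sketch of the idea in Python (the function names and the example directories are hypothetical), removing every copy of a directory before prepending it keeps PATH the same length however many times the setup is repeated:

```python
import os


def remove_from_path(path_value, directory):
    """Return path_value with every occurrence of directory removed.

    Empty entries are dropped as well.
    """
    parts = path_value.split(os.pathsep)
    return os.pathsep.join(p for p in parts if p and p != directory)


def prepend_to_path(path_value, directory):
    """Put directory at the front of path_value exactly once."""
    cleaned = remove_from_path(path_value, directory)
    return directory + os.pathsep + cleaned if cleaned else directory
```

For example, `os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), "/software/progname/bin")` only affects the Python process and its children, not the calling shell, which is why this kind of setup is normally done in the shell itself.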

Miscellaneous useful scripts

list_symm.py
Python script that lists the symmetry operators for any (or all) space groups
in both (x,y,z) format and in matrix form (rotation first, then translation).
This requires that you have the cctbx (Computational
Crystallography Toolbox) installed.
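
list_symm.py obtains its operators from cctbx. The sketch below does not use cctbx; it only illustrates how an operator in (x,y,z) format maps onto a rotation matrix plus a translation vector, assuming operators are written in the usual way with x, y, z and fractional shifts such as 1/2 (the name parse_symop is hypothetical):

```python
import re
from fractions import Fraction


def parse_symop(op):
    """Parse an operator such as '-x,y+1/2,-z' into (R, t).

    R is a 3x3 list of ints (rotation part) and t a list of three
    Fractions (translation part), so that x' = R x + t.
    """
    rows = op.replace(" ", "").lower().split(",")
    if len(rows) != 3:
        raise ValueError("expected three comma-separated components: %r" % op)
    R, t = [], []
    for row in rows:
        coeffs = {"x": 0, "y": 0, "z": 0}
        shift = Fraction(0)
        # Each term is an optional sign followed by x, y, z or an integer/fraction.
        for sign, term in re.findall(r"([+-]?)([xyz]|\d+(?:/\d+)?)", row):
            s = -1 if sign == "-" else 1
            if term in coeffs:
                coeffs[term] += s
            else:
                shift += s * Fraction(term)
        R.append([coeffs["x"], coeffs["y"], coeffs["z"]])
        t.append(shift)
    return R, t
```

For instance, `parse_symop("-x,y+1/2,-z")` gives the rotation `[[-1, 0, 0], [0, 1, 0], [0, 0, -1]]` and the translation `(0, 1/2, 0)`.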

pdb_get.py
Multiple PDB codes can be listed on the command line to download multiple files at once. For example:

pdb_get.py -s 1f83 1f82

Compressed files are uncompressed using gunzip.
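
A minimal sketch of the download-and-uncompress step, assuming the files come from the RCSB file server as gzipped PDB-format entries (the URL pattern is an assumption about where pdb_get.py fetches from, and the function names are hypothetical). Python's gzip module does in memory what gunzip does on disk:

```python
import gzip
import os
import urllib.request

# Assumed download location: gzipped PDB-format files on the RCSB file server.
RCSB_URL = "https://files.rcsb.org/download/{code}.pdb.gz"


def pdb_url(code):
    """Build the download URL for a four-character PDB code."""
    return RCSB_URL.format(code=code.upper())


def gunzip_text(data):
    """Uncompress gzip bytes in memory and return them as text."""
    return gzip.decompress(data).decode("ascii", errors="replace")


def fetch_pdb(code, out_dir="."):
    """Download one entry, uncompress it, write <code>.pdb and return the path."""
    with urllib.request.urlopen(pdb_url(code)) as response:
        text = gunzip_text(response.read())
    path = os.path.join(out_dir, code.lower() + ".pdb")
    with open(path, "w") as fh:
        fh.write(text)
    return path
```

Calling `fetch_pdb` once per code, e.g. for `1f83` and `1f82`, mirrors the multi-code command line above (network access required).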

stats.py
Python library of useful statistical calculations. It can also be used to calculate some basic statistics on any file read in
via stdin or given as a filename on the command line: it splits columns of data into separate data sets and calculates
the mean, stdev, median, max, and min of each.
Other functions include avg_dev (average deviation), var (variance), skew,
kurtosis, mode, histogram, and lsq (least-squares fit).
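
The per-column summary can be sketched with Python's standard statistics module rather than stats.py's own functions. Skipping blank lines and lines starting with '#' and using the sample (n-1) standard deviation are assumptions of this sketch, not documented behaviour of stats.py:

```python
import statistics


def read_columns(lines):
    """Split whitespace-separated numeric lines into one list per column."""
    columns = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines (an assumption)
        for i, field in enumerate(fields):
            while len(columns) <= i:
                columns.append([])
            columns[i].append(float(field))
    return columns


def avg_dev(data):
    """Average absolute deviation from the mean."""
    m = statistics.mean(data)
    return sum(abs(x - m) for x in data) / len(data)


def summary(data):
    """Mean, sample standard deviation, median, max and min of one column."""
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data) if len(data) > 1 else 0.0,
        "median": statistics.median(data),
        "max": max(data),
        "min": min(data),
    }
```

Usage: `with open("data.txt") as fh:` followed by `for col in read_columns(fh): print(summary(col))` (the file name is only an example).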

window_avg.py
Python script for calculating a moving-window average of data from an input file (the file is expected to
have two columns of data, i.e. X and Y values). Typically used to provide a smoothed
plot of some feature versus residue number for a protein sequence.
Options include:

-w # or --window=# to specify a window of size # (default 7)
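
A minimal sketch of a moving-window average, assuming a centred window of odd size and reporting only positions where the whole window fits, with the X value taken from the centre point. How window_avg.py itself treats the ends of the data is not stated here, so this is one reasonable choice rather than its exact behaviour:

```python
def window_average(xs, ys, window=7):
    """Centred moving average of ys; returns a list of (x, mean_y) pairs.

    Only positions where the full window fits are reported (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    half = window // 2
    return [
        (xs[i], sum(ys[i - half:i + half + 1]) / window)
        for i in range(half, len(ys) - half)
    ]
```

With the default window of 7, the first and last three residues of a sequence get no smoothed value.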

Sequence manipulations and analysis

seq_convert.py -
Python script to convert an amino acid sequence
from 1-letter to 3-letter code and vice versa. It can also read SEQRES records
or determine the sequence from the coordinates (using MyPDB.py) in PDB files.
Try seq_convert.py --help for instructions.
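
The core of the conversion is a lookup table for the 20 standard amino acids. A minimal sketch (function names hypothetical; unknown residues mapped to "X"/"UNK" as an assumption), including reading residue names from SEQRES records, which occupy columns 20-70 of each record with the chain identifier in column 12:

```python
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def one_to_three(seq):
    """Convert a 1-letter sequence to a list of 3-letter codes."""
    return [ONE_TO_THREE.get(c, "UNK") for c in seq.upper() if not c.isspace()]


def three_to_one(residues, unknown="X"):
    """Convert 3-letter codes to a 1-letter sequence string."""
    return "".join(THREE_TO_ONE.get(r.upper(), unknown) for r in residues)


def seqres_residues(lines, chain="A"):
    """Collect residue names from the SEQRES records of one chain."""
    residues = []
    for line in lines:
        if line.startswith("SEQRES") and line[11:12] == chain:
            residues.extend(line[19:].split())
    return residues
```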

seq_pattern.py -
Python script to search for repetitive patterns in sequences. The input sequence is expected to be in single-letter code.
Lines beginning with '>', such as in PIR/FASTA format files, are ignored, as are end-of-line numbers and spaces.
Patterns are entered as regular expressions.
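
A minimal sketch of the input cleaning and regular-expression search described for seq_pattern.py (function names hypothetical). It removes all digits and whitespace rather than only end-of-line numbers, and reports non-overlapping matches with 1-based start positions; both are simplifying assumptions:

```python
import re


def clean_sequence(lines):
    """Join 1-letter sequence lines, skipping '>' headers, digits and spaces."""
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue
        parts.append(re.sub(r"[\d\s]", "", line))
    return "".join(parts).upper()


def find_pattern(sequence, pattern):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]
```

For example, searching a cleaned sequence for the pattern `G[KR]V` finds every Gly-(Lys or Arg)-Val tripeptide.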

strip_mult.py
Python script built on Biopython to strip multiple conformations
out of one file and write out a file with either the "A" or "B" conformations.
Useful when using a structure for molecular dynamics with, e.g., GROMACS, or for other analyses.

strip_mult
Awk script to strip multiple conformations
out of one file and write them to two separate files for rebuilding with
O. The original "chain_id" is maintained to enable working with
multi-subunit (or multiple molecules in the asymmetric unit) structures.
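
Both strip_mult tools separate conformations by the alternate location indicator, which is column 17 of ATOM/HETATM records. A plain-text sketch in Python (not the awk script and not the Biopython version; function names hypothetical) that keeps atoms with no alternate location plus those of one chosen conformation, leaving the chain ID and all other records untouched. Blanking the indicator is offered as an option because whether either script does so is not stated:

```python
def select_conformation(lines, altloc, blank_altloc=False):
    """Keep atoms with no alternate location or with the given altloc.

    Non-coordinate records are passed through unchanged. The altLoc
    indicator is column 17 (index 16) of ATOM/HETATM records.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            loc = line[16]
            if loc not in (" ", altloc):
                continue  # atom belongs to another conformation
            if blank_altloc and loc != " ":
                line = line[:16] + " " + line[17:]
        kept.append(line)
    return kept


def split_conformations(lines):
    """Return the "A" and "B" versions of a structure as two line lists."""
    return select_conformation(lines, "A"), select_conformation(lines, "B")
```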

re_mult
- Restore multiple conformations from the two PDB files that were used
for rebuilding in O into one PDB file for refinement with
PROTIN/REFMAC.

scale2xplor
- convert Scalepack merged I and sig(I) to F and sig(F) for use in X-plor. There is an option to
convert negative I values to -sqrt(|I|).
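
The usual first-order conversion is F = sqrt(I) with sig(F) = sig(I) / (2 sqrt(I)). A sketch of that conversion, including the negative-I option; the scale2xplor script's exact sig(F) formula, and what it does with reflections it cannot convert, are assumptions here (this sketch returns None for them, e.g. for I = 0, where the formula is undefined):

```python
import math


def intensity_to_amplitude(i, sig_i, keep_negative=False):
    """Convert (I, sigI) to (F, sigF) by first-order error propagation.

    F = sqrt(I) and sigF = sigI / (2 sqrt(I)). With keep_negative=True a
    negative I gives F = -sqrt(|I|), with sigF computed from |I|.
    Returns None for reflections that are not converted (assumed dropped).
    """
    if i > 0 or (keep_negative and i < 0):
        f = math.sqrt(abs(i))
        return (f if i > 0 else -f, sig_i / (2.0 * f))
    return None
```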

scalepack_cell
- strip out the last refined unit cell values from a Scalepack log file for use in refinement scripts.

Structure analysis and format conversion

ddm_strip
- strip out difference distance matrix output from my FORTRAN program ddm to make a
scatter plot with gnuplot (faster than a true contour plot). An
alternative that allows better control of the plot (ranges etc.) and
gives a cleaner-looking output is to use the gnuplot contouring
option (with the splot command), write the contours out to a file ("set
out table") and replot that file, editing it if necessary to split the
different contour levels into separate files.
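
For reference, a difference distance matrix compares all pairwise distances in two structures with the same atoms (typically CA atoms). A Python sketch of the calculation and of a thresholded point list suitable for a gnuplot scatter plot; the sign convention (second structure minus first) and the cutoff filter are assumptions, not necessarily what ddm or ddm_strip do:

```python
import math


def distance_matrix(coords):
    """All pairwise distances for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance_matrix(coords_a, coords_b):
    """DDM[i][j] = d_B(i, j) - d_A(i, j) for two equal-length coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    return [[db[i][j] - da[i][j] for j in range(n)] for i in range(n)]


def scatter_points(ddm, cutoff):
    """(i+1, j+1, value) triples with |value| >= cutoff, 1-based indices."""
    return [
        (i + 1, j + 1, v)
        for i, row in enumerate(ddm)
        for j, v in enumerate(row)
        if abs(v) >= cutoff
    ]
```

Writing each triple as one line of "i j value" gives a file gnuplot can plot directly; math.dist needs Python 3.8 or later.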

helix_angles.py
- calculate the angle and distance between the vectors defining two helices.
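
A sketch of the geometry, assuming each helix axis is given by two points (for example, the ends of a fitted axis). How helix_angles.py defines the axis vectors is not described here, and "distance" is taken as the closest approach between the two infinite axis lines, which is one of several possible definitions:

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def helix_angle_and_distance(start1, end1, start2, end2):
    """Angle (degrees) between two axis vectors and the closest-approach
    distance between the infinite lines through them."""
    u, v = _sub(end1, start1), _sub(end2, start2)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    w = _sub(start2, start1)
    n = _cross(u, v)
    n_len = math.sqrt(_dot(n, n))
    if n_len < 1e-12:
        # Parallel axes: perpendicular distance from start2 to line 1.
        c = _cross(w, u)
        dist = math.sqrt(_dot(c, c)) / math.sqrt(_dot(u, u))
    else:
        dist = abs(_dot(w, n)) / n_len
    return angle, dist
```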

newchain
- for use with the output of
XPAND
with the -E option: increments the chain id for each symmetry operator in the
output, thus removing the redundant naming of atoms. This allows for easier
identification of atoms using O and makes measurement of distances possible. It
won't work properly if you have multiple chains to start with.

strip_hyd
- remove hydrogens from a PDB file (MOLEMAN2 may be more reliable, though slightly slower).
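
A sketch of hydrogen removal in Python (not the strip_hyd script itself; function names hypothetical). It trusts the element symbol in columns 77-78 when present and otherwise falls back to a heuristic on the atom name in columns 13-16. That fallback can misfire, for example on a mercury atom named HG, which is the kind of case where a dedicated tool such as MOLEMAN2 is safer:

```python
def is_hydrogen(line):
    """Guess whether an ATOM/HETATM record is a hydrogen (or deuterium)."""
    element = line[76:78].strip()
    if element:
        return element.upper() in ("H", "D")
    # Fallback heuristic on the atom name, e.g. ' H  ', ' HA ', '1HB '.
    name = line[12:16].strip().lstrip("0123456789")
    return name.upper().startswith(("H", "D"))


def strip_hydrogens(lines):
    """Drop hydrogen ATOM/HETATM records; keep every other line."""
    return [
        line for line in lines
        if not (line.startswith(("ATOM", "HETATM")) and is_hydrogen(line))
    ]
```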

msms_complementarity.py
- Python script that calculates surface complementarity as in the Sc program from the
CCP4 (Collaborative Computational Project No. 4) protein crystallography program suite.

msms_similarity.py
- Python script that calculates surface similarity as in the Sc program from the
CCP4 (Collaborative Computational Project No. 4) protein crystallography program suite.

For both of the above, simply run the program with two file name roots on the command line:

./msms_complementarity.py mol1 mol2

where mol1 and mol2 are file name roots for the coordinate and vertex files.
In other words, after running MSMS
with the -of flag, you would have the files:

mol1.xyzrn
mol1.vert
mol1.face
mol2.xyzrn
mol2.vert
mol2.face

The .xyzrn and .vert files are the input to the msms_complementarity.py or
msms_similarity.py program.
The .face files are not used here.

The output .vert file from msms_similarity.py and msms_complementarity.py can be used in the
msms_sim_draw.py script
to draw the surface in PyMOL coloured according to the Sc value at each vertex.

You first need to convert PDB-format files to the xyzrn format using my
pdb_to_xyzrn.py script.

pdb_to_xyzrn.py
Conversion script to convert PDB-format files to
xyzrn-format files for use by MSMS.
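
A sketch of converting one ATOM/HETATM record to an xyzrn line. The radii are Bondi van der Waals values chosen only for illustration (pdb_to_xyzrn.py may well use a different radius set), and the output layout (x, y, z, radius, a constant "1" flag column and an atom label) is an assumption; check the MSMS documentation before relying on it:

```python
# Illustrative Bondi van der Waals radii in Angstroms (an assumption).
BONDI_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "H": 1.20}


def pdb_line_to_xyzrn(line, default_radius=1.80):
    """Convert one ATOM/HETATM record to an 'x y z r 1 label' line."""
    # Orthogonal coordinates occupy columns 31-38, 39-46 and 47-54.
    x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
    # Element from columns 77-78, else the first letter of the atom name.
    element = line[76:78].strip() or line[12:16].strip()[:1]
    radius = BONDI_RADII.get(element.upper(), default_radius)
    label = "%s_%s_%s" % (line[12:16].strip(), line[17:20].strip(), line[22:26].strip())
    return "%9.3f %9.3f %9.3f %5.2f 1 %s" % (x, y, z, radius, label)
```

The first-letter fallback reads "CA" as carbon, which is right for protein alpha-carbons but wrong for calcium ions, so files with an element column are safer input.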