Installing a cctbx.xfel snapshot from March 28, 2013

The thermolysin data for the cctbx.xfel paper was processed around March 28, 2013. Unfortunately, regular nightly releases are not available for that time, but a custom source-code bundle has been prepared. This bundle differs from the regular cctbx bundles in that LABELIT is included, and that the directory layout is identical to that of a developer installation. To download and unpack the bundle in the current directory:

Next, create a build directory (called phenix-build-20130328 below, but that is an arbitrary choice), in which the sources are configured and compiled. The Python interpreter required to complete this step must be the one supplied by the PSDM Software Distribution. Once the PSDM software has been set up, this interpreter can be located using

where /path/to/test/release and my_ana_pkg are the path to the test release and the name of the analysis package chosen while setting up the PSDM software distribution, respectively. /path/to denotes the path to the directory containing the unpacked cctbx.xfel sources. The last step compiles the cctbx.xfel analysis modules.

Creating a dark image

To meaningfully process the thermolysin diffraction data, an average of all the images in a dark run—a run without any X-rays impinging on the detector—must be subtracted from the individual diffraction images. The configuration file below can be used to produce such an average, as well as an image of the standard deviation of all the pixels over the course of the run.

/path/to refers to the directory containing the unpacked cctbx.xfel sources. All other options with values set in italics can be modified without adversely affecting averaging. The above file will use four simultaneous processes, and write the average and standard deviation images to files whose names start with Ds1-avg and Ds1-stddev, respectively, both in a directory called r0031.

The data deposited at the CXIDB contains a dark run, r0031. To average the images in that run, save the above configuration file to disk, e.g. as L498-dark.cfg, apply modifications as necessary, and execute

$ cxi.pyana -c L498-dark.cfg /path/to/xtc/files/e157-r0031-*.xtc

where /path/to/xtc/files is the path to the directory containing the raw XTC files downloaded from CXIDB. The files written to the r0031 directory will have a current date stamp appended, which can safely be removed to simplify subsequent configuration files.
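As an illustration of what the averaging step produces, the following minimal sketch computes a per-pixel mean and standard deviation over a stack of images and subtracts the mean from a diffraction image. It is not the cctbx.xfel implementation: the function names dark_statistics and subtract_dark are hypothetical, images are represented as flat lists of pixel values, and the standard deviation is normalised by N, which is an assumption.

```python
import math


def dark_statistics(images):
    """Return (mean, stddev) per pixel for an iterable of equally sized
    images, each given as a flat list of pixel values."""
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations (Welford's method)
    for image in images:
        if mean is None:
            mean = [0.0] * len(image)
            m2 = [0.0] * len(image)
        elif len(image) != len(mean):
            raise ValueError("all images must have the same number of pixels")
        count += 1
        for i, value in enumerate(image):
            delta = value - mean[i]
            mean[i] += delta / count
            m2[i] += delta * (value - mean[i])
    if count == 0:
        raise ValueError("no images supplied")
    # Assumption: population standard deviation (divide by N, not N - 1).
    stddev = [math.sqrt(s / count) for s in m2]
    return mean, stddev


def subtract_dark(image, dark_mean):
    """Subtract the per-pixel dark average from a single image."""
    return [value - dark for value, dark in zip(image, dark_mean)]


# Made-up three-pixel "images" standing in for the frames of a dark run.
dark_run = [[10, 20, 30], [12, 18, 30], [14, 22, 30]]
dark_mean, dark_stddev = dark_statistics(dark_run)
corrected = subtract_dark([112, 25, 30], dark_mean)
```

The running update avoids holding every frame of the run in memory at once, which matters when a run contains many detector images.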

The configuration file above instructs mod_hitfind to use 48 processes. It disables all image output, which reduces the amount of disk space required to perform the analysis to about 3.4 GiB. /path/to again refers to the directory containing the unpacked cctbx.xfel sources, and dark_path as well as dark_stddev may have to be changed to reflect the location of the previously generated dark images. Integration results will be written to the directory integration-first-lattice.

Due to particularities of the thermolysin measurement, processing needs to proceed in two batches. To analyze the first batch, runs 16 through 27, save the above configuration file to disk, e.g. as L498-indexigrate.cfg, apply modifications as necessary, and execute

where /path/to/xtc/files is the path to the directory containing the raw XTC files. The second batch, runs 71 through 73, was recorded using a different distance between the interaction region and the detector. Whilst the changes to the detector position are automatically handled by cctbx.xfel, the resulting difference in shadowing is not. Different areas of the detector should be ignored at the different distances, and this is accounted for by the value of the xtal_target option in the configuration file. To analyze this set of runs, edit L498-indexigrate.cfg, change thermolysin27 to thermolysin73, and

On successful completion, the number of files in the integration-first-lattice directory corresponds to the number of successfully integrated images. Owing to variations in hardware and compiler internals, it may deviate slightly from 11,583, the number reported in the cctbx.xfel paper<ref name="Hattne:2014"/>. Further details are available on the indexing and integration page of the tutorials.
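The file count can be checked with a few lines of Python. This is a minimal sketch that assumes, as stated above, that each file in the integration directory corresponds to one successfully integrated image; the helper names count_integrated_images and describe_count are hypothetical.

```python
import os

# Number of integrated images quoted on this page from the cctbx.xfel paper.
REPORTED_IMAGE_COUNT = 11583


def count_integrated_images(directory):
    """Count the regular files directly inside the given directory."""
    return sum(
        1
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def describe_count(directory):
    """Summarise the count and its difference from the reported number."""
    n = count_integrated_images(directory)
    return "%d files (%+d relative to the reported %d)" % (
        n, n - REPORTED_IMAGE_COUNT, REPORTED_IMAGE_COUNT)


if __name__ == "__main__":
    # Only report if the directory exists in the current working directory.
    if os.path.isdir("integration-first-lattice"):
        print(describe_count("integration-first-lattice"))
```

A small deviation from the reported number is expected for the reasons given above and does not by itself indicate a failed run.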

Indexing the secondary lattice

Indexing the secondary lattice is very similar to indexing the primary lattice, but requires a change to the source code. Edit /path/to/phenix-src-20130328/labelit_regression/xfel/xfel_targets.py, and uncomment (i.e. remove the leading # character) "outlier_detection.switch=True" on line 25. Then edit the configuration file, L498-indexigrate.cfg above, and change integration-first-lattice to integration-second-lattice in order not to overwrite the results of the previous analysis of the primary lattice. Before reanalyzing the first batch, ensure that xtal_target is set to thermolysin27.
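The correspondence between run numbers and the xtal_target value described on this page can be written down as a small lookup, which may help avoid processing a batch with the wrong target. This is only a sketch: the function names are hypothetical, and the file name pattern is an assumption extrapolated from the e157-r0031-*.xtc pattern used for the dark run.

```python
def xtal_target_for_run(run):
    """Return the xtal_target value for a thermolysin run number."""
    if 16 <= run <= 27:
        return "thermolysin27"  # first batch
    if 71 <= run <= 73:
        return "thermolysin73"  # second batch, different detector distance
    raise ValueError("run %d is not part of either thermolysin batch" % run)


def xtc_glob_for_run(run, xtc_dir="/path/to/xtc/files"):
    """Build a shell glob for the XTC files of one run.

    Assumption: every run follows the e157-rNNNN-*.xtc naming seen for
    the dark run r0031.
    """
    return "%s/e157-r%04d-*.xtc" % (xtc_dir, run)


# Print the target and file pattern for each run of both batches.
for run in list(range(16, 28)) + list(range(71, 74)):
    print(run, xtal_target_for_run(run), xtc_glob_for_run(run))
```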

integration-first-lattice and integration-second-lattice may need to be adjusted to point to the directories where the indexing and integration step left its results. db_name, db_user, and db_passwd must be substituted with the database name and access credentials to a MySQL database. Databases on hosts other than the one used to merge the thermolysin data can be accessed by additionally specifying the mysql.host and mysql.port options. The model and structure factors for the scaling reference, model and scaling.mtz_file above, are both available for download from the RCSB Protein Data Bank (PDB ID 2tli). If the PHENIX suite is installed, these are conveniently obtained at the command line using

$ phenix.fetch_pdb --mtz 2tli

To merge the thermolysin data, save the suitably modified configuration file to e.g. L498-merge.phil, and run

$ cxi.merge L498-merge.phil
$ cxi.xmerge L498-merge.phil

Merging statistics are printed on standard output. The merged MTZ file is written to a file whose name is determined by the value of output.prefix in the configuration file (with the values shown above, the output file would be L498_thermolysin.mtz). Note that the versions of the merging programs from March 28, 2013 do not report the Rsplit statistic.
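For reference, Rsplit as commonly defined in serial crystallography compares the merged intensities of two half datasets: the sum of absolute differences divided by the sum of the mean intensities, scaled by 1/sqrt(2). The sketch below assumes two lists of intensities for the same reflections in the same order. How such half datasets would be obtained from this snapshot is not covered here, and the function name r_split and the example values are hypothetical.

```python
import math


def r_split(half1, half2):
    """Compute Rsplit from matched intensities of two half datasets.

    Rsplit = (1 / sqrt(2)) * sum(|I1 - I2|) / (0.5 * sum(I1 + I2))
    """
    if len(half1) != len(half2):
        raise ValueError("both halves must list the same reflections")
    numerator = sum(abs(a - b) for a, b in zip(half1, half2))
    denominator = 0.5 * sum(a + b for a, b in zip(half1, half2))
    if denominator == 0:
        raise ValueError("sum of intensities is zero")
    return numerator / (math.sqrt(2.0) * denominator)


# Made-up intensities for two reflections in each half dataset.
value = r_split([110.0, 190.0], [90.0, 210.0])
```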