IOTA: integration optimization, triage and analysis

IOTA is a user-friendly front end for the cctbx.xfel and DIALS suites of serial diffraction data processing programs. It consists of three main modules:

Raw image import, conversion, pre-processing and triage

Image indexing and integration using cctbx.xfel (with optimization of spot-finding parameters) or DIALS (currently being adapted for diffraction stills)

Analysis of the integrated dataset

Please note that IOTA is a front end for (currently) two data processing packages: cctbx.xfel and DIALS. The preferred form of citation is therefore something like: "diffraction data were processed with IOTA [1] using data reduction algorithms implemented in cctbx.xfel [2] (or DIALS [3])".

Running IOTA: GUI Mode

Launch IOTA in GUI mode from your command line. The first window you will see is the main input screen, where you enter basic information such as the input and output folders (the current folder is designated as the output folder by default, but this can be changed). The current version accepts only a single path as input, and that path must be a folder; a folder containing image subfolders also works.

Two multiprocessing modes are currently available (via the "Preferences" toolbar button): "multiprocessing" uses multiple cores on your local machine, while "lsf" submits jobs to an LSF queue. The queue can be selected from a drop-down list or, if it is not listed, supplied by name.

The main screen also contains three buttons that open dialogs for image import options, processing options (which vary with the choice of backend) and analysis options. The image import dialog lets the user turn image triage on or off (i.e. reject images on which too few Bragg spots are found), override the beam XY and detector Z coordinates, threshold out the beamstop shadow, etc. The processing dialog lets the user generate a default target (PHIL) file for cctbx.xfel or read in an existing one, and edit its settings manually in a text window; it also provides spot-finding grid search options and integration result filter options. The analysis dialog lets the user output various charts summarizing IOTA output, as well as individual image integration results.
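Triage, as described above, boils down to an accept/reject decision for each image. The following Python sketch illustrates the idea only: the SpotfindingResult class and the min_bragg_spots cutoff are hypothetical and do not correspond to IOTA's internal names or defaults, and in practice the spot counts would come from the spot-finding step of the chosen backend.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpotfindingResult:
    """Hypothetical per-image spot-finding summary (not an IOTA class)."""
    filename: str
    n_bragg_spots: int


def triage(results: List[SpotfindingResult],
           min_bragg_spots: int = 10) -> Tuple[List[str], List[str]]:
    """Split images into accepted and rejected filenames by Bragg spot count.

    min_bragg_spots is an illustrative cutoff; IOTA's own triage
    parameter and its default may differ.
    """
    accepted, rejected = [], []
    for result in results:
        if result.n_bragg_spots >= min_bragg_spots:
            accepted.append(result.filename)
        else:
            rejected.append(result.filename)
    return accepted, rejected


if __name__ == "__main__":
    images = [
        SpotfindingResult("img_00001.cbf", 45),
        SpotfindingResult("img_00002.cbf", 3),
        SpotfindingResult("img_00003.cbf", 12),
    ]
    accepted, rejected = triage(images)
    print("accepted:", accepted)  # ['img_00001.cbf', 'img_00003.cbf']
    print("rejected:", rejected)  # ['img_00002.cbf']
```

Only rejected images are dropped before indexing, so a cutoff set too high discards weak but usable frames.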

Once IOTA is running, a run-time processing window appears with two tabs. The Log tab displays iota.log as it is updated in real time. The Charts tab displays several graphs: resolution vs. frame; the number of strong (I / sigI > threshold) spots per frame; a pie chart breaking down indexing / integration success for the full dataset; and a beam XY chart, which can be used to monitor the ongoing run for indexing abnormalities (these show up as concentric striation patterns in the refined beam XY coordinates of the processed images).

The processing window also lets the user turn on "Monitor Mode", in which IOTA continuously checks whether new diffraction images have been added to the input folder (or its subfolders); this is useful when running IOTA concurrently with data collection. A time-out period can be set, after which IOTA finishes the run if no new images have been found. (If no time-out period is set, un-toggling the Monitor button finishes the run.)
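Monitor Mode is essentially a polling loop with an inactivity time-out. The Python sketch below illustrates that logic only and is not IOTA's implementation: the function names are hypothetical, the set of image extensions is an assumption, and `process` stands in for whatever per-image processing is performed. It covers only the time-out case, not the manual un-toggling of the Monitor button.

```python
import os
import time
from typing import Callable, Set

# Assumed image extensions; adjust to the detector formats actually in use.
IMAGE_EXTENSIONS = (".cbf", ".img", ".h5", ".pickle")


def find_images(input_dir: str) -> Set[str]:
    """Return all image paths in input_dir and its subfolders."""
    found = set()
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                found.add(os.path.join(root, name))
    return found


def monitor(input_dir: str, process: Callable[[str], None],
            timeout: float = 60.0, poll_interval: float = 5.0) -> Set[str]:
    """Process new images as they appear; stop once no new image has
    been found for `timeout` seconds."""
    seen: Set[str] = set()
    last_new = time.monotonic()
    while True:
        new_images = find_images(input_dir) - seen
        if new_images:
            for path in sorted(new_images):
                process(path)
            seen |= new_images
            last_new = time.monotonic()  # reset the time-out clock
        elif time.monotonic() - last_new >= timeout:
            break  # time-out reached with no new images: finish the run
        time.sleep(poll_interval)
    return seen
```

A real monitor would also need to avoid picking up images that the detector is still writing.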

When the run finishes, a new Analysis tab appears in the processing window. It displays a summary of the run, along with buttons that open several charts: a heatmap of the spot-finding results (if the LABELIT backend was used), resolution histograms and beam XYZ charts. The user can also launch PRIME from this window, in which case the PRIME GUI opens with the parameters pertinent to this run already filled in (e.g. input / output folders, resolution limits, pixel size, unit cell).

Running IOTA: Auto Mode

The simplest way to run IOTA is in Auto Mode. To do so, issue:

iota.run /path/to/image/files/

The path may contain a tree of folders in any configuration. If the source folder contains raw diffraction images, IOTA first converts them. The converted image pickles are saved in the current folder under the subfolder "converted_pickles", with a separate subfolder for each IOTA run, named "001", "002", "003", etc. Once raw images have been converted to image pickles, IOTA can also be pointed to the pickles directly, e.g.:

iota.run ./converted_pickles/001/
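The "001", "002", "003" numbering of per-run subfolders can be reproduced with a few lines of Python. This is a hypothetical helper written for illustration, not part of IOTA; it assumes that run folders are named with exactly three digits and picks the number after the highest existing one.

```python
import os
import re


def next_run_dir(parent: str) -> str:
    """Return the path of the next zero-padded run folder (001, 002, ...)
    under `parent`, based on the numbered folders that already exist."""
    run_numbers = []
    if os.path.isdir(parent):
        for name in os.listdir(parent):
            is_run = re.fullmatch(r"\d{3}", name) is not None
            if is_run and os.path.isdir(os.path.join(parent, name)):
                run_numbers.append(int(name))
    next_number = max(run_numbers, default=0) + 1
    return os.path.join(parent, f"{next_number:03d}")
```

For example, if "converted_pickles" already holds "001" and "002", next_run_dir("converted_pickles") returns the path ending in "003". Taking the maximum rather than counting folders keeps the numbering correct even if an earlier run folder was deleted.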

IOTA can also accept as input a text file that lists the images to process. IOTA creates this list automatically for each run and saves it as ./integration/###/input_images.lst, so a previous run's list can be reused, e.g.:

iota.run ./integration/001/input_images.lst
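The format of the input list is not spelled out here. The sketch below assumes a plain text file with one image path per line, a common convention but still an assumption, and shows how such a list might be built from a folder tree and read back. The function names and the extension filter are hypothetical.

```python
import os
from typing import List

# Assumed image extensions; adjust as needed.
IMAGE_EXTENSIONS = (".cbf", ".img", ".h5", ".pickle")


def write_image_list(input_dir: str, list_file: str) -> List[str]:
    """Collect image paths from a folder tree and write one per line."""
    paths = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()  # deterministic order makes lists easy to compare
    with open(list_file, "w") as f:
        f.writelines(path + "\n" for path in paths)
    return paths


def read_image_list(list_file: str) -> List[str]:
    """Read image paths back, skipping blank lines."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]
```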

Once running, IOTA displays a program logo, some information about the run configuration and a progress bar for each major step.

IOTA automatically creates two script files that the user can modify to fine-tune various settings: iota.param (settings for running IOTA) and target.phil (a cctbx.xfel target file). The output is collected in a folder named "integration", which contains a subfolder for each integration run, titled "001", "002", "003", etc. Each run generates a folder named "final" containing the final integrated pickles, as well as individual cctbx.xfel logs for each image. Lists of files that were successfully integrated (integrated.lst), that failed integration (not_integrated.lst), etc., as well as a pre-populated PRIME script (prime.phil), are saved in the run folder itself. (Currently, the user must manually edit prime.phil to specify the number of residues, "n_residues", in order to run PRIME successfully.)
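For a quick success rate, the two lists can be compared directly. This Python sketch is illustrative: it assumes each list holds one image per line and sits in the run folder (integration/###), and it counts only integrated.lst and not_integrated.lst, so images that end up in other lists (e.g. rejected during triage) are not included. The helper names are hypothetical.

```python
import os
from typing import Tuple


def count_entries(list_file: str) -> int:
    """Count non-blank lines in a list file (0 if the file is absent)."""
    if not os.path.exists(list_file):
        return 0
    with open(list_file) as f:
        return sum(1 for line in f if line.strip())


def integration_summary(run_dir: str) -> Tuple[int, int, float]:
    """Return (integrated, not integrated, success rate) for one run folder."""
    n_ok = count_entries(os.path.join(run_dir, "integrated.lst"))
    n_failed = count_entries(os.path.join(run_dir, "not_integrated.lst"))
    total = n_ok + n_failed
    rate = n_ok / total if total else 0.0
    return n_ok, n_failed, rate


if __name__ == "__main__":
    ok, failed, rate = integration_summary("integration/001")
    print(f"{ok} integrated, {failed} failed ({rate:.1%} success)")
```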

Running IOTA: Target Files

IOTA itself is a front end to the data processing programs cctbx.xfel and DIALS. These programs require their own parameters, distinct from IOTA's, which are stored in so-called "target" files: text files containing parameters encoded in PHIL (Python-based hierarchical interchange language). When run in Auto Mode, IOTA generates an appropriate target file for cctbx.xfel or DIALS using defaults deemed reasonable for most serial crystallography projects. (NOTE: since the DIALS stills indexer remains a work in progress, those defaults may not work very well.) These default target files can also serve as a starting point for the user to modify as they see fit. Alternatively, the user can provide their own target file (perhaps generated during a previous data processing attempt) by specifying it in the IOTA settings.

Of IOTA's command-line options, perhaps the most useful are -r and -n, as they allow the user to adjust an Auto Mode run on the fly. Both of these settings can also be changed within the script file.

Any option in the script can also be supplied as a command-line statement using a "compressed" PHIL format. For example:

cctbx {
  grid_search {
    type = None *brute_force smart
  }
}

translates into

iota.run script.param cctbx.grid_search.type=brute_force
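The mapping from nested PHIL scopes to a dotted command-line statement is mechanical: each enclosing scope name becomes a dot-separated prefix of the parameter name. The sketch below models scopes as plain Python dicts purely for illustration; it is not a PHIL parser (cctbx provides its own in libtbx.phil), and flatten_phil is a hypothetical helper.

```python
from typing import Any, Dict, List


def flatten_phil(params: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested PHIL-like scopes (modelled as dicts) into
    dotted `scope.subscope.name=value` command-line statements."""
    statements = []
    for key, value in params.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            statements.extend(flatten_phil(value, path))  # descend into sub-scope
        else:
            statements.append(f"{path}={value}")
    return statements


if __name__ == "__main__":
    settings = {"cctbx": {"grid_search": {"type": "brute_force"}}}
    command = ["iota.run", "script.param"] + flatten_phil(settings)
    print(" ".join(command))
    # iota.run script.param cctbx.grid_search.type=brute_force
```

Values containing spaces would additionally need shell quoting, which this sketch does not handle.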

IOTA Output

Because of IOTA's flexibility, several types of output coexist and can be somewhat disconnected from one another. It helps to think of them as belonging to three separate stages of the process: pre-processing, grid search / integration, and post-processing / analysis.

In pre-processing, raw images are read in and converted to Python pickles. These are saved under the "converted_pickles" folder in the format <prefix>_<run_no>_<#####>.pickle; each cycle of pre-processing is assigned a run number (e.g. "001", "002", "003", etc.). Pre-processing is triggered only if a) the input image is not already pickled, or b) the image has to be modified in some way (e.g. to override the beam XY coordinates or change the detector distance). Thus, if pickles that are already converted and modified are submitted to IOTA, the "converted_pickles" folder is not created. This allows the user to experiment with image modification, then select the converted pickles that best fit their needs.
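The naming pattern and the two pre-processing triggers can be expressed compactly in Python. This is an illustrative sketch, not IOTA's code: the helper names are hypothetical, the digit widths (three for the run number, five for the image number) are inferred from the pattern above, and the `modifications` dictionary stands in for whatever overrides (beam XY, detector distance, etc.) the user requested.

```python
import re
from typing import Optional, Tuple

# <prefix>_<run_no>_<#####>.pickle, e.g. "lysozyme_001_00042.pickle"
CONVERTED_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<run>\d{3})_(?P<index>\d{5})\.pickle$")


def converted_pickle_name(prefix: str, run_no: int, index: int) -> str:
    """Build a converted-pickle filename following the pattern above."""
    return f"{prefix}_{run_no:03d}_{index:05d}.pickle"


def parse_converted_pickle_name(filename: str) -> Optional[Tuple[str, int, int]]:
    """Split a converted-pickle filename into (prefix, run number, image number)."""
    match = CONVERTED_NAME.match(filename)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("run")), int(match.group("index"))


def needs_preprocessing(is_pickle: bool, modifications: dict) -> bool:
    """Pre-process if (a) the image is not yet pickled or (b) it must be modified."""
    return (not is_pickle) or bool(modifications)
```

Because the prefix group is greedy and the run and image numbers are anchored to the end of the name, prefixes that themselves contain underscores (e.g. "my_sample_002_00007.pickle") still parse correctly.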

The output of the other two stages (grid search / integration and post-processing / analysis) is found under the "integration" folder. The grid-search results are saved in the "integration/###/image_objects" folder in the format <filename>.int. These are pickled dictionaries containing all the information about individual images (except pixel values and integrated intensities), such as the raw image filename, the converted pickle filename, the details of the grid search, etc. They can be used for advanced operations, such as experimenting with the selection process without repeating the grid search.
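Because the .int files are pickles, they can be inspected from Python. The sketch below is a hedged example: the exact dictionary keys are not documented here (so the usage simply lists them), load_image_object is a hypothetical helper, and files that store cctbx objects will only unpickle in an environment where cctbx is importable.

```python
import pickle
import sys
from typing import Any, Dict


def load_image_object(int_file: str) -> Dict[str, Any]:
    """Load one grid-search result (<filename>.int) as a dictionary.

    Only unpickle files you trust: pickle can execute arbitrary code.
    """
    with open(int_file, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2
        obj = pickle.load(f, encoding="latin1")
    if not isinstance(obj, dict):
        raise TypeError(f"{int_file} does not contain a dictionary")
    return obj


if __name__ == "__main__":
    # Usage: python inspect_int.py integration/001/image_objects/*.int
    for path in sys.argv[1:]:
        print(path, sorted(load_image_object(path)))
```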

The integrated pickles are collected in the "integration/###/final" folder in the format int_<filename>.pickle; only successfully integrated images are saved this way. For every input image, however, a log of the cctbx.xfel or DIALS output is saved in the same folder in the format <filename>.log. For cctbx.xfel, this log documents each integration attempt from the grid search, followed by the final integration attempt; for DIALS, it contains the linear indexing / integration output. Either can be used for troubleshooting.
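Since every input image gets a log but only successful ones get an integrated pickle, comparing the two sets of names is a quick way to find the images worth troubleshooting. A minimal sketch, assuming <filename> stands for the same string in both patterns; failed_images is a hypothetical helper, not part of IOTA.

```python
import os
from typing import List


def failed_images(final_dir: str) -> List[str]:
    """Return names that have a <filename>.log but no int_<filename>.pickle."""
    entries = set(os.listdir(final_dir))
    failed = []
    for entry in sorted(entries):
        if entry.endswith(".log"):
            name = entry[: -len(".log")]
            if f"int_{name}.pickle" not in entries:
                failed.append(name)  # logged but never integrated
    return failed
```

The logs of the returned names can then be opened to see at which grid-search step integration failed.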

If the user chooses to output any charts (e.g. grid-search heatmap, beam center plot, image visualization), these are saved in the "integration/###/visualization" folder.

Finally, the "integration/###" folder itself contains text files with lists of images, e.g. input images, all integrated images, all images that failed integration, major clusters from the unit cell-clustering module, etc. The main logfile (iota.log) is also found here, as is the default input file for PRIME (prime.phil).