Missing Number Flags in MTZ Files: User's Guide

Purpose

1) To remove the possibility of using a missing datum in a calculation.

2) To help establish the principle that reflection data files should
contain entries for each possible set of hkl in the asymmetric unit,
to a specified resolution limit.

3) To use this information when assigning FreeR sets, and when calculating
electron density maps.

Old and New Style Missing Data Checks

In the next version of the CCP4 Suite (3.0) the concept of missing number flags
within MTZ files will be introduced.

A missing number flag (MNF) will indicate that a certain datum for an
HKL record has not been measured or calculated. This means that any datum
within an MTZ file can be tested to see if it should be used. Previously, files
did not contain this information for all data types. The only way of
identifying an unmeasured reflection was to check whether its standard error
(SIGF or SIGI) was zero. Similarly, the only way to test whether an
experimentally determined phase had been calculated was to check whether its
FOM was zero.

Example 1)
A list of reflection data from one crystal would contain only
those values of hkl which had an observation associated with them.
For example, if you had a blind region along the C* axis for a P212121 data
set, your file might begin:
0 0 8 F SIGF
0 0 10 F SIGF
.....
from which you could deduce that no measurement was made for
reflections 0 0 2, 0 0 4 and 0 0 6.

Example 2)
For an MTZ file containing native and derivative data sets,
an old style MTZ file could look like this:
  H   K   L    FO  SIGFO   FPH  SIGFPH   FOM   PHIB  FreeRflag
  0   0   6     0      0    40       4  0.00    0.0        9.0
  0   0   8    10      1     0       0  0.00    0.0        1.0
  0   0  10    75      2    80       5  0.25   45.0        9.0
...........
From this you could deduce that there was no measurement of the derivative
FPH for 0 0 8, no measurement of the native FO for 0 0 6, and no measurement
of either FO or FPH for 0 0 2 or 0 0 4, and therefore that no phase or FOM
could be calculated for these reflections.
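
The old-style deductions for Example 2 can be written out as a short Python
sketch. This is for illustration only and is not CCP4 code: the row layout and
the function names (amplitude_missing_old, phase_missing_old, describe) are
assumptions made for this example.

```python
# Old-style missing-data checks applied to the rows of Example 2.
# Illustrative only: the data layout and function names are hypothetical.

COLUMNS = ["H", "K", "L", "FO", "SIGFO", "FPH", "SIGFPH",
           "FOM", "PHIB", "FreeRflag"]

ROWS = [
    (0, 0, 6, 0, 0, 40, 4, 0.00, 0.0, 9.0),
    (0, 0, 8, 10, 1, 0, 0, 0.0, 0.0, 1.0),
    (0, 0, 10, 75, 2, 80, 5, 0.25, 45.0, 9.0),
]


def amplitude_missing_old(sigma):
    """Old convention: an amplitude is unmeasured if its sigma is zero."""
    return sigma == 0


def phase_missing_old(fom):
    """Old convention: a phase is undetermined if its FOM is zero."""
    return fom == 0


def describe(row):
    """Return the hkl indices and the names of the columns deduced missing."""
    rec = dict(zip(COLUMNS, row))
    missing = []
    if amplitude_missing_old(rec["SIGFO"]):
        missing.append("FO")
    if amplitude_missing_old(rec["SIGFPH"]):
        missing.append("FPH")
    if phase_missing_old(rec["FOM"]):
        missing.append("PHIB")
    return (rec["H"], rec["K"], rec["L"]), missing


for row in ROWS:
    hkl, missing = describe(row)
    print(hkl, "missing:", missing)
```

Note that every test relies on a second column (a sigma or a FOM) being
present and set to zero, which is exactly the weakness of the old style.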

It was possible to mis-use these old-style files by assigning F without
assigning SIGF or PHI without FOM. Many people have done "difference" maps
where some "differences" were between observed and unobserved data, and
where phases were taken as 0.00 when in fact no phase had been determined.

In the new style MTZ file the missing number flag (MNF) will indicate
that a datum is not present.

All CCP4 programs that use MTZ files have been changed to deal with MNFs.
No program will use a datum flagged with an MNF. In most cases the
functionality has remained the same and, to ensure backwards compatibility,
the old style checks on SIGF and FOM have been kept where appropriate.

It is strongly advised that you convert existing MTZ files to the new
style MTZ format. This is possible through MTZMNF, a new program in the Suite.
It uses the old protocols to check for missing data and then replaces the
missing data with MNFs (see mtzmnf.doc).
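
The idea behind the conversion can be sketched in Python. This is not the
MTZMNF implementation, whose actual behaviour is described in mtzmnf.doc. The
sketch assumes that NaN stands in for the MNF (this guide does not say how the
flag is represented in the file) and that both members of a column pair are
flagged together. The names MNF, is_mnf and flag_missing are hypothetical.

```python
import math

# Assumed stand-in for the missing number flag; the real representation
# in an MTZ file is not specified in this guide.
MNF = float("nan")


def is_mnf(value):
    """New-style test: a datum is missing if it carries the flag."""
    return isinstance(value, float) and math.isnan(value)


def flag_missing(record, amplitude_pairs, phase_pairs):
    """Return a copy of record with old-style missing data replaced by MNF.

    amplitude_pairs: (value_column, sigma_column); sigma == 0 means missing.
    phase_pairs: (phase_column, fom_column); FOM == 0 means undetermined.
    """
    out = dict(record)
    for value_col, sigma_col in amplitude_pairs:
        if record[sigma_col] == 0:
            out[value_col] = MNF
            out[sigma_col] = MNF
    for phase_col, fom_col in phase_pairs:
        if record[fom_col] == 0:
            out[phase_col] = MNF
            out[fom_col] = MNF
    return out


# Reflection 0 0 6 from Example 2, old style.
old = {"FO": 0, "SIGFO": 0, "FPH": 40, "SIGFPH": 4, "FOM": 0.0, "PHIB": 0.0}
new = flag_missing(old,
                   amplitude_pairs=[("FO", "SIGFO"), ("FPH", "SIGFPH")],
                   phase_pairs=[("PHIB", "FOM")])
print(new)
```

After conversion each datum can be tested on its own with is_mnf, without
reference to a companion sigma or FOM column.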

Existing programs which output data, such as MLPHARE, will now output MNF for
undetermined phase and FOMs. Therefore, if the input file to MLPHARE is an old
style MTZ file, the output will be a hybrid of both old and new. This is
undesirable.

A Complete Reflection List and Assigning FreeRflags

FreeR information is becoming a requirement for any structural report.
It is also used actively in some programs, e.g. DM, REFMAC, and soon SIGMAA.
To be used correctly, FreeR flags should be assigned at the earliest
opportunity and used consistently throughout the structure determination. In
particular, if you subsequently collect another data set for a structure,
either an extended native set or a mutant which crystallises in the same
spacegroup, the same FreeR assignments need to be preserved; otherwise the
FreeR statistics are to some extent invalidated.

To do this, it is sensible first to generate all possible hkl to a given
resolution, assign FreeRflags to this set, and then merge this master list with
the observed data sets. There will be example scripts of the procedure to
follow, both for new data sets and for those which already have a FreeR
assigned.
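
Until those scripts are available, the procedure can be outlined in Python.
This is a simplified sketch, not a CCP4 script, and it makes several
assumptions: an orthorhombic cell, an asymmetric unit taken as h, k, l >= 0,
no removal of systematic absences, 20 FreeR sets, and NaN as a stand-in for
the MNF. All function names are hypothetical.

```python
import math
import random


def generate_hkl(cell, d_min):
    """All hkl with h, k, l >= 0 (except 0 0 0) and d-spacing >= d_min.

    Assumes an orthorhombic cell (a, b, c), for which
    1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2.
    """
    a, b, c = cell
    s2_max = 1.0 / d_min ** 2
    h_max, k_max, l_max = (int(math.floor(x / d_min)) for x in cell)
    hkls = []
    for h in range(h_max + 1):
        for k in range(k_max + 1):
            for l in range(l_max + 1):
                if (h, k, l) == (0, 0, 0):
                    continue
                s2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
                if s2 <= s2_max:
                    hkls.append((h, k, l))
    return hkls


def assign_free_flags(hkls, n_sets=20, seed=0):
    """Assign each reflection to one of n_sets FreeR sets, reproducibly."""
    rng = random.Random(seed)
    return {hkl: rng.randrange(n_sets) for hkl in hkls}


def merge(master_flags, observed, missing=float("nan")):
    """One record per master hkl; unobserved F and SIGF are flagged missing."""
    merged = {}
    for hkl, flag in master_flags.items():
        f, sigf = observed.get(hkl, (missing, missing))
        merged[hkl] = {"F": f, "SIGF": sigf, "FreeRflag": flag}
    return merged


master = assign_free_flags(generate_hkl((20.0, 30.0, 40.0), d_min=8.0))
native = merge(master, {(0, 0, 4): (75.0, 2.0)})
mutant = merge(master, {(0, 0, 2): (12.0, 1.5), (0, 0, 4): (70.0, 2.5)})
print(len(master), native[(0, 0, 4)], mutant[(0, 0, 4)])
```

Because both data sets are merged against the same master list, every
reflection keeps the same FreeRflag in each file, whether or not it was
observed.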

Restoring Missing Data in the Calculation of Maps

The current practice when calculating "nFo - (n-1)Fc" electron density maps
(e.g. Fo, 2Fo-Fc, 3Fo-2Fc or the SIGMAA or REFMAC style 2mFo-DFc maps) is to
leave out any term where Fo is unmeasured. Effectively, you are saying that
the contribution from that structure factor is zero. Obviously, this is not
correct, and errors will be introduced into the map as a consequence of this
assumption. See Kevin's Book of Fourier for a duck's eye view of the problem.

There is now an option in FFT that will allow you to substitute Fc for
"nFo - (n-1)Fc" as the Fourier term for all missing values of Fo (see fft.doc).
This invokes the assumption that the most likely value for Fo is Fc. REFMAC
(and soon SIGMAA) will generate a term DFc to substitute for 2mFo-DFc. This
reduces the distortion caused by missing slabs of data. Although the noise in
the map will diminish, it is possible that the systematic error (model bias)
may increase. Note that this substitution is not needed for difference maps
(Fo-Fc), where the assumption Fo~Fc will generate a zero difference.
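
The choice of Fourier amplitude described in this section can be summarised in
a short Python sketch. It is not the FFT implementation (see fft.doc for the
actual option); NaN is assumed as a stand-in for the MNF, and the function
names and the restore_missing argument are hypothetical.

```python
import math


def map_amplitude(fo, fc, n=2, restore_missing=True):
    """Amplitude of one term of an nFo - (n-1)Fc map.

    fo is NaN when Fo is missing (assumed stand-in for the MNF).
    Returns None when the term is left out of the summation.
    """
    if math.isnan(fo):
        # Missing Fo: either assume Fo ~ Fc, so that nFc - (n-1)Fc = Fc,
        # or drop the term, which treats its contribution as zero.
        return fc if restore_missing else None
    return n * fo - (n - 1) * fc


def difference_amplitude(fo, fc):
    """Amplitude of one term of an Fo - Fc difference map."""
    if math.isnan(fo):
        # Assuming Fo ~ Fc gives a zero difference, so nothing is restored.
        return 0.0
    return fo - fc


missing = float("nan")
print(map_amplitude(10.0, 8.0))        # measured Fo, 2Fo-Fc
print(map_amplitude(missing, 8.0))     # missing Fo, Fc substituted
print(difference_amplitude(missing, 8.0))
```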