The format is based on a simple context-free grammar. Data are
presented in either key-value or tabular form, which makes the format much easier to parse than the record-oriented PDB format. Say good-bye to "exception" handling when reading old-style PDB flat files!

There are no column width limitations.

All relationships between common data items (e.g. atom and residue identifiers) are explicitly
documented within the PDBx Exchange Dictionary. This permits software applications to
evaluate and validate referential integrity with any PDB entry.

The mmCIF/PDBx Exchange Dictionary provides metadata (e.g. data types, allowed ranges,
controlled vocabularies) which can be used to generate a validating mmCIF parser or a database
loader.
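
As a rough sketch of how such metadata could drive validation, the Python fragment below checks raw values against a small, hand-written rule table. The rule table, its field names (type, allowed, min, max) and the validate function are hypothetical and exist only for illustration; they are not copied from the Exchange Dictionary, which is far richer.

```python
# Hypothetical metadata written by hand for illustration only;
# the real dictionary defines types, ranges and vocabularies itself.
ITEM_RULES = {
    "_atom_site.group_PDB": {"type": str, "allowed": {"ATOM", "HETATM"}},
    "_atom_site.occupancy": {"type": float, "min": 0.0, "max": 1.0},
}


def validate(item_name, raw_value, rules=ITEM_RULES):
    """Return a list of problems found for one raw (string) value."""
    rule = rules.get(item_name)
    if rule is None:
        return [f"{item_name}: no rule available for this item"]
    if raw_value in (".", "?"):
        return []  # CIF placeholders for inapplicable and unknown values
    try:
        value = rule["type"](raw_value)  # data type check
    except ValueError:
        return [f"{item_name}: {raw_value!r} is not a valid {rule['type'].__name__}"]
    problems = []
    if "allowed" in rule and value not in rule["allowed"]:
        problems.append(f"{item_name}: {raw_value!r} is not in the controlled vocabulary")
    if "min" in rule and value < rule["min"]:
        problems.append(f"{item_name}: {raw_value!r} is below the allowed range")
    if "max" in rule and value > rule["max"]:
        problems.append(f"{item_name}: {raw_value!r} is above the allowed range")
    return problems


print(validate("_atom_site.occupancy", "1.00"))
print(validate("_atom_site.group_PDB", "ATOMS"))
```

A generated parser or database loader would typically build such a rule table from the dictionary instead of hard-coding it.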

Parsing tools are available in most popular languages (e.g. C/C++, Java, Python, Perl, FORTRAN)
and toolkits (e.g. BioJava and BioPython).

Mapping information between the residue sequences of the experimental sample and the model
coordinates is included within each entry.

The following examples show the ATOM records from the current PDB format and an example from the proposed stylized
PDBx/mmCIF format. In the PDBx/mmCIF example the order of columns places the chain, residue and atom nomenclature items
in the left-most columns. Data items that depend on the experimental method (e.g. occupancy, B-value) are placed in
columns to the right. All of the items of the atom record in the PDBx/mmCIF format example are placed on a single
text line and are white-space delimited.

The PDBx/mmCIF format files are named following the convention <PDB_4-LETTER-ID_CODE>.cif.gz (e.g. 1abc.cif.gz).
Experimental data files containing X-ray structure factors are only distributed in PDBx/mmCIF format and are named following an
older PDB naming convention r<PDB_ID_CODE>sf.ent.gz (e.g. r1abcsf.ent.gz).
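
The two naming conventions can be sketched in a few lines of Python. The function names are hypothetical, and lower-casing the ID code is an assumption based on the examples above (1abc.cif.gz, r1abcsf.ent.gz).

```python
def _check_id(pdb_id: str) -> str:
    """Normalize a PDB ID code (assumed: 4 alphanumeric characters, lower case in file names)."""
    code = pdb_id.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a 4-character PDB ID code: {pdb_id!r}")
    return code


def mmcif_filename(pdb_id: str) -> str:
    """File name of the PDBx/mmCIF coordinate file, e.g. 1abc.cif.gz."""
    return f"{_check_id(pdb_id)}.cif.gz"


def structure_factor_filename(pdb_id: str) -> str:
    """File name of the X-ray structure factor file, e.g. r1abcsf.ent.gz."""
    return f"r{_check_id(pdb_id)}sf.ent.gz"


print(mmcif_filename("1ABC"))
print(structure_factor_filename("1ABC"))
```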

A complete description of the download options for PDB data files is maintained here by the wwPDB.
The special handling of PDB entries containing very large structures is available here.

The PDBx/mmCIF format has a simple appearance with only a few syntax elements. All of the syntax elements
used in PDBx data files are shown in the following snippet describing a polymer sequence.

The essential syntax features include:

Each data item is identified by a name that begins with the underscore character, e.g. _entity_poly.entity_id.

Data item names can be decomposed into a category name and an attribute name, which
are separated by a period: _category.attribute.

Data categories are presented in two styles: key-value and tabular. In the example, categories entity_name_com and
entity_poly both use the key-value style and the entity_poly_seq category uses the tabular style. In the tabular style, the data
item names corresponding to the table columns follow a reserved loop_ token and are in turn followed by the rows
of white-space delimited data values.

Any character data value may be quoted using encapsulating single or double quotes; character values containing
internal whitespace (e.g. the value of _entity_name_com.name) must be quoted. Character values that extend over
multiple lines are quoted using leading and trailing semi-colons positioned at the first character position of the
records surrounding the multi-line character value (e.g. _entity_poly.pdbx_seq_one_letter_code).

Lines beginning with the hash symbol # are comments.
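
The syntax features listed above are few enough that a toy reader fits in a short Python sketch. Everything in it is illustrative: the SAMPLE text is made up and not taken from a PDB entry, the function names are hypothetical, and the quoting rules are simplified (no quote characters inside quoted values, no comments after data on the same line). It is not a substitute for one of the existing parsing toolkits.

```python
import re
from collections import defaultdict

# Illustrative data only; not taken from a real PDB entry.
SAMPLE = """\
data_example
#
_entity_name_com.entity_id   1
_entity_name_com.name        'example protein'
#
_entity_poly.entity_id       1
_entity_poly.pdbx_seq_one_letter_code
;MKV
;
#
loop_
_entity_poly_seq.entity_id
_entity_poly_seq.num
_entity_poly_seq.mon_id
1 1 MET
1 2 LYS
1 3 VAL
"""

# A single-quoted value, a double-quoted value, or a bare word.
TOKEN = re.compile(r"'([^']*)'|\"([^\"]*)\"|(\S+)")


def tokenize(text):
    """Yield (value, quoted) pairs; quoted values are never names or keywords."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("#"):
            continue  # comment line
        if line.startswith(";"):
            # Multi-line value: collect lines up to the closing ';' line.
            parts = [line[1:]]
            for following in lines:
                if following.startswith(";"):
                    break
                parts.append(following)
            yield "\n".join(parts), True
            continue
        for match in TOKEN.finditer(line):
            yield match.group(match.lastindex), match.lastindex != 3


def is_structural(token):
    """True for item names, the loop_ keyword and data_ block headers."""
    value, quoted = token
    return not quoted and (value.startswith(("_", "data_")) or value == "loop_")


def split_name(name):
    """'_category.attribute' -> ('category', 'attribute')."""
    category, attribute = name[1:].split(".", 1)
    return category, attribute


def parse(text):
    """Return {category: [row, ...]}; a key-value category becomes a single row."""
    tokens = list(tokenize(text))
    tables = defaultdict(list)
    i = 0
    while i < len(tokens):
        value = tokens[i][0]
        if not is_structural(tokens[i]):
            raise ValueError(f"unexpected value {value!r}")
        if value.startswith("data_"):
            i += 1
        elif value == "loop_":
            i += 1
            names = []
            while i < len(tokens) and is_structural(tokens[i]) and tokens[i][0].startswith("_"):
                names.append(split_name(tokens[i][0]))
                i += 1
            if not names:
                raise ValueError("loop_ without data item names")
            attributes = [attribute for _, attribute in names]
            while i < len(tokens) and not is_structural(tokens[i]):
                values = [v for v, _ in tokens[i:i + len(attributes)]]
                tables[names[0][0]].append(dict(zip(attributes, values)))
                i += len(attributes)
        else:
            category, attribute = split_name(value)
            if not tables[category]:
                tables[category].append({})
            tables[category][0][attribute] = tokens[i + 1][0]
            i += 2
    return dict(tables)


for category, rows in parse(SAMPLE).items():
    print(category, rows)
```

Storing a key-value category as a one-row table keeps both styles in the same shape, so downstream code does not need to know which style a file used.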

Look here for a more complete description of PDBx/mmCIF data file and
dictionary syntax.

Yes, the atom coordinate records in the PDBx/mmCIF data distributed by the wwPDB are stored on
individual lines each beginning with either 'ATOM' or 'HETATM'. The elements of each coordinate record
are white-space delimited. For example, PDBx/mmCIF coordinate records in PDB entries all have
the following regular layout.
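
Because each record sits on one white-space delimited line, a line-oriented reader can be sketched in Python as follows. The column order in ATOM_SITE_COLUMNS and the example record are assumptions made for illustration; a real file declares its own column order in the loop_ header of the atom_site category, and that header should be preferred. The function name is hypothetical.

```python
# Assumed column order for illustration; real files list their columns
# in the loop_ header that precedes the coordinate records.
ATOM_SITE_COLUMNS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_alt_id",
    "label_comp_id", "label_asym_id", "label_entity_id", "label_seq_id",
    "pdbx_PDB_ins_code", "Cartn_x", "Cartn_y", "Cartn_z", "occupancy",
    "B_iso_or_equiv", "pdbx_formal_charge", "auth_seq_id", "auth_comp_id",
    "auth_asym_id", "auth_atom_id", "pdbx_PDB_model_num",
]

# Illustrative record laid out in the assumed column order.
EXAMPLE = "ATOM 1 N N . VAL A 1 1 ? 6.204 16.869 4.854 1.00 49.05 ? 1 VAL A N 1"


def parse_atom_line(line, columns=ATOM_SITE_COLUMNS):
    """Split one coordinate record into a dict keyed by attribute name."""
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, found {len(fields)}")
    record = dict(zip(columns, fields))
    # Convert the numeric items; all other items are kept as text.
    for key in ("Cartn_x", "Cartn_y", "Cartn_z", "occupancy", "B_iso_or_equiv"):
        record[key] = float(record[key])
    return record


atom = parse_atom_line(EXAMPLE)
print(atom["label_comp_id"], atom["Cartn_x"], atom["Cartn_y"], atom["Cartn_z"])
```

Plain str.split() leaves any quote characters attached to quoted values (for example atom names containing a prime), so a production reader would need proper CIF tokenization for those records.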

Coordinate data are recorded in the PDBx/mmCIF ATOM_SITE data category.
This brief tutorial describes the PDBx/mmCIF representation of
coordinate data and its relationship to PDB format coordinate data items.