There is now a
Web-Based version of TALOS+ which can be used directly without
installing NMRPipe. A Java version viewer (JRAMA+) is also available to
display TALOS+ results without installing NMRPipe.
You can access this Web-based system, along with other facilities for
manipulating chemical shifts, dipolar couplings, and molecular
structures at the Bax Group NMR Server site:

TALOS+ is a hybrid system for empirical prediction of protein phi and
psi backbone torsion angles using a combination of six kinds
(HN, HA, CA, CB, CO, N) of chemical shift assignments for a given
residue sequence.

TALOS+ is an enhanced version of the earlier TALOS system which
improves upon the original TALOS database mining approach by
including a neural network classification scheme, as well as
a larger database of 200 proteins.
This improved approach allows TALOS+ to make a larger number of
useful backbone angle predictions, 88% of residues in a given
protein on average.

The original TALOS approach is an extension of the well-known observation
that many kinds of secondary chemical shifts (i.e. differences between
chemical shifts and their corresponding random coil values) are highly
correlated with aspects of protein secondary structure.
The goal of TALOS+ is to use secondary shift and sequence
information in order to make quantitative predictions for the
protein backbone angles phi and psi, and to provide a measure
of the uncertainties in these predictions.
In the original TALOS approach, we search a high-resolution structural
database for the 10 best matches to the secondary chemical shifts
of a given residue in a target protein along with its two flanking
neighbors (a residue triplet).
If there is a consensus of phi and psi angles among the 10 best
database matches, then we use these database triplet structures
to form a prediction for the backbone angles of the target residue.
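
The search step can be pictured with a small sketch. It assumes a plain sum of squared secondary-shift differences as the match score; the actual TALOS scoring also applies the per-shift weights and residue homology factors stored in weight.tab and homology.tab, which are not reproduced here. All function names and the data layout below are hypothetical.

```python
import heapq


def secondary_shift(observed, random_coil):
    """Secondary shift: observed shift minus its random coil value."""
    return observed - random_coil


def match_score(query, candidate):
    """Simplified score (an assumption, not the TALOS+ scoring function).

    Each triplet is a dict mapping (position, atom) -> secondary shift,
    where position is -1, 0 or +1 relative to the center residue.
    Only shifts present in both triplets are compared.
    """
    shared = query.keys() & candidate.keys()
    if not shared:
        return float("inf")  # nothing to compare, so never a good match
    return sum((query[k] - candidate[k]) ** 2 for k in shared)


def best_matches(query, database, n=10):
    """Return the n database triplets with the lowest score."""
    return heapq.nsmallest(
        n, database, key=lambda entry: match_score(query, entry["shifts"])
    )
```

Each database entry here is assumed to be a dict with a "shifts" field; a real entry would also carry the phi and psi angles of the center residue, from which the prediction is averaged.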

The TALOS+ approach adds an artificial neural network (ANN) classification
scheme to this database mining approach. The neural network analyzes
the chemical shifts and sequence to estimate the likelihood of a given residue
being in a sheet, helix, or loop conformation. This ANN classification
information is combined with the database mining results to
increase the number of residues where useful backbone angle
predictions can be made.

In addition, TALOS+ offers several new
features compared to the original TALOS program:

In order to expand the program's ability to predict backbone torsion
angles, TALOS+ now also considers the frequently encountered cases where
residue assignments are lacking. Although the fraction of such residues for
which unambiguous predictions can be made tends to be significantly lower, the
reliability of such predictions remains high.

For convenience, and in order to prevent assignment of backbone torsion
angles to regions that are dynamically disordered, TALOS+ also reports an
estimated backbone order parameter S2 derived from the chemical
shifts in a way recently described by Berjanskii and Wishart (J. Am. Chem.
Soc. 127: 14970-14971).

TALOS+ also provides ANN-predicted secondary structure information from
the chemical shifts, with about 89% prediction accuracy.

As with TALOS, the reliability of the TALOS+
approach was tested by a cross-validation "leave-one-out" procedure where each
protein was removed from the database, and its phi and psi angles were predicted
using the remaining protein data. For the purposes of testing, a prediction was
considered "Good" if it fell in the same well-populated region of the Ramachandran map as the phi and psi values from the crystal structure.
Conversely, a prediction was considered "Bad" or incorrect if it
greatly deviated from the observed phi or psi angles from the crystal structure
(see definition here). According to the tests:

TALOS+ makes consistent predictions for, on
average, about 88% of the residues.

(IMPORTANT!) Over all 200 database proteins,
about 2.5% of the unambiguous predictions made by TALOS+ were incorrect relative
to the corresponding crystal structure. However, a substantial fraction of
this 2.5% appears to reflect genuine differences relative to the crystalline
state, and the true error rate therefore is believed to be below 2.5%.

On average, the uncertainty as reported by
TALOS+ for the consensus predictions was 12.6 degrees for phi, and 12.3
degrees for psi.

The actual RMSD of the "correct" predictions relative to the crystal
structures was about 13.5 degrees for phi, and 12.9 degrees for psi.

As noted above regarding the error rate, it must be remembered that TALOS+ will produce a small
number of predictions which seem to be valid (because the best matches from the
database are consistent) but which are nevertheless in error.

It should also be noted that the tests above included only the most
well-defined parts of each protein; roughly 6% of the residues had first been
removed because they had high B factors (exceeding 1.5 times the average
B-factor for that protein) in the crystal structure or (for the original 78
TALOS proteins) because
they were known to be highly mobile in solution. Evaluation of the results indicates that
many of the "erroneous" predictions occur outside of regions of secondary
structure, where the X-ray and solution structures may actually differ from one
another, as evidenced by large differences between X-ray structures when
multiple such structures are available for the same protein. Therefore, the accuracy of TALOS+ will vary from protein to protein,
and tends to be lower for proteins with large flexible regions. A partial remedy
is to increase the S2 threshold for "dynamic" residues to 0.65, but this will
decrease the number of consensus predictions made.
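
The B-factor criterion used to trim the test set can be written down in a few lines. This is a minimal sketch, assuming one B factor per residue is already available (how that per-residue value is derived from the atomic B factors is not stated above); the function name is hypothetical.

```python
def well_ordered_residues(b_factors, ratio=1.5):
    """Keep residues whose B factor does not exceed ratio * protein average.

    b_factors maps residue ID -> B factor (one value per residue,
    which is an assumption of this sketch).
    """
    mean_b = sum(b_factors.values()) / len(b_factors)
    return sorted(r for r, b in b_factors.items() if b <= ratio * mean_b)
```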

The TALOS+ core database search system is implemented in the C++ language,
and includes a graphical interface to inspect the prediction results.
The graphical interface, called RAMA+, is implemented in the TCL/TK
scripting language via the NMRPipe TCL interpreter called nmrWish.

The TALOS+ files are installed into a talosplus subdirectory
of an NMRPipe installation. The NMRPipe initialization commands will
establish an environment variable TALOSP_DIR which will
give the full path to the talosplus directory.

Both of these scripts can be invoked
with the -help command-line
argument to generate a complete list of options. For backward compatibility,
the script names talos+, talos+.tcl, and talos.tcl can all be used to
run TALOS+, and the script names rama+, rama+.tcl, and rama.tcl
can all be used to run RAMA+.

Other files of the TALOS+ system include:

talosplus/demo
A directory with example chemical shift input data and scripts
for a demo of TALOS+.

talosplus/tab/talos.tab
The compiled database of residue triplets with their corresponding secondary
shifts and PHI/PSI values.

talosplus/tab/randcoil.tab
The table of random coil shifts used in the prediction process.

talosplus/tab/homology.tab
The residue type homology factors used in the prediction process.

talosplus/tab/weight.tab
The weighting factors of the 18 secondary shifts used in the prediction
process.

talosplus/tab/*level*.tab
The weighting factors and biases of the neural network used in the prediction
process.

Create a directory for the prediction session; all subsequent commands
will be executed from this directory.

Prepare the input table of shift assignments (for example "myshifts.tab"),
according to the format given below.

Run TALOS+ (talos+) to perform the database searches. Most commonly, this
will simply require a command such as:

talos+ -in myshifts.tab

During the database search, a summary file "predAll.tab" will be
created to store the 10 best database matches for all residues in the target protein. Before exiting, a file "pred.tab"
will also be created, which includes an initial summary of the prediction
results. Additionally, three files "predAdjCS.tab", "predABP.tab" and "predSS.tab" will be
created to store the calculated secondary chemical shifts used for prediction,
the ANN-predicted 3-state phi/psi distribution (Alpha, Beta and
Positive-Phi) information and the predicted secondary structure, respectively. The database search will typically
take about 15-20 sec per 100 residues.
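
When the search is driven from a script, it can help to confirm that all five output files were written. The following is a minimal sketch that uses only the -in and -ref arguments shown in this document; the helper names are hypothetical, and it assumes talos+ is on the PATH.

```python
import subprocess
from pathlib import Path

# Output files named in the text above.
EXPECTED_OUTPUTS = [
    "predAll.tab", "pred.tab", "predAdjCS.tab", "predABP.tab", "predSS.tab",
]


def talos_command(shift_table, ref_pdb=None):
    """Build the talos+ command line as a list of arguments."""
    cmd = ["talos+", "-in", str(shift_table)]
    if ref_pdb is not None:
        cmd += ["-ref", str(ref_pdb)]
    return cmd


def missing_outputs(workdir="."):
    """List the expected output files that are absent from workdir."""
    return [n for n in EXPECTED_OUTPUTS if not (Path(workdir) / n).exists()]


def run_talos(shift_table, ref_pdb=None, workdir="."):
    """Run talos+ in workdir and report any expected file that is missing."""
    subprocess.run(talos_command(shift_table, ref_pdb), cwd=workdir, check=True)
    return missing_outputs(workdir)
```

An empty list returned by run_talos("myshifts.tab") would indicate that every expected file is present in the session directory.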

In the original TALOS System, the classification step was performed
by the VINA application (vina.tcl).
This classification is now part of the TALOS+ database search procedure,
and the VINA application is no longer used.

Run RAMA+ (rama+) or
JRAMA+
to inspect and adjust the predictions. The simplest
RAMA+ invocations are:

rama+ -in myshifts.tab
rama+ -in myshifts.tab -ref mystruct.pdb

During this inspection, you will:

Examine the phi/psi distributions of the center residues of the best 10
database matches for a given query residue, and decide which ones should be
included in the prediction, and which are "outliers". (NOTE: in
the vast majority of cases, the initial automated classifications performed by the current
version of the TALOS+ program should be acceptable with no manual adjustment
needed).

Classify the results for a given residue as "Good", "Ambiguous", or (if a
reference structure is known) "Bad".

The file "predAll.tab" will be adjusted along the way to reflect any
changes made interactively, and a new "pred.tab" summary file will be created on
exiting. When the above steps are completed, the final "pred.tab" file will
include the classification ("Good" etc) and predictions (averages and standard
deviations) for phi and psi at each residue.

Convert TALOS+ results to other formats, for use as structural restraints,
etc. The TALOS+ package includes shell scripts such as "talos2dyana.com"
and "talos2xplor.com" for this purpose; examples of their use are:

TALOS+ checks the referencing for 13CA, 13CB, 1HA
and 13C' chemical shifts, using the empirical correlation between certain sets of chemical
shift data (Wang et al., 2005 J Biomol NMR, 32:13-22). The
estimated chemical shift referencing offsets, as well as the chemical shifts
that deviate substantially from their expected ranges, will be printed in the following
format:

Note that (1) a chemical shift
referencing correction is likely required whenever the estimated
referencing error approaches the average uncertainty in the database chemical
shifts (~1.0 ppm for 13CA/CB and 13C' shifts; ~0.3 ppm for 1HA shifts), and/or
the estimated referencing error is larger than five times the average fitting
error; (2) chemical shift outliers,
which fall far outside (more than 2-3 times) the expected range of secondary
chemical shifts and are marked by "!", are
unlikely to be correct (or, as in the above example, correspond to a C-terminal carboxylate instead of a backbone carbonyl) and need
to be checked carefully.

An example portion of the required shift table format is shown below. Full Example: ubiq.tab.
Other examples can be found in the talosplus/shifts and talosplus/demo directories of an NMRPipe
installation, or at the TALOS Server site.
Specifically:

13C chemical shifts for CA, CB, and CO used as input for TALOS/TALOS+ should be
referenced relative to TSP. The 15N chemical shifts used as input for TALOS/TALOS+ should be referenced relative
to liquid ammonia at 25 degrees C.

Use the optional DATA FIRST_RESID line to specify the first residue ID
number of the sequence. By default, residue numbering is assumed to begin at 1.

The protein sequence should be given as shown, using one or more DATA
SEQUENCE lines. Space characters in the sequence will be ignored. Use "c" for
oxidized CYS (CB ~ 42.5 ppm) and "C" for reduced CYS (CB ~ 28 ppm), "h" for
protonated HIS and "H" for unprotonated HIS,
in both the sequence header and the shift table. Use X for residues other than
the usual 20 amino acids.

The table must include columns for residue ID, one-character residue name,
atom name, and chemical shift.

The table must include a "VARS" line which labels the corresponding
columns of the table.

The table must include a "FORMAT" line which defines the data type of the
corresponding columns of the table.

Atom names are always given exactly as:

HA    for H-alpha of all residues except glycine
HA2   for the first H-alpha of glycine residues
HA3   for the second H-alpha
C     for C' (CO)
CA    for C-alpha
CB    for C-beta
N     for N-amide
HN    for H-amide

As noted, there is an exception for naming glycine assignments, which
should use HA2 and HA3 instead of HA. In the case of glycine HA2/HA3
assignments, TALOS/TALOS+ will use the average value of the two, so that it is not
necessary to have these assigned stereospecifically; for use of TALOS/TALOS+, the
assignment can be arbitrary. Note however that the assignment must be given
exactly as either "HA2" or "HA3" rather than "HA2|HA3" etc.

Other types of assignments may be present in the shift table; they will be
ignored.
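
The rules above can be illustrated with a small reader for this table format. It is a sketch under stated assumptions: the column labels RESID, RESNAME, ATOMNAME and SHIFT are assumed here because the example table is not reproduced in this text, the function names are hypothetical, and the shift values in the sample are made up for illustration.

```python
# Atom names that TALOS/TALOS+ uses; any other atom is skipped.
BACKBONE_ATOMS = {"HA", "HA2", "HA3", "C", "CA", "CB", "N", "HN"}

EXAMPLE = """\
DATA FIRST_RESID 1
DATA SEQUENCE AGKV
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d %1s %4s %8.3f

1 A CA 52.50
1 A HB 1.40
2 G HA2 4.10
2 G HA3 3.90
"""


def parse_shift_table(text):
    """Return (first_resid, sequence, shifts) from a shift table string.

    shifts maps (resid, atom_name) -> chemical shift in ppm.
    Column labels are assumptions; a VARS line must precede the data rows.
    """
    first_resid, sequence, columns, shifts = 1, "", [], {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "FORMAT":
            continue
        if fields[0] == "DATA":
            if len(fields) > 2 and fields[1] == "FIRST_RESID":
                first_resid = int(fields[2])
            elif len(fields) > 1 and fields[1] == "SEQUENCE":
                sequence += "".join(fields[2:])  # spaces are ignored
        elif fields[0] == "VARS":
            columns = fields[1:]
        else:
            row = dict(zip(columns, fields))
            if row["ATOMNAME"] in BACKBONE_ATOMS:
                key = (int(row["RESID"]), row["ATOMNAME"])
                shifts[key] = float(row["SHIFT"])
    return first_resid, sequence, shifts


def ha_shift(shifts, resid):
    """H-alpha shift for a residue; glycine HA2/HA3 are averaged."""
    if (resid, "HA") in shifts:
        return shifts[(resid, "HA")]
    pair = [shifts[(resid, a)] for a in ("HA2", "HA3") if (resid, a) in shifts]
    return sum(pair) / len(pair) if pair else None
```

Because glycine HA2 and HA3 are simply averaged, swapping the two assignments in the table gives the same result, which is why the stereospecific assignment can be arbitrary.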

TALOS now also has the option to use chemical shift input in the BMRB NMR-Star format.
If NMR-Star format input is used, the input must contain shifts for a single protein
chain only. It must also contain complete sequence information for the protein.
Specifically, the NMR-Star format table must contain a sequence section
with _Residue_seq_code and _Residue_label values,
and a chemical shift section with values for
_Residue_seq_code, _Residue_label, _Atom_name, _Atom_type, and
_Chem_shift_value.
Example: ubiq_bmr6457_1D3Z.str.

The final step in interpreting the results of the TALOS+ database search is to
inspect and classify the matches so that useful predictions can be formed;
however, in most cases, the initial automated classifications performed by the
current version of the TALOS+ program should be acceptable with no manual
adjustment needed.

Refinement of predictions can be done via the graphical interface rama+,
which is included in the package, or via a web-based Java version of the RAMA+
Viewer (JRAMA+).
The simplest invocation of rama+ is:

rama+ -in myshifts.tab

If a proposed structure is available, first run TALOS+ with it to generate a
prediction summary:

talos+ -in myshifts.tab -ref mystruct.pdb

Then, invoke RAMA+ so that the reference structure is included in the display
of prediction data:

rama+ -in myshifts.tab -ref mystruct.pdb

The various windows displayed by rama+ are shown below.

Sequence Window: displays the target protein sequence, with each
residue colored according to its classification. Clicking on a residue with
the mouse will select that residue for display and analysis in the other
windows. The residues are colored according to this scheme:

Green    Unambiguous/Good prediction (no outlier)
Yellow   Ambiguous; no prediction
Blue     Dynamic; no prediction
Red      Bad prediction relative to a known structure
Gray     No classification yet

Prediction Window: lists the statistics of the 10 best database
matches for the currently selected residue in the target protein. The individual
entries in this window can be toggled by a mouse click, to include or remove a
particular match from the prediction.

Ramachandran window: graphs the phi/psi distributions of the 10 best
database matches for the currently selected residue. It also displays the average
and standard deviation of phi and psi for those matches which are selected (i.e.
included in the prediction), as well as the ANN-predicted probability of finding any given
residue in the Alpha, Beta, or Positive-phi region. The shaded region of the map shows
the most populated regions of the TALOS+ database for the residue type in
question.

In the graph, each match from the database is drawn as a small square at a
particular phi/psi coordinate. The individual squares can be toggled by a mouse
click, to include or remove the corresponding match from the prediction. The
squares are colored according to this scheme:

Green    This match is included in the prediction.
Red      Outlier; not included in the prediction
Blue     Reference (phi/psi taken from "-ref" structure)

The Ramachandran window also includes buttons to reclassify the overall
prediction as "Good", "Ambiguous", etc., and to move to the next or previous
residue in the sequence.

Secondary Structure and RCI-S2 Prediction Window: graphs
the predicted order parameter S2 (upper panel) and ANN-predicted secondary
structure (lower panel; aqua, beta-sheet; red, helix) for all residues. The height of the
bars reflects the probability of the neural network secondary structure
prediction. The RCI-S2 value and the probabilities of the 3-state [helix|sheet|loop]
secondary structure prediction for the current residue (indicated by yellow
vertical lines) are labeled above the corresponding panel, followed by the S2
and secondary structure probabilities for the "cursor-activated" residue
(indicated by white vertical lines, not visible in this figure).

Secondary Shift Window: (optional with the "-sd" argument)
graphs the secondary shift distributions of the 10 best database matches for the
currently selected residue.

Molecular Viewer Window: (optional with the "-ras" argument)
Displays the three-dimensional structure given by the "-ref" argument,
colorized according to the residue classification scheme above. This option
assumes that the program RasMol is available as a viewer.

The old TALOS rules for defining consistent ("Good") predictions are based on
clustering of at least 9 out of the 10 best database matches in the same
region of the Ramachandran map. The TALOS+ rules for defining consistent
("Good") predictions are similar but slightly more strict:

All 10 best database matches fall in a "consistent" region of the
Ramachandran map, i.e., in a consistent Alpha, Beta or Positive-Phi region, and

The confidence of the ANN 3-state Phi/Psi distribution prediction for this
residue (defined as the difference between the probabilities of the two most
favored predicted states) is above 0.6 (0.7 for residues with a "Positive-Phi" prediction),
and

The RCI-predicted order parameter S2 is greater than 0.5.

All the cases with predicted S2 value <0.5 are likely to be
"Dynamic", and will not be considered as unambiguous predictions.

All other cases are considered "Ambiguous".
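
The three conditions can be summarized as a short decision function. This is a sketch of the rules as stated above, not the TALOS+ source: the function names are hypothetical, the count of matches falling in a consistent region is taken as an input, and a residue with S2 exactly equal to 0.5 (which the rules do not cover) falls through to "Ambiguous".

```python
def ann_confidence(probabilities):
    """Difference between the two most favored state probabilities."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return top - second


def classify(n_consistent, abp_probs, s2, positive_phi=False):
    """Classify a residue as "Good", "Dynamic" or "Ambiguous".

    n_consistent : number of the 10 best matches in a consistent region
    abp_probs    : ANN probabilities for Alpha, Beta and Positive-Phi
    s2           : RCI-predicted order parameter
    positive_phi : True if the prediction is in the Positive-Phi region
    """
    if s2 < 0.5:
        return "Dynamic"
    threshold = 0.7 if positive_phi else 0.6
    if (n_consistent == 10
            and ann_confidence(abp_probs) > threshold
            and s2 > 0.5):
        return "Good"
    return "Ambiguous"
```

For example, probabilities of (0.80, 0.15, 0.05) give a confidence of 0.65, which passes the 0.6 threshold but would not be sufficient for a Positive-Phi prediction.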

When a reference structure is available, predictions will be flagged as "Bad"
(automatically by talos+) if either of the following conditions applies:

Cases where |Phi(obs) - Phi(pred) + Psi(obs) - Psi(pred)| < 60 cause the
peptide chain to continue in roughly the correct direction,
and larger tolerance
limits (up to +/-90 degrees) are accepted for phi and psi in these cases.

In practice, this usually means that the standard
deviation of phi and psi for the selected group of matches will be 35 degrees or
less (12-13 degrees on average).

When inspecting the phi/psi graphs to decide if matches are in a consistent
region, keep in mind their "periodic" nature; i.e. angles at one edge of the
graph are actually close to angles at the opposite edge.
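
The periodicity matters whenever angles are compared or averaged numerically. The sketch below shows one common way to handle it, using wrapped differences and a vector (circular) mean; whether TALOS+ computes its averages and standard deviations in exactly this way is not stated here, so this should be read as an illustration only.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def circular_mean(angles):
    """Mean direction of a list of angles given in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def circular_sd(angles):
    """RMS deviation from the circular mean, using wrapped differences."""
    m = circular_mean(angles)
    return math.sqrt(sum(angle_diff(a, m) ** 2 for a in angles) / len(angles))
```

Two matches at psi = 170 and psi = -170 are only 20 degrees apart; a naive arithmetic mean would give 0 degrees, whereas the circular mean lies at the +/-180 edge of the map.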

The original version of TALOS is still installed along with NMRPipe
for backward compatibility reasons. The TALOS files are installed in
the talos directory, which is specified by the TALOS_DIR
environment variable. The components of the original TALOS (TALOS, VINA,
and RAMA) can be accessed by
including the -old command-line flag, for example: