HAMAP •
HAMAP is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated, manually created annotation rules that specify annotations that apply to family members. HAMAP is applied to bacterial, archaeal and eukaryotic proteins and used to annotate records in UniProtKB via UniProt's automatic annotation pipeline.

MyHits •
Hits is a free database devoted to protein domains. It is also a collection of tools for investigating the relationships between protein sequences and the motifs described on them. These motifs are defined by a heterogeneous collection of predictors, which currently includes regular expressions, generalized profiles and hidden Markov models.

Tools

2ZIP •
The leucine zipper is a dimerisation domain occurring mostly in regulatory and thus in many oncogenic proteins. 2ZIP combines a standard coiled coil prediction algorithm with an approximate search for the characteristic leucine repeat. No further information from homologues is required for prediction. This approach improves significantly over existing methods, especially in that the coiled coil prediction turns out to be highly informative and avoids large numbers of false positives.
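The "approximate search for the characteristic leucine repeat" can be illustrated with a minimal sketch. This is not the 2ZIP implementation: the function name `leucine_repeats`, the default of four heptad-spaced positions, and the mismatch tolerance are all assumptions made for illustration, and the coiled coil prediction step is left out entirely.

```python
def leucine_repeats(sequence, repeats=4, max_mismatches=1):
    """Report windows with leucine at (nearly) every 7th position.

    Hypothetical illustration only: `repeats` and `max_mismatches`
    are assumed parameters, not values taken from 2ZIP.
    Returns a list of (1-based start, residues at the heptad positions).
    """
    span = 7 * (repeats - 1) + 1  # window length covering all heptad positions
    hits = []
    for start in range(len(sequence) - span + 1):
        # Take every 7th residue starting at `start`.
        residues = sequence[start:start + span:7]
        mismatches = sum(1 for r in residues if r != "L")
        if mismatches <= max_mismatches:
            hits.append((start + 1, residues))
    return hits


if __name__ == "__main__":
    perfect = "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L"
    print(leucine_repeats(perfect, max_mismatches=0))
```

A real predictor would accept such a window only where a coiled coil is also predicted, which is the combination the description credits with avoiding false positives.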

Coiled-Coils prediction •
This program delineates coiled-coil domains in otherwise globular proteins, such as the leucine zipper domains in transcriptional regulators, and predicts regions of discontinuity within coiled-coil structures, such as the hinge region in myosin.

GENIO/logo •
The position-dependent information content of aligned RNA/DNA or amino acid sequences is useful for displaying consensus sequences and for finding optimal search windows used in sequence analysis. The program calculates the positional information content of mono- or polynucleotides/amino acids from a FASTA file of aligned sequences and writes a PostScript (or Encapsulated PostScript, EPS) file that can be viewed and included in text processors.
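As a rough illustration of what "positional information content" means, the sketch below computes, for each alignment column, the maximum possible entropy minus the observed Shannon entropy, in bits. It is not the GENIO/logo code: the function names are hypothetical, only single symbols (not polynucleotides) are handled, gaps are simply ignored, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """Information content in bits of one column: log2(N) - H(column)."""
    symbols = [c for c in column if c not in "-."]  # ignore gap characters
    if not symbols:
        return 0.0
    total = len(symbols)
    entropy = 0.0
    for count in Counter(symbols).values():
        p = count / total
        entropy -= p * math.log2(p)
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences, alphabet_size=4):
    """Per-column information content for equal-length aligned sequences."""
    return [
        column_information("".join(column), alphabet_size)
        for column in zip(*sequences)
    ]


if __name__ == "__main__":
    aligned = ["ACGT", "ACGA", "ACTC", "AGAG"]
    for position, bits in enumerate(alignment_information(aligned), start=1):
        print(position, round(bits, 3))
```

For DNA or RNA the alphabet size is 4, so a fully conserved column carries 2 bits; for proteins one would pass `alphabet_size=20`.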

HAMAP •
HAMAP is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated, manually created annotation rules that specify annotations that apply to family members. HAMAP is applied to bacterial, archaeal and eukaryotic proteins and used to annotate records in UniProtKB via UniProt's automatic annotation pipeline.

HAMAP-Scan •
Scan several protein sequences or a whole genome (all ORFs) against HAMAP family profiles. Sequences that match HAMAP profiles will be annotated in the UniProtKB format by the associated annotation rules.

HeliQuest •
HeliQuest calculates, from the amino acid sequence of a helix (α-helix, 3-10 helix, 3-11 helix or π helix), its physicochemical properties and amino acid composition, and uses the results to screen any databank in order to identify protein segments possessing similar features.

InterProScan •
InterProScan is a tool that combines different protein signature recognition methods into one resource. The number of signature databases and their associated scanning tools, as well as the further refinement procedures, increases the complexity of the problem.

Multicoil •
The MultiCoil program predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric. The method is based on the PairCoil algorithm. To analyze your own sequences with MultiCoil, you can either use the web interface or download the program.

MyHits •
Hits is a free database devoted to protein domains. It is also a collection of tools for investigating the relationships between protein sequences and the motifs described on them. These motifs are defined by a heterogeneous collection of predictors, which currently includes regular expressions, generalized profiles and hidden Markov models.

PattInProt •
PattInProt scans a protein database or one or several sequences for one or several patterns written in PROSITE syntax. The user can specify an allowed number of mismatches or a similarity threshold towards the pattern.
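To make the idea of scanning with a PROSITE-syntax pattern concrete, here is a minimal sketch that translates a pattern into a Python regular expression and reports exact matches. It is not PattInProt's implementation: the helper names `prosite_to_regex` and `find_matches` are hypothetical, only the common syntax elements are covered (`x`, `[...]`, `{...}`, repeat counts, and `<` / `>` anchors at the pattern ends), and mismatches or similarity thresholds are not supported.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-syntax pattern into a regular expression string."""
    pattern = pattern.strip().rstrip(".")  # drop the optional trailing period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # A single residue or a [ABC] class is already valid regex.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_matches("N-{P}-[ST]-{P}", "AANGSAA"))
```

The lookahead wrapper in `find_matches` is a design choice that lets overlapping occurrences be reported, since a plain `finditer` would skip any match starting inside a previous one.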

pftools •
The pftools are a collection of programs to build, calibrate, and search biological sequences with generalized profiles. Generalized profiles extend position-specific scoring matrices by including position-specific scores for insertions and deletions. They correspond to a matrix representation of a multiple sequence alignment that can be used to search for distantly homologous sequences and to precisely align sequences to the model.

PPSearch •
Search your query sequence for protein motifs by rapidly comparing it against all patterns stored in the PROSITE pattern database, which can help determine the function of an uncharacterised protein. This tool requires a protein sequence as input, but DNA/RNA may be translated into a protein sequence using transeq and then queried. Graphical output is available.

PRATT •
An important problem in sequence analysis is to find patterns matching sets or subsets of sequences. This tool allows the user to discover patterns conserved in sets of unaligned protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported. The patterns are reported using PROSITE syntax.

PRATT (EBI) •
An important problem in sequence analysis is to find patterns matching sets or subsets of sequences. This tool allows the user to search for patterns conserved in sets of unaligned protein sequences. The user can specify what kind of patterns should be searched for, and how many sequences should match a pattern to be reported.

ProDom •
ProDom is a protein domain family database constructed automatically by clustering homologous segments. Compare your sequence with ProDom by running a Blast-P or Blast-X search against either the consensus sequences provided with the ProDom families or the multiple alignments provided with each ProDom family. ProDom-CG (Complete Genomes) and ProDom-SG (Structural Genomics Candidate Search) can also be searched.

Protein Sequence Logos •
Protein sequence alignment viewed as sequence logos. The total height of the sequence information part is computed as the relative entropy between the observed fractions of a given symbol and the respective a priori probabilities.
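The relative entropy mentioned here can be sketched as follows. This is an illustrative calculation under stated assumptions, not the tool's code: the function name `relative_entropy` is hypothetical, the a priori probabilities are supplied by the caller as a dictionary, and symbols absent from that dictionary (such as gaps) are ignored.

```python
import math
from collections import Counter


def relative_entropy(column, background):
    """Relative entropy in bits between observed fractions and a priori probabilities.

    `column` is a string of symbols from one alignment position;
    `background` maps each symbol to its assumed a priori probability.
    """
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    height = 0.0
    for symbol, count in counts.items():
        observed = count / total
        height += observed * math.log2(observed / background[symbol])
    return height


if __name__ == "__main__":
    # Assumed uniform background over the 20 standard amino acids.
    uniform = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    print(round(relative_entropy("LLLL", uniform), 3))
    print(round(relative_entropy("LLIV", uniform), 3))
```

With a non-uniform background, a column dominated by a rare residue scores higher than one dominated by a common residue, which is the practical reason for using a priori probabilities rather than plain entropy.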

Radar •
RADAR stands for Rapid Automatic Detection and Alignment of Repeats in protein sequences. Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units. RADAR uses an automatic algorithm to segment your query sequence into repeats. It identifies short composition-biased repeats as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in your query sequence.

REPRO •
REPRO is able to recognise distant repeats in a single query sequence. The technique relies on a variation of the Smith-Waterman local alignment strategy to find non-overlapping top-scoring local alignments, followed by a graph-based iterative clustering procedure to delineate the repeat set(s) based on consistency of the pairwise top-alignments.
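For readers unfamiliar with the underlying strategy, the sketch below computes the score of the best local alignment between two sequences with the standard Smith-Waterman recurrence. It shows only the textbook algorithm, not REPRO's variation: the simple match/mismatch/linear-gap scoring values are assumptions, and neither the search for multiple non-overlapping alignments nor the clustering step is included.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Scoring values are illustrative assumptions; a protein tool would
    normally use a substitution matrix and affine gap penalties.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

To look for internal repeats, a sequence would be aligned against itself while excluding the trivial main diagonal, which is roughly where a repeat-finding variation departs from this plain version.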

SUPERFAMILY Sequence Search •
Use the SUPERFAMILY database of structural and functional annotation to provide structural (and hence implied functional) assignments to protein sequences primarily at the SCOP superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor. This service offers sophisticated and expertly chosen remote homology detection.

T-REKS •
T-REKS is an algorithm for de novo detection and alignment of repeats in sequences, based on the K-means algorithm. The minimal length of repeat arrays is 9 for true homorepeats and 14 for other repeats with potential biological meaning.

TRUST •
TRUST is a method for ab initio determination of internal repeats in proteins. The high sensitivity and accuracy of the method are achieved by exploiting the concept of transitivity of alignments.

WebLogo •
Sequence logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position.