Supplementary Table 4: Alignment of 484 lipobox-containing proteins from halophilic archaea. The N-terminal regions of the 484 putative lipoproteins encoded by 6 halophilic archaeal genomes were aligned by introducing a gap of variable length between positions 5 and 6 after the twin-Arginine motif. The first 400 proteins are TatFind positive. The next 50 are TatFind negative but have a twin-Arginine motif. The last 34 are TatFind negative and lack a twin-Arginine motif. The results of the three lipoprotein prediction programs are indicated for each protein.

Supplementary Table 5: Position-specific amino acid frequencies. Position-specific amino acid frequencies were computed for the 484 lipoproteins from 6 halophilic archaea. Amino acids in the vicinity of the lipobox motif that show a strong composition bias were used for the TatLipo algorithm, as indicated.
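
As an illustration only, position-specific frequencies of the kind reported in Supplementary Table 5 could be computed from gapped, equal-length aligned sequences roughly as follows. This is a minimal sketch, not the procedure used for the table: the function name `position_frequencies`, the use of `-` as the gap character, and the choice to exclude gaps from each column's total are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def position_frequencies(aligned: List[str], gap: str = "-") -> List[Dict[str, float]]:
    """Return, for each alignment column, the fraction of each amino acid.

    Assumptions (not taken from the source): sequences are already aligned
    to equal length, gaps are written as `gap`, and gaps are excluded from
    the per-column total.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    freqs: List[Dict[str, float]] = []
    for column in zip(*aligned):
        # Count only real residues in this column, ignoring gap characters.
        residues = [aa for aa in column if aa != gap]
        counts = Counter(residues)
        total = len(residues)
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs


# Toy input, invented for illustration (not data from the paper).
toy = ["MRC", "MKC", "M-C", "LRC"]
table = position_frequencies(toy)
```

Columns in which one residue dominates (for example, the cysteine in the third column of the toy input) are the kind of composition bias the caption refers to.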

Supplementary Figure 1: Scheme for the prediction of archaeal lipoproteins. The figure shows a schematic representation of the assignment of secreted proteins to four protein classes: Tat/lipo, Tat/SPase I, Sec/lipo and Sec/SPase I. Data from three widely used lipoprotein prediction programs were integrated to predict the lipobox. TatFind was used to predict Tat substrates. TatFind negatives that were either predicted by Phobius or predicted to contain a lipobox are considered to be Sec substrates.
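
The assignment described for Supplementary Figure 1 can be sketched as a small decision function. This is a hedged illustration, not the authors' implementation: the function name `classify_secreted` and its three boolean inputs are hypothetical, the way the three lipoprotein predictors are combined into a single lipobox call is not specified here and is taken as a given input, and returning `None` for proteins with no predicted signal is an assumption.

```python
from typing import Optional


def classify_secreted(tatfind_positive: bool,
                      phobius_positive: bool,
                      has_lipobox: bool) -> Optional[str]:
    """Assign a protein to Tat/lipo, Tat/SPase I, Sec/lipo or Sec/SPase I.

    `has_lipobox` stands for the integrated result of the lipoprotein
    predictors; how they are integrated is not modelled in this sketch.
    """
    if tatfind_positive:
        pathway = "Tat"
    elif phobius_positive or has_lipobox:
        # TatFind negatives with a Phobius signal or a lipobox count as Sec.
        pathway = "Sec"
    else:
        # Assumption: no predicted signal means the protein is not classified.
        return None

    # Assumption: proteins without a lipobox fall into the SPase I class.
    processing = "lipo" if has_lipobox else "SPase I"
    return f"{pathway}/{processing}"


examples = {
    "tat_lipo": classify_secreted(True, False, True),
    "tat_spase": classify_secreted(True, True, False),
    "sec_lipo": classify_secreted(False, False, True),
    "sec_spase": classify_secreted(False, True, False),
    "unclassified": classify_secreted(False, False, False),
}
```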

Supplementary Text: Bioinformatic Secretome Analysis. This text provides additional details concerning (a) lipoprotein prediction; (b) assignment of Tat-specific signal peptides; and (c) an evaluation of the TatLipo program and of the other lipoprotein prediction programs using the TatFind positive subset.