Online Supporting Information A. The benchmark dataset for Gneg-mPLoc includes 1,456 locative protein sequences (1,392 different proteins), classified into 8 Gram-negative subcellular locations. Of the 1,392 different proteins, 1,328 belong to one location and 64 belong to two locations; a protein found in two locations is counted once for each location, which is why the locative count exceeds the number of different proteins. Both the accession numbers and the sequences are given. No protein has more than 25% sequence identity to any other protein in the same subset (subcellular location). See the text of the paper for further explanation.
Click Supp-A to download the dataset.
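The relation between the number of different proteins and the number of locative sequences stated above can be checked with a few lines of Python. This is only a minimal sketch: the function name `count_locative` is hypothetical and not part of Gneg-mPLoc, and it assumes that each protein contributes one locative sequence per location it belongs to.

```python
def count_locative(n_single: int, n_double: int) -> int:
    """Count locative sequences, assuming one entry per (protein, location) pair."""
    return n_single * 1 + n_double * 2


# Figures taken from the dataset description above.
n_single = 1328  # proteins belonging to one location
n_double = 64    # proteins belonging to two locations

n_different = n_single + n_double
n_locative = count_locative(n_single, n_double)

print(n_different, n_locative)  # prints "1392 1456"
```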