Nucl. Acids Res. OUP

Proteome Analysis Database

http://www.ebi.ac.uk/proteome/

Apweiler, R.1, Fleischmann, W.1, Kanapin, A.1, Karavidopoulou, Y.1, Kersey, P.1, Kriventseva, E.V.1, Mittard, V.1, Mulder, N.1, Phan, I.2, Pruess, M.1, Servant, F.1

1EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
2Swiss Institute of Bioinformatics, Rue Michel-Servet 1, 1211 Geneve 4, Switzerland

Contact   mpr@ebi.ac.uk


Database Description

The Proteome Analysis Database (http://www.ebi.ac.uk/proteome/) has been set up to provide comprehensive statistical and comparative analyses of the predicted proteomes of fully sequenced organisms spanning bacteria, archaea and eukaryotes (1). The analysis is compiled using InterPro (2), CluSTr (3) and GO (4), and is performed on the non-redundant complete proteome sets of SWISS-PROT and TrEMBL entries (5). InterPro and CluSTr give a new perspective on families, domains and sites, and cover 40% to 70% (InterPro statistics) of the proteins from each of the complete genomes. The Proteome Analysis Database includes a program designed to carry out InterPro proteome comparisons of any one proteome against one or more of the other proteomes in the database. Additional features of the database are links to structural information through the HSSP (6), PDB (7) and SCOP (8) databases, and links to the GO functional ontology for each proteome.
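The kind of InterPro proteome comparison described above can be illustrated with a minimal sketch. It is not the database's own program: the function names, the data layout (a mapping from protein accession to a set of InterPro entry identifiers) and the toy identifiers are all assumptions made for illustration only.

```python
from collections import Counter


def interpro_counts(proteome):
    """Count how many proteins in a proteome match each InterPro entry.

    `proteome` is a hypothetical layout: dict of protein accession ->
    set of InterPro entry identifiers matched by that protein.
    """
    counts = Counter()
    for entries in proteome.values():
        counts.update(entries)
    return counts


def coverage(proteome):
    """Fraction of proteins with at least one InterPro match."""
    if not proteome:
        return 0.0
    matched = sum(1 for entries in proteome.values() if entries)
    return matched / len(proteome)


def compare(reference, others):
    """Compare one reference proteome against one or more other proteomes.

    Returns rows of (entry, count in reference, {name: count in other}),
    ordered by descending reference count, then by entry identifier.
    """
    ref_counts = interpro_counts(reference)
    other_counts = {name: interpro_counts(p) for name, p in others.items()}
    rows = []
    for entry, n in sorted(ref_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        # Counter returns 0 for entries absent from the other proteome.
        rows.append((entry, n, {name: c[entry] for name, c in other_counts.items()}))
    return rows


# Toy data with placeholder identifiers (not real accessions or results).
toy_a = {"A1": {"ENTRY_A", "ENTRY_B"}, "A2": {"ENTRY_A"}, "A3": set()}
toy_b = {"B1": {"ENTRY_A"}, "B2": {"ENTRY_C"}}

if __name__ == "__main__":
    for entry, ref_n, per_other in compare(toy_a, {"toy_b": toy_b}):
        print(entry, ref_n, per_other)
    print(f"coverage of toy_a: {coverage(toy_a):.2f}")
```

Counting proteins per entry, rather than matches per protein, keeps the comparison symmetric across proteomes of different sizes; a real analysis would presumably also normalise by proteome size.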

Recent Developments

Many new proteomes have been added, and the database now holds data for 89 proteomes. Although complete coding sequence predictions for Homo sapiens and Mus musculus are not yet available in the EMBL nucleotide sequence database, SWISS-PROT and Ensembl (http://www.ensembl.org) jointly offer a draft complete proteome for each of these species. These data are available as part of the Proteome Analysis Database, together with InterPro-based statistical analyses of the data sets. Other developments include statistics for protein length distributions, links to SCOP, and links to NEWT, the taxonomy database browser (http://www.ebi.ac.uk/webapps/taxonomy/frameset.html).

REFERENCES

1. Apweiler R., Biswas M., Fleischmann W., Kanapin A., Karavidopoulou Y., Kersey P., Kriventseva E.V., Mittard V., Mulder N., Phan I., Zdobnov E. (2001) Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes. Nucleic Acids Res. 29(1):44-48.
2. Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D.R., et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29(1):37-40.
3. Kriventseva,E.V., Fleischmann,W. and Apweiler,R. (2001) CluSTr: a database of Clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res. 29(1):33-36.
4. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T., et al. (2000) Gene Ontology: for the unification of biology. Nature Genetics 25, 25-29.
5. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45-48.
6. Holm,L. and Sander,C. (1999) Protein folds and families: sequence and structure alignments. Nucleic Acids Res. 27, 244-247.
7. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235-242.
8. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540.

Category   Proteome Resources

Go to the abstract in the NAR 2003 Database Issue.

 
