CluSTr

http://www.ebi.ac.uk/clustr/

Kriventseva, E.V., Servant, F., Fleischmann, W., Zdobnov, E.V., Apweiler, R.

EBI-EMBL Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Contact zhenya@ebi.ac.uk

Database Description

The CluSTr (Clusters of SWISS-PROT and TrEMBL proteins) database (1) offers an automatic classification of SWISS-PROT and TrEMBL (2) proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. Analysis has been carried out for different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro (3), which integrates information on protein families, domains and functional sites from PROSITE (4), PRINTS (5), Pfam (6), ProDom (7) and SMART (8). Links to the InterPro graphical interface allow users to see at a glance whether proteins from the cluster share particular functional sites. CluSTr also provides cross-references to HSSP (9) and PDB (10). The database is available for querying and browsing at http://www.ebi.ac.uk/clustr.
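The idea of clustering from pairwise comparisons at several similarity levels can be illustrated with a small sketch. The text above does not specify the similarity score or the linkage rule that CluSTr uses, so the example below assumes single linkage (two proteins share a cluster if a chain of pairwise scores at or above the threshold connects them) and uses made-up protein names and scores. It is an illustration of the general technique only, not the CluSTr implementation.

```python
from collections import defaultdict


def cluster_at_threshold(proteins, scores, threshold):
    """Single-linkage clusters: join proteins whose pairwise score >= threshold.

    `scores` maps (protein_a, protein_b) to a similarity score. Both the
    score scale and the linkage rule are assumptions made for illustration.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the cluster representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sort clusters by their member names so the result is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=lambda g: sorted(g))


def hierarchy(proteins, scores, thresholds):
    """Cluster at each similarity level; higher levels give finer clusters."""
    return {t: cluster_at_threshold(proteins, scores, t) for t in sorted(thresholds)}


# Hypothetical proteins and all-against-all similarity scores.
PROTEINS = ["P1", "P2", "P3", "P4", "P5"]
SCORES = {
    ("P1", "P2"): 90, ("P1", "P3"): 55, ("P1", "P4"): 20, ("P1", "P5"): 15,
    ("P2", "P3"): 60, ("P2", "P4"): 25, ("P2", "P5"): 10,
    ("P3", "P4"): 30, ("P3", "P5"): 12,
    ("P4", "P5"): 80,
}

levels = hierarchy(PROTEINS, SCORES, [20, 40, 70])
for level, clusters in levels.items():
    print(level, [sorted(c) for c in clusters])
```

Because raising the threshold only removes links, each cluster at a stricter level is contained in exactly one cluster at every looser level, which is what produces a hierarchy rather than unrelated partitions.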

Recent Developments

Over the past year our effort was focused on creating clusters for complete proteome sets. Clusters were built for 42 complete prokaryotic proteomes. In addition to the complete eukaryotic proteomes already available last year (Caenorhabditis elegans, Saccharomyces cerevisiae and Drosophila melanogaster), we added data for Arabidopsis thaliana. Since complete coding sequence predictions for Homo sapiens and Mus musculus are not yet available in the EMBL nucleotide sequence database, we provide clusters for proteins from the SPTr (SWISS-PROT and TrEMBL) and Ensembl (http://www.ensembl.org) draft complete proteomes.

Acknowledgements

We thank Gene-IT for technical support.

References

1. Kriventseva, E.V., Fleischmann, W. and Apweiler, R. (2001) CluSTr: a database of Clusters of SWISS-PROT+TrEMBL proteins. Nucleic Acids Res., 29, 33-36.
2. Bairoch, A. and Apweiler, R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.
3. Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M.D.R., et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37-40.
4. Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999) The PROSITE database, its status in 1999. Nucleic Acids Res., 27, 215-219.
5. Attwood, T.K., Croning, M.D.R., Flower, D.R., Lewis, A.P., Mabey, J.E., Scordis, P., Selley, J.N. and Wright, W. (2000) PRINTS-S: the database formerly known as PRINTS. Nucleic Acids Res., 28, 225-227.
6. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L. and Sonnhammer, E.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263-266.
7. Corpet, F., Servant, F., Gouzy, J. and Kahn, D. (2000) ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res., 28, 267-269.
8. Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P. and Bork, P. (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res., 28, 229-232.
9. Holm, L. and Sander, C. (1999) Protein folds and families: sequence and structure alignments. Nucleic Acids Res., 27, 244-247.
10. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.

Category Protein Sequence Motifs

Go to the abstract in the NAR 2003 Database Issue.