Nucleic Acids Res. (OUP)

InterPro

http://www.ebi.ac.uk/interpro

Mulder, N.J.1, Apweiler, R.1, Attwood, T.K.2, Bairoch, A.3, Bateman, A.4, Binns, D.1, Biswas, M.10, Bradley, P.1, Bork, P.8, Bucher, P.5, Copley, R.R.11, Courcelle, E.6, Das, U.1, Durbin, R.4, Falquet, L.5, Fleischmann, W.1, Griffiths-Jones, S.4, Haft, D.9, Harte, N.1, Hermjakob, H.1, Hulo, N.3, Kahn, D.6, Kanapin, A.1, Krestyaninova, M.1, Lopez, R.1, Letunic, I.8, Lonsdale, D.1, Silventoinen, V.1, Orchard, S.E.1, Pagni, M.5, Peyruc, D.6, Ponting, C.P.7, Selengut, J.D.9, Servant, F.1, Sigrist, C.J.A.3, Vaughan, R.1, Zdobnov, E.M.8

1EMBL Outstation - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
2School of Biological Sciences and Department of Computer Science, The University of Manchester, Manchester, UK.
3Swiss Institute for Bioinformatics, Geneva, Switzerland.
4Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
5Swiss Institute for Experimental Cancer Research, Lausanne, Switzerland.
6CNRS/INRA, Toulouse, France.
7MRC Functional Genetics Unit, Department of Human Anatomy & Genetics, University of Oxford, UK.
8Biocomputing Unit EMBL, Heidelberg, Germany.
9The Institute for Genomic Research, Maryland, USA.
10ViaLactia Biosciences, Newmarket Auckland, New Zealand.
11Wellcome Trust Centre for Human Genetics, Oxford, UK.

Contact   mulder@ebi.ac.uk


Database Description

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. Results are provided in a single format that rationalises the output that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the conception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a web server (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
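As a quick consistency check on the release statistics quoted above, the per-type counts can be tallied and compared with the stated total of 5629 entries. The short Python sketch below is illustrative only: the dictionary keys and variable names are assumptions chosen for this example and are not part of InterPro or its tools; only the four counts come from the description.

```python
# Entry counts by type, copied from the database description.
# The key names are illustrative labels, not InterPro identifiers.
entry_counts = {
    "family": 4280,
    "domain": 1239,
    "repeat": 95,
    "post-translational modification": 15,
}

# Sum of all entry types; should match the stated release total.
total = sum(entry_counts.values())

# Share of each entry type as a percentage of all entries.
shares = {kind: 100 * n / total for kind, n in entry_counts.items()}

for kind, n in entry_counts.items():
    print(f"{kind:<34}{n:>6}{shares[kind]:>7.1f}%")
print(f"{'total':<34}{total:>6}")
```

The four categories sum to the reported total, and families account for roughly three quarters of all entries.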

Recent Developments

Since its conception, the InterPro database has seen a number of improvements, including increased coverage, additional search-tool features and a new-look web interface.

Acknowledgements

The InterPro project is supported by the ProFuSe grant (number QLG2-CT-2000-00517) of the European Commission.

Category   Protein Databases

Go to the abstract in the NAR 2003 Database Issue.

 
