Nucleic Acids Res. OUP

SUPFAM

http://pauling.mbu.iisc.ernet.in/~supfam

Pandit, S.B.¹, Balaji, S.¹, Gowri, V.S.¹, Abhinandan, K.R.¹, Vaishnavi, R.², Srinivasan, N.¹

¹Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
²Biotechnology Centre, Indian Institute of Technology, Powai, Mumbai 400 076, India

Contact   ns@mbu.iisc.ernet.in


Database Description

Members of a protein superfamily can arise from divergent evolution of homologues whose amino acid sequences retain no significant similarity. A superfamily relationship is therefore commonly detected only after the three-dimensional (3-D) structures of the proteins concerned have been determined by X-ray analysis or NMR. The SUPFAM (1) database described here relates two or more homologous protein families, of either known or unknown structure, drawn from multiple sequence alignment databases. The present SUPFAM update (1.2) has been derived using Pfam (2) (Version 7.2), a database of sequence domains, and PALI (3) (Release 2.1), an alignment database of homologous proteins of known structure that is derived largely from SCOP (4).
The first step in establishing SUPFAM is to relate Pfam families to the families in PALI. The second step relates to one another those Pfam families that could not be associated reliably with a protein superfamily of known structure. The profile matching procedure IMPALA (5) has been used in both steps.
In the present update, the first step identified 1788 Pfam families (out of 3735, about 48%) that are related to a SCOP/PALI family, resulting in new superfamily connections. Among these 1788 families, 311 Pfam families with apparently no structural information could be related to families of known 3-D structure, which identifies new families belonging to existing superfamilies. Of the PALI families, 1429 (86%) could be associated with Pfam families. These 1429 PALI families belong to 860 SCOP superfamilies, and the Pfam families associated with them automatically become members of those 860 superfamilies. In the second step, an all-against-all comparison of the 1947 Pfam families with apparently no structural information, based on sequence-profile matching with IMPALA, clustered 58 homologous Pfam families into 23 potential new superfamilies.
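The counts quoted above can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only numbers stated in this description; the variable names are illustrative and are not taken from SUPFAM itself. It suggests that the 1947 families used in the second step are the Pfam families left over after the first step, although the description does not state this explicitly.

```python
# Counts quoted in the description of SUPFAM 1.2 (variable names are illustrative).
total_pfam = 3735        # Pfam families considered
related_step1 = 1788     # Pfam families related to a SCOP/PALI family in step 1
step2_input = 1947       # Pfam families whose profiles were compared in step 2

# Fraction of Pfam families related to a SCOP/PALI family.
fraction_related = related_step1 / total_pfam

# Families not related to any SCOP/PALI family in step 1.
remaining = total_pfam - related_step1

print(f"related in step 1: {fraction_related:.1%}")
print(f"not related in step 1: {remaining}")
print(f"matches step 2 input: {remaining == step2_input}")
```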
Expanding groups of related proteins for which no structural information is yet available, as proposed in SUPFAM, should help to identify 'priority proteins' for structure determination in structural genomics initiatives, and so extend the coverage of structural information in protein sequence space. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.
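One way to picture how pairwise matches between families lead to groups of related families is single-linkage clustering: any two families joined by a chain of matches fall into the same group. The Python sketch below illustrates that idea only. The description does not say which clustering rule SUPFAM applies to the IMPALA matches, so the rule, the function name cluster_families and the family identifiers (FAM_A and so on) are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_families(matches):
    """Group family identifiers into connected components (single linkage).

    `matches` is an iterable of (family_a, family_b) pairs that are assumed
    to have already passed some significance criterion; that criterion is
    not modelled here.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for family in list(parent):
        groups[find(family)].add(family)

    # Keep only groups with at least two families; sort for stable output.
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    return sorted(clusters, key=lambda g: g[0])


# Hypothetical pairwise matches between families without known structure.
matches = [("FAM_A", "FAM_B"), ("FAM_B", "FAM_C"), ("FAM_D", "FAM_E")]
clusters = cluster_families(matches)
print(clusters)
```

Here FAM_A and FAM_C end up in the same group even though they were never matched directly, because both match FAM_B. A stricter rule, for example requiring every pair in a group to match, would give smaller groups.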

Recent Developments

The Pfam and PALI releases used in the current update of SUPFAM are much larger than those used in the last update. As a result, an additional 508 Pfam families are associated with PALI families, so the present release of SUPFAM has a much larger number of sequence families associated with structures. The present update also groups 58 Pfam families into 23 potential new superfamilies.

Acknowledgements

This work is supported by the Wellcome Trust, London in the form of a Senior Fellowship to NS.

REFERENCES

1. Pandit, S.B., Gosar, D., Abhiman, S., Sujatha, S., Dixit, S.S., Mhatre, N.S., Sowdhamini, R. and Srinivasan, N. (2002) SUPFAM - Database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. Nucleic Acids Res., 30, 289-293.
2. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Howe, K.L. and Sonnhammer, E.L.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263-266.
3. Balaji, S., Sujatha, S., Kumar, S.S.C. and Srinivasan, N. (2001) PALI: A database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res., 29, 61-65.
4. Murzin, A.G., Brenner, S.E., Hubbard, T. and Chothia, C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540.
5. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L. and Altschul, S.F. (1999) IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics, 15, 1000-1011.

Category   Protein Sequence Motifs

Go to the abstract in the NAR 2002 Database Issue.

 
