Nucl. Acids Res. OUP

iProClass

http://pir.georgetown.edu/iproclass/

Wu, C.H. (1), Huang, H. (1), Chen, Y. (2), Barker, W.C. (2)

(1) Dept. of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057
(2) National Biomedical Research Foundation, Georgetown University Medical Center, Washington, DC 20057

Contact   wuc@georgetown.edu


Database Description

The iProClass database provides comprehensive descriptions of proteins and serves as a framework for data integration in a distributed networking environment. The protein information in iProClass includes family relationships at both global (superfamily/family) and local (domain, motif, site) levels, as well as structural and functional classifications and features of proteins. The current version consists of more than 800,000 non-redundant PIR-PSD, Swiss-Prot, and TrEMBL proteins organized with more than 36,000 PIR superfamilies, 145,000 families, 4,000 homology domains, 1,300 motifs, 280 post-translational modification sites, and 550,000 FASTA similarity clusters.

iProClass provides rich links to over 50 databases of protein families (e.g., COG, InterPro), functions and pathways (e.g., KEGG, WIT), protein-protein interactions (e.g., DIP, BIND), post-translational modifications (e.g., RESID), structures and structural classifications (e.g., PDB, SCOP, CATH), genes and genomes (e.g., TIGR, GDB, OMIM), ontologies (e.g., GO), literature (PubMed), and taxonomy (NCBI Taxonomy). Protein and superfamily summary reports contain annotations such as membership information with length, taxonomy, and keyword statistics, extensive cross-references, and graphical displays of domain and motif regions.

iProClass employs a modular architecture for scalability and extensibility, so that new data can be integrated in a distributed networking environment. The database is implemented in the Oracle object-relational database system and is searchable by both sequence (BLAST search and peptide match) and text (unique identifiers and combinations of text strings). The data integration in iProClass supports the exploration of proteins and comparative studies of them. In particular, it readily reveals relationships between database objects, such as those among protein sequences, families, structures, and functions. Such knowledge is fundamental to the understanding of protein evolution, structure, and function, and crucial to functional genomic and proteomic research.
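As a rough illustration of the two search modes described above (peptide match on sequence, and lookup through cross-references), the following Python sketch models a protein entry as a record with global and local classifications plus links to other databases. It is not the iProClass schema or API: the class `ProteinEntry`, the functions `peptide_match` and `entries_linked_to`, and all identifiers, sequences, and accessions in `EXAMPLE_ENTRIES` are hypothetical placeholders invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinEntry:
    """Toy stand-in for an integrated protein record (assumed layout, not the real schema)."""
    entry_id: str        # hypothetical identifier
    sequence: str        # amino acid sequence in one-letter code
    superfamily: str     # global (whole-protein) classification
    domains: list = field(default_factory=list)     # local features
    cross_refs: dict = field(default_factory=dict)  # database name -> list of accessions


def peptide_match(entries, peptide):
    """Return (entry_id, 1-based position) for every exact occurrence of the peptide."""
    peptide = peptide.upper()
    hits = []
    for entry in entries:
        start = entry.sequence.find(peptide)
        while start != -1:
            hits.append((entry.entry_id, start + 1))
            # Continue one residue further so overlapping matches are also found.
            start = entry.sequence.find(peptide, start + 1)
    return hits


def entries_linked_to(entries, database, accession):
    """Return the IDs of entries whose cross-references include the given accession."""
    return [
        entry.entry_id
        for entry in entries
        if accession in entry.cross_refs.get(database, [])
    ]


# Placeholder data: none of these identifiers or sequences are real records.
EXAMPLE_ENTRIES = [
    ProteinEntry(
        entry_id="TOY0001",
        sequence="MKTAYIAKQRGDSLLKQRG",
        superfamily="toy superfamily A",
        domains=["toy domain X"],
        cross_refs={"PDB": ["0XXX"], "GO": ["GO:0000000"]},
    ),
    ProteinEntry(
        entry_id="TOY0002",
        sequence="MSSKQRGHHLA",
        superfamily="toy superfamily B",
        cross_refs={"GO": ["GO:0000000"]},
    ),
]

if __name__ == "__main__":
    print(peptide_match(EXAMPLE_ENTRIES, "KQRG"))
    print(entries_linked_to(EXAMPLE_ENTRIES, "GO", "GO:0000000"))
```

Storing cross-references as a mapping from database name to accessions keeps each linked resource independent, which loosely mirrors the modular design the description mentions. A production system would use indexed database queries rather than a linear scan.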

Recent Developments

iProClass is now updated biweekly, includes all protein sequences in PIR-PSD, Swiss-Prot, and TrEMBL databases, and has added links to about ten more databases in the past year.

Acknowledgements

Supported by NSF grants DBI-9974855 and DBI-0138188.

Category   Protein Sequence Motifs

Go to the abstract in the NAR 2003 Database Issue.

 
