Protein Information Resource (PIR)

http://pir.georgetown.edu

Wu, C.H.¹, Huang, H.¹, Yeh, L.S.², Hu, Z.Z.², Barker, W.C.²

¹Dept. of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057
²National Biomedical Research Foundation, Georgetown University Medical Center, Washington, DC 20057

Contact pirmail@nbrf.georgetown.edu

Database Description

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. The PIR continues to enhance the Protein Sequence Database (PSD), a major annotated protein database containing more than 283,000 sequences covering the entire taxonomic range. Central to PIR protein annotation is the family classification approach for sensitive identification, consistent annotation, and systematic detection of annotation errors. The PIR superfamily provides comprehensive, non-overlapping, and hierarchical clustering of sequences to reflect their evolutionary relationships. The ongoing effort on systematic superfamily curation defines signature domain architecture, categorizes regular and associate members, and designates representative and seed members. For high quality annotation and database interoperability, the PIR uses rule-based and classification-driven procedures based on controlled vocabulary and accepted ontologies, and includes evidence attribution to distinguish experimentally determined from predicted features. To increase the coverage of experimentally validated data, a bibliography mapping and submission system allows curators and scientists to map, categorize, and submit citations that describe the proteins. PIR also maintains and distributes NREF, a non-redundant reference database of protein sequences, and iProClass, an integrated database of protein family, function, and structure information. The PIR web site connects data mining and sequence analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and annotation text searches, and sorting and visual exploration of search results. 
Sequence analysis options include similarity search, peptide and pattern match, hidden Markov model domain search, multiple sequence alignments, phylogenetic tree generation, and graphical display of superfamily, domain, and motif relationships. The FTP site provides free downloads of the biweekly PSD and NREF releases and of other auxiliary databases and files. The PIR allows users to answer complex biological questions that would typically involve querying multiple sources, and it serves as a primary resource for the exploration of proteins.
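
As an illustration of the pattern match option mentioned above, the following is a minimal sketch of how a sequence motif can be located in a protein sequence. It is not the PIR implementation: the function names `prosite_to_regex` and `find_pattern` are hypothetical, the example sequences are made up, and the sketch assumes a simplified PROSITE-style pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, and `(n)` or `(n,m)` repeat counts) without terminal anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Assumption: only '-', 'x', '[..]', '{..}' and '(n)' / '(n,m)' are handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "x(2,4)" into its core and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                       # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            token = re.escape(core)            # literal residue
        if low and high:
            token += "{%s,%s}" % (low, high)
        elif low:
            token += "{%s}" % low
        parts.append(token)
    return "".join(parts)


def find_pattern(sequence: str, pattern: str):
    """Return (1-based start position, matched segment) for every match.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Made-up sequence; the pattern is a generic two-cysteine spacing motif.
    for position, segment in find_pattern("MKCAACGGCTTC", "C-x(2)-C"):
        print(position, segment)
```

Translating the pattern into a regular expression keeps the sketch short; a production search service would more likely use a dedicated matcher and index the database rather than scan each sequence in turn.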

Recent Developments

The PIR-PSD database is now available in XML format (with DTD) and as an open-source MySQL relational database (with database schema).

Acknowledgements

The PIR is supported by NIH/NLM grant P41 LM05798.

Category Protein Databases

Go to the abstract in the NAR 2003 Database Issue.