Nucleic Acids Res. (OUP)

PIR-NREF

http://pir.georgetown.edu/pirwww/pirnref.shtml

Huang, H.¹, Suzek, B.E.², Wu, C.H.¹

¹Dept. of Biochemistry and Molecular Biology, Georgetown University Medical Center, Washington, DC 20057
²National Biomedical Research Foundation, Georgetown University Medical Center, Washington, DC 20057

Contact: pirmail@nbrf.georgetown.edu


Database Description

PIR-NREF is a Non-redundant REFerence protein database that provides a timely and comprehensive collection of protein sequence data. It keeps pace with genome sequencing projects while retaining source attribution and minimizing redundancy. The database contains all sequences in PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB, currently totaling almost 1,000,000 entries. Identical sequences from the same source organism (species) reported in different databases are presented as a single NREF entry. That entry lists the protein IDs, accession numbers, and protein names from each underlying database, together with the amino acid sequence, taxonomy, and composite bibliographic data. Each entry also lists related sequences identified by an all-against-all FASTA search, including identical sequences from different organisms, identical subsequences, and highly similar sequences (≥95% identity).

NREF can be used for sequence searching and protein identification against the entire sequence collection or against a subset of one or more genomes. The collected protein names, including synonyms, together with the bibliographic information, can be used to develop a protein name ontology. Because different databases may assign different names to the same protein, comparing those names may help detect annotation errors, especially errors resulting from large-scale genomic annotation.

The web site supports both text and sequence searches. Reports can be retrieved directly using the unique sequence identifiers of the source databases. The text search matches protein and species names using combinations of text strings. Sequence searches include BLAST searches, peptide match, and pattern match for functional identification of query proteins or peptides. Species-based browsing and searching are supported for about 100 organisms, including over 70 complete genomes. NREF is updated biweekly and is freely available for download from our FTP site in XML format (data file) and FASTA format (sequence file).
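The merging rule can be illustrated with a minimal Python sketch. This is not PIR's production pipeline. The record fields, the `build_nref` and `relation` helpers, and the example records and accessions are all hypothetical. The sketch also assumes that "same source organism" can be approximated by an exact species-name match, whereas the real database works from taxonomy. Each record is keyed on (sequence, species), so identical sequences from the same species collapse into one entry that carries every source's accession and name. Plain string checks then flag two of the related-sequence classes described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceEntry:
    """One protein record from an underlying database (hypothetical shape)."""
    database: str   # e.g. "Swiss-Prot", "RefSeq"
    accession: str
    name: str
    species: str    # assumption: exact species name stands in for taxonomy
    sequence: str


@dataclass
class NrefEntry:
    """A merged entry: one sequence from one species, many source records."""
    sequence: str
    species: str
    members: list = field(default_factory=list)  # (database, accession, name)


def build_nref(entries):
    """Merge records whose sequence and species are both identical."""
    groups = {}
    for e in entries:
        # Normalize case so lowercase and uppercase residues compare equal.
        key = (e.sequence.upper(), e.species)
        if key not in groups:
            groups[key] = NrefEntry(sequence=key[0], species=e.species)
        groups[key].members.append((e.database, e.accession, e.name))
    return list(groups.values())


def relation(a, b):
    """Classify two merged entries using exact string checks only."""
    if a.sequence == b.sequence:
        # For two distinct merged entries this implies different species.
        return "identical"
    if a.sequence in b.sequence or b.sequence in a.sequence:
        return "subsequence"
    # Highly similar (>=95% identity) pairs would need an alignment step.
    return None


# Usage with made-up records and accessions (hypothetical data).
records = [
    SourceEntry("Swiss-Prot", "X1", "kinase A", "Homo sapiens", "MKVLA"),
    SourceEntry("RefSeq", "X2", "protein kinase A", "Homo sapiens", "MKVLA"),
    SourceEntry("PIR-PSD", "X3", "kinase A", "Mus musculus", "MKVLA"),
    SourceEntry("GenPept", "X4", "kinase A fragment", "Homo sapiens", "KVL"),
]
nref = build_nref(records)
```

Here the two human `MKVLA` records merge into one entry with both protein names kept side by side, and that side-by-side view of names is what makes synonym collection and annotation cross-checking possible. The ≥95%-identity class is deliberately left out: it requires a sequence alignment (NREF uses an all-against-all FASTA search), which exact string comparison cannot provide.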

Acknowledgements

Supported by NIH/NLM grant P41 LM05798

Category: Protein Databases
