PDB-REPRDB

Database Description

PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB) (1). Representatives are selected according to three criteria: a) quality of atomic co-ordinate data, b) sequence uniqueness, and c) conformation uniqueness. The PDB-REPRDB system (2,3) is designed so that the user can quickly obtain a selection of representative chains from the PDB. System operation is divided into two stages: 1) calculation of similarities between all pairs of protein chains, and 2) classification of those chains and selection of representative chains according to priorities specified by the user. Similarities are calculated beforehand, so the selection of representative chains can be configured dynamically according to the user's requirements.

The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores for sequence and structural similarity. Users can eliminate unnecessary chains from the PDB chain list by setting threshold values, and can also change the priority of nine factors: resolution, R-factor, number of chain breaks, ratio of non-standard amino acid residues, ratio of residues with only Cα co-ordinates, ratio of residues with only backbone co-ordinates, number of residues, whether mutant or wild type, and whether complex or not. Moreover, users can choose whether to include entries determined by NMR and whether to include membrane protein chains by setting the NMR and membrane flags, respectively. The membrane flag was introduced into the system in September 2002.

The system returns a representative list and classification data for the protein chains. The representative list includes the factors mentioned above, the EC number, and the compound recorded in the PDB. The ‘ID’ sections are hyper-linked to data on classified groups, and clicking on ‘*’ displays a graphic representation of the three-dimensional structure using the RasMol program. Furthermore, ‘Ecnumber’ sections are hyper-linked to the Ligand chemical database for enzyme reactions (LIGAND) (4). The system is available at the new PDB-REPRDB WWW server (http://www.cbrc.jp/pdbreprdb/).
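The two-stage operation described above can be illustrated with a minimal sketch in Python. It assumes a simple greedy scheme: chains are ranked by the user's priority order, and each chain either joins the group of an earlier representative it resembles or starts a new group. This is an illustration only and is not necessarily the classification method used by PDB-REPRDB. The names `Chain`, `sort_key`, and `select_representatives`, the chain identifiers, and all numeric values are hypothetical, and only four of the nine factors are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    chain_id: str       # hypothetical identifier, not a real PDB code
    resolution: float   # angstroms; lower is better
    r_factor: float     # lower is better
    chain_breaks: int   # fewer is better
    n_residues: int     # longer chains may be preferred


def sort_key(chain, priority):
    """Return a tuple that sorts chains best-first under `priority`.

    `priority` is an ordered list of (attribute, lower_is_better) pairs.
    """
    return tuple(
        getattr(chain, name) if lower_is_better else -getattr(chain, name)
        for name, lower_is_better in priority
    )


def select_representatives(chains, similarity, cutoff, priority):
    """Greedy grouping (an assumed scheme): the best-ranked chain of each
    group becomes its representative.

    `similarity` stands in for the precomputed stage-1 scores and maps
    frozenset({id1, id2}) to a score; missing pairs count as dissimilar.
    Returns {representative_id: [member_ids]}.
    """
    ranked = sorted(chains, key=lambda c: sort_key(c, priority))
    groups = {}
    for chain in ranked:
        for rep_id in groups:
            pair = frozenset((rep_id, chain.chain_id))
            if similarity.get(pair, 0.0) >= cutoff:
                groups[rep_id].append(chain.chain_id)
                break
        else:
            # Not similar to any existing representative: start a new group.
            groups[chain.chain_id] = [chain.chain_id]
    return groups


# Made-up example data.
chains = [
    Chain("X1", resolution=1.8, r_factor=0.19, chain_breaks=0, n_residues=150),
    Chain("X2", resolution=2.5, r_factor=0.22, chain_breaks=1, n_residues=160),
    Chain("X3", resolution=2.0, r_factor=0.20, chain_breaks=0, n_residues=300),
]
similarity = {
    frozenset(("X1", "X2")): 95.0,
    frozenset(("X1", "X3")): 20.0,
    frozenset(("X2", "X3")): 22.0,
}

by_resolution = select_representatives(
    chains, similarity, 90.0, [("resolution", True), ("r_factor", True)]
)
by_length = select_representatives(
    chains, similarity, 90.0, [("n_residues", False)]
)
```

Because the similarity scores are fixed in advance, changing only the priority list changes which member represents a group: ranking by resolution makes X1 the representative of the X1/X2 group, whereas ranking by length makes X2 the representative.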

Recent Developments

In recent years, the number of PDB entries containing membrane protein structures has increased rapidly, owing to improvements in X-ray crystallography, NMR, and electron microscopy techniques. Since protein structure researchers usually deal with globular proteins and membrane proteins separately, membrane proteins should be kept separate from globular proteins. One recent improvement is the introduction, on the top page of the present PDB-REPRDB, of a new elimination factor that excludes membrane protein chains when selecting representatives. Protein chains that include a membrane domain as defined by SCOP (5) are eliminated by this factor. Moreover, a PDB-REPRDB for membrane protein chains, which selects representatives from chains that include a membrane domain as defined by SCOP, has been developed for researchers working on membrane proteins. It is available at the same site.

The current PDB-REPRDB database includes 33,368 protein chains from 16,682 PDB entries (1 September, 2001), from which the following are excluded: (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (l<40 residues), and (d) data with non-standard amino acid residues at all residues. The current database for membrane proteins includes 551 protein chains that include membrane domains in SCOP release 1.59 (15 May 2002).
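The exclusion rules (a) to (d) and the membrane flag can be expressed as a simple filter. The following Python sketch is an assumption about how such a filter might look, not the system's actual code; the record fields, the `molecule_type` labels, and the example records are hypothetical.

```python
from dataclasses import dataclass

MIN_LENGTH = 40  # chains with fewer residues than this are excluded (rule c)


@dataclass(frozen=True)
class ChainRecord:
    chain_id: str               # hypothetical identifier
    molecule_type: str          # assumed labels: "protein", "dna", "rna"
    is_theoretical_model: bool
    n_residues: int
    n_nonstandard: int          # number of non-standard amino acid residues
    has_membrane_domain: bool   # e.g. taken from a SCOP-based annotation


def is_eligible(rec):
    """Apply exclusion rules (a) to (d) to one chain record."""
    if rec.molecule_type != "protein":         # (a) DNA and RNA data
        return False
    if rec.is_theoretical_model:               # (b) theoretically modeled data
        return False
    if rec.n_residues < MIN_LENGTH:            # (c) short chains
        return False
    if rec.n_nonstandard == rec.n_residues:    # (d) non-standard at all residues
        return False
    return True


def candidate_chains(records, include_membrane):
    """Return the ids of eligible chains, honouring the membrane flag."""
    return [
        rec.chain_id
        for rec in records
        if is_eligible(rec) and (include_membrane or not rec.has_membrane_domain)
    ]


# Made-up example records.
records = [
    ChainRecord("P1", "protein", False, 120, 0, False),
    ChainRecord("P2", "protein", False, 250, 3, True),
    ChainRecord("D1", "dna", False, 60, 0, False),
    ChainRecord("T1", "protein", True, 200, 0, False),
    ChainRecord("S1", "protein", False, 25, 0, False),
    ChainRecord("N1", "protein", False, 45, 45, False),
]

with_membrane = candidate_chains(records, include_membrane=True)
without_membrane = candidate_chains(records, include_membrane=False)
```

With the flag set, both P1 and P2 remain candidates; with it cleared, only the globular chain P1 is kept.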

Acknowledgements

We thank Dr. Susumu Goto and Prof. Minoru Kanehisa at the Institute for Chemical Research, Kyoto University, for their support.

REFERENCES

1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
2. Noguchi,T., Onizuka,K., Ando,M., Matsuda,H. and Akiyama,Y. (2000) Quick selection of representative protein chain sets based on customizable requirements. Bioinformatics, 16, 520-526.
3. Noguchi,T., Matsuda,H. and Akiyama,Y. (2001) PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res., 29, 219-220.
4. Goto,S., Okuno,Y., Hattori,M., Nishioka,T. and Kanehisa,M. (2002) LIGAND: database of chemical compounds and reactions in biological pathways. Nucleic Acids Res., 30, 402-404.
5. Lo Conte,L., Brenner,S.E., Hubbard,T.J.P., Chothia,C. and Murzin,A.G. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264-267.