Nucleic Acids Res. OUP

GenProtEC

http://genprotec.mbl.edu

Riley, M., Serres, M.H., Liang, P., Sun, Y.

Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA

Contact   mriley@mbl.edu


Database Description

GenProtEC is a database of the chromosomally encoded genes and proteins of Escherichia coli K-12. The database contains information on 4401 genes encoding 4285 proteins and 116 RNAs. The proteins are presented as modular units, where a module is defined as a domain that has at least 100 amino acid residues, carries a biological function, and has an independent evolutionary history. Protein modules were identified by Darwin analysis of E. coli protein sequences (1), using an alignment of at least 100 amino acids and a PAM (accepted point mutation) score of less than 200 as minimum requirements. Most of the modules are represented as individual proteins, but some are part of multi-modular (compound) proteins, which are the result of gene fusion events during evolution (2). There are currently 287 such compound genes containing 2, 3 or 4 modules. The present number of modules is 4616.

GenProtEC (http://genprotec.mbl.edu) can be searched by bnumber (Blattner number), gene or protein name, gene product type, Enzyme Commission (EC) number, function description or physiological role. For any sequence-related pair, the position and length of the alignment are given, as well as the percent of the protein aligned, the percent identical amino acids, and the PAM score. Most of the information in the database is available as downloadable text files.

GenProtEC provides molecular function assignments for 3531 (80%) of the E. coli genes, including experimentally characterized functions (1924 genes), phenotype-associated functions (96 genes), phage-associated functions (312 genes) and putative functions (1199 genes). The putative function assignments of E. coli K-12 were recently reviewed and updated (3). Literature references to the characterized, phenotype and some putative functions are provided.

GenProtEC also contains MultiFun, a recently developed system for classification of cellular functions in E. coli (4). MultiFun is based on the previous classification system developed by Monica Riley (5) and has incorporated the transport classification system of Milton Saier (6). MultiFun contains 10 major categories: Metabolism, Information Transfer, Regulation, Transport, Cell Processes, Cell Structure, Location, Extrachromosomal Origin, DNA Site and Cryptic Genes, which are further subdivided in a hierarchical system. Cellular functions have been assigned to 66% of the E. coli gene products. Because a gene product may play many roles in the cell, multiple cell function assignments are allowed per gene product where appropriate. The average number of cellular roles assigned per gene product is currently 2-3. A correlation table between MultiFun categories and Gene Ontology (GO) Consortium categories (7) is also available.

GenProtEC's protein modules have been grouped into sequence-similar or paralogous groups, where each member must recognize at least one partner of the group and none outside of the group, using the requirements listed above. The current groups were generated in collaboration with Bernard Labedan (2). Over half of the protein modules have at least one E. coli partner, and the sequence-related groups range in size from two to 96 members. Most members of each group are related by function as well as by sequence.
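The grouping rule described above (each member recognizes at least one partner inside the group and none outside it) can be read as finding connected components in a graph whose edges are the module pairs that meet the stated requirements: an alignment of at least 100 amino acids and a PAM score below 200. The following Python sketch illustrates that reading only; it is not the procedure or code actually used to build GenProtEC. The function names, the tuple layout and the module identifiers (`m1`, `m2`, ...) are hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Thresholds taken from the database description.
MIN_ALIGNMENT_LENGTH = 100  # amino acids
MAX_PAM = 200               # PAM score must be strictly below this value


def passes_criteria(alignment_length, pam):
    """Return True if a pairwise alignment meets both minimum requirements."""
    return alignment_length >= MIN_ALIGNMENT_LENGTH and pam < MAX_PAM


def paralogous_groups(pairs):
    """Group modules into connected components of qualifying pairs.

    `pairs` is an iterable of (module_a, module_b, alignment_length, pam)
    tuples; this layout is an assumption made for the example.
    """
    neighbors = defaultdict(set)
    for a, b, length, pam in pairs:
        if a != b and passes_criteria(length, pam):
            neighbors[a].add(b)
            neighbors[b].add(a)

    groups = []
    seen = set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects every module reachable from `start`.
        stack = [start]
        seen.add(start)
        group = set()
        while stack:
            node = stack.pop()
            group.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


# Hypothetical alignment results, for illustration only.
example_pairs = [
    ("m1", "m2", 150, 120),  # qualifies
    ("m2", "m3", 210, 180),  # qualifies, so m3 joins the group of m1 and m2
    ("m4", "m5", 90, 100),   # alignment too short
    ("m4", "m5", 130, 250),  # PAM score too high
    ("m6", "m7", 100, 199),  # qualifies exactly at the limits
]
groups = paralogous_groups(example_pairs)
print(groups)
```

In this sketch, modules with no qualifying partner (here `m4` and `m5`) do not appear in any group, which matches the statement that only some of the modules have an E. coli partner.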

Recent Developments

Incorporation of MultiFun, a classification schema for cellular function. Reannotation of putative function assignments.

REFERENCES

1. Gonnet, G.H., Cohen, M.A. and Benner, S.A. (1992) Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445.
2. Riley, M. and Labedan, B. (1997) Protein evolution viewed through Escherichia coli protein sequences: introducing the notion of a structural segment of homology, the module. J. Mol. Biol. 268, 857-868.
3. Serres, M.H., Gopal, S., Nahum, L.A., Liang, P., Gaasterland, T. and Riley, M. (2001) A functional update of the Escherichia coli K-12 genome. Genome Biol. 2, 0035.1-0035.7.
4. Serres, M.H. and Riley, M. (2000) MultiFun, a multifunctional classification scheme for Escherichia coli K-12 gene products. Microb. Comp. Genomics 5, 205-222.
5. Riley, M. (1993) Function of the gene products of Escherichia coli. Microbiol. Rev. 57, 862-952.
6. Saier, M.H. Jr. (2000) A functional-phylogenetic classification system for transmembrane solute transporters. Microbiol. Mol. Biol. Rev. 64, 354-411.
7. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T. et al. (2000) Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25-29.

Category   Protein Databases


 
