Nucl. Acids Res. OUP

ProtoMap

http://protomap.cornell.edu

Contact   golan@gimmel.stanford.edu


Database Description

Within the last year the ProtoMap database has gone through major changes. Since release 2.0, which was reported in the 2000 database issue of NAR, the conceptual and algorithmic framework has changed and several new features have been added. The changes are summarized here.

The source databases - The ProtoMap system now classifies not only the SWISSPROT database (as in release 2.0) but also TrEMBL and TrEMBL-new. This is a major increase in the size of the database, from approximately 80,000 proteins to more than 365,000 proteins. As in previous releases, online classification of new sequences is also available.

The clustering algorithm - The clustering algorithm was modified to deal better with suspicious connections and to avoid false clusters. Several components of the computational procedure were changed. First, the procedure for creating the core clusters was modified to ensure that the automatically generated seed clusters are stable and biologically correct. In the new procedure, the seed clusters correspond to semi-cliques: subsets of vertices that form complete (or almost complete) subgraphs, in which each vertex is connected to all (or almost all) of the other vertices. Second, the procedure that is applied iteratively to merge clusters was modified, and additional statistical tests are used to prevent false merges.

The classification - As part of a major conceptual change, the ProtoMap classification is now a soft classification: each protein can be assigned to more than one cluster, with a different quality for each assignment, as should be expected given the multi-domain nature of proteins. The terms 'core member' and 'satellite member' are introduced. A protein can be a core member of only one main cluster, and can be a satellite member of several clusters. Some proteins are not core members of any cluster and are only satellite members of clusters. Such proteins are, for example, very long proteins that are expected to contain multiple domains, or very repetitive and highly degenerate proteins.

New features - The new ProtoMap release offers multiple alignments for more than 10,000 clusters. These are based on a new PSI-BLAST based approach to multiple alignment. Also available, through the BioSpace website, are links to more than 100,000 three-dimensional models at atomic resolution for protein sequences without a known structure.

Future plans - Additional major changes are planned for the near future. First, frequent updates with respect to known structures will be posted on the ProtoMap website. Second, based on its new soft classification, the ProtoMap site will offer an extensive collection of domains. Graphic tools will be added to present domains in a user-friendly interface. Finally, a new Java-based graphic tool will be introduced to visualize local maps of the protein space and to navigate within it.
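
The semi-clique notion used for seed clusters (each vertex connected to all, or almost all, other vertices in the subset) can be illustrated with a small sketch. This is not the ProtoMap implementation. The function name `is_semi_clique`, the edge representation, and the `min_fraction` threshold are all assumptions made for illustration, since the description does not say how "almost all" is quantified.

```python
def is_semi_clique(members, edges, min_fraction=0.8):
    """Return True if every vertex in `members` is connected to at least
    `min_fraction` of the other vertices in `members`.

    `edges` is a set of frozenset pairs (undirected connections).
    `min_fraction` is a hypothetical threshold, not a ProtoMap parameter.
    """
    members = list(members)
    others = len(members) - 1
    if others < 1:
        return True  # zero or one vertex: trivially complete
    for v in members:
        # Count the other members that v is directly connected to.
        degree = sum(
            1 for u in members if u != v and frozenset((u, v)) in edges
        )
        if degree / others < min_fraction:
            return False
    return True


# Toy similarity graph: A, B, C, D are fully connected except for C-D.
edges = {
    frozenset(pair)
    for pair in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
}

strict = is_semi_clique(["A", "B", "C"], edges, min_fraction=1.0)
relaxed = is_semi_clique(["A", "B", "C", "D"], edges, min_fraction=0.6)
too_sparse = is_semi_clique(["A", "B", "C", "D"], edges, min_fraction=0.8)
```

With `min_fraction=1.0` the test reduces to an ordinary clique check. Lowering the threshold accepts the four-vertex subset even though one connection (C-D) is missing, which is the kind of near-complete subgraph the description refers to.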

Category   Protein Sequence Motifs

Go to the abstract in the NAR 2000 Database Issue.

 
