Nucleic Acids Res. (OUP)

SEVENS

http://sevens.cbrc.jp

Suwa, M.1, Sato, T.2, Okouchi, I.2, Arita, M.1, Matsumoto, S.3, Tsutsumi, S.3, Aburatani, H.3, Asai, K.1, Akiya, Y.1

1Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology
2Center for Computational Science and Engineering, Fuji Research Institute Corporation
3Genome Science Division, Research Center for Advanced Science and Technology (RCAST), University of Tokyo

Contact: m-suwa@aist.go.jp


Database Description

Seven-transmembrane-helix receptors (7-TMRs), also known as G-protein-coupled receptors [1], are important genes whose products act as gateways for signal transduction triggered by ligand binding. Recent progress in determining the human draft sequence [2,3] has accelerated comprehensive analysis of 7-TMRs across the whole human genome. We have developed an automated system that discovers 7-TMR genes in the whole human genome in three stages.
(I) Gene prediction stage: from the human genomic sequences, we generated every possible open reading frame running from an initiation codon to a stop codon in all six reading frames. In addition, we used GeneDecoder [4] to predict the complete structures of multi-exon genes.
(II) Screening stage: the predicted genes were passed through a filter that combines BLASTP [5] for similarity search, HMMER [6] and an in-house program for assigning 7-TMR-specific hidden Markov models (Pfam domains [6]), PROSITE patterns [7], and transmembrane helix (TMH) prediction tools [8]. By carefully assessing each component, we determined two threshold settings, one giving the best specificity and one giving the best sensitivity. Combining the best-specificity and best-sensitivity thresholds then yielded datasets at four confidence levels.
(III) Quality improvement stage: sequence redundancy was resolved as follows. (1) All candidate sequences were aligned pairwise in an all-against-all fashion. (2) Two sequences were linked only when they matched over more than 50 amino acid residues with more than 95% identity and shared the same chromosome, the same contig number, and an overlapping genomic position. (3) Each group formed by the transitive closure of these links was regarded as one cluster, and one representative gene was selected from each cluster.
Applying this system to the human genome draft sequence (February 2002), we collected 7-TMR genes at four confidence levels, ranging from 827 candidates at the highest specificity to 2,109 at the highest sensitivity. These are summarized in SEVENS (http://sevens.cbrc.jp).
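Stage (I) can be illustrated with a short six-frame ORF scan. The sketch below is ours, not the SEVENS code: it assumes ATG is the only initiation codon, treats TAA, TAG and TGA as stop codons, and uses a hypothetical `min_codons` length cutoff that the text does not specify. Following the phrase "every possible", each ATG upstream of a stop codon starts its own ORF. The multi-exon prediction done with GeneDecoder is not modelled here.

```python
# Minimal six-frame ORF scan (illustrative sketch, not the SEVENS pipeline).
# Assumptions: ATG is the only initiation codon, TAA/TAG/TGA are stop codons,
# and min_codons is a hypothetical length cutoff.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq: str, frame: int, min_codons: int):
    """Yield (start, end) for each ATG...stop ORF in one reading frame.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Every ATG upstream of a stop starts its own ORF, so nested ORFs that
    share a stop codon are all reported.
    """
    open_starts = []  # ATG positions not yet closed by a stop codon
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == START_CODON:
            open_starts.append(i)
        elif codon in STOP_CODONS:
            for start in open_starts:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
            open_starts = []


def six_frame_orfs(seq: str, min_codons: int = 30):
    """Scan three forward and three reverse-complement reading frames.

    Returns (strand, frame, start, end, nucleotide_sequence) tuples;
    minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                results.append((strand, frame, start, end, s[start:end]))
    return results
```

For example, `six_frame_orfs("ATGAAATAG", min_codons=1)` returns `[("+", 0, 0, 9, "ATGAAATAG")]`, while raising `min_codons` to 4 filters that three-codon ORF out.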
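The redundancy filter of stage (III) amounts to single-linkage clustering: qualifying pairs are linked, and the clusters are the connected components, that is, the transitive closure of the links. A minimal sketch using a union-find structure follows. The thresholds (more than 50 residues, more than 95% identity, same chromosome and contig, overlapping position) come from the text. The `Candidate` and `Hit` record layouts are hypothetical, and the rule for choosing a representative (longest protein, then alphabetical name) is our assumption, because the text does not say how representatives were picked.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one predicted 7-TMR gene."""
    name: str
    chromosome: str
    contig: str
    start: int   # genomic start (inclusive)
    end: int     # genomic end (inclusive)
    length: int  # protein length in residues


@dataclass(frozen=True)
class Hit:
    """Hypothetical summary of one all-against-all pairwise alignment."""
    query: str
    subject: str
    aligned_residues: int
    identity: float  # percent identity over the aligned region


class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def is_linked(hit, cands):
    """Apply the stage (III) linking criteria to one alignment."""
    a, b = cands[hit.query], cands[hit.subject]
    overlapping = a.start <= b.end and b.start <= a.end
    return (hit.aligned_residues > 50 and hit.identity > 95.0
            and a.chromosome == b.chromosome and a.contig == b.contig
            and overlapping)


def cluster_candidates(candidates, hits):
    """Return sorted (representative, members) pairs, one per cluster."""
    cands = {c.name: c for c in candidates}
    uf = UnionFind(cands)
    for hit in hits:
        if hit.query != hit.subject and is_linked(hit, cands):
            uf.union(hit.query, hit.subject)  # merging components = transitive closure
    clusters = defaultdict(list)
    for name in cands:
        clusters[uf.find(name)].append(name)
    result = []
    for members in clusters.values():
        # Assumed rule: longest protein wins; ties go to the alphabetically first name.
        rep = min(members, key=lambda n: (-cands[n].length, n))
        result.append((rep, sorted(members)))
    return sorted(result)
```

With candidates A (100-1000), B (900-2000) and C (1900-2500) on the same contig, hits A-B and B-C place all three in one cluster even though A and C do not overlap; that is the transitive closure at work. A hit between genes on different chromosomes never creates a link.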
This database aims to cover the entire "7-TMR universe", including not only known sequences but also sequences newly discovered by computational gene finding. This clearly distinguishes it from previous databases [9-11]. The content search button leads to a page where candidates are retrieved by an AND combination of the following criteria:
(a) keyword in nr.aa;
(b) chromosome number;
(c) data level;
(d) predicted gene structure (multi-exon or single-exon);
(e) predicted number of exons;
(f) sequence length;
(g) E-value of the sequence search against SWISS-PROT or nr.aa;
(h) whether the query has predicted TMHs, PROSITE motifs and Pfam domains.
The search lists the 7-TMR sequences that match the selected criteria. Each sequence links to a sequence analysis page that includes the similarity search results and the mapping of the candidate gene onto its contig. The Protein Structure section shows the analysis results, marking predicted TMHs, PROSITE motif patterns and Pfam domains on the amino acid sequence. We plan to maintain SEVENS with regular updates that follow each new version of the NCBI draft sequence. Additional information (such as expression data, chromosomal mapping data and tertiary structure data) will be added to the database with each update. We hope these datasets will be of value to researchers engaged in 7-TMR studies.
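The content search combines the selected criteria (a) to (h) with a logical AND. The sketch below shows that idea as in-memory filtering; it is not the SEVENS web interface, and the `Entry` fields, the numeric `data_level`, the inclusive `length_range` and the case-insensitive keyword match are all hypothetical choices. Criteria left as `None` (or `False` for the feature flags) are simply ignored.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical flattened record for one SEVENS-style candidate."""
    gene_id: str
    nr_description: str  # description of the best nr.aa hit
    chromosome: str
    data_level: int      # confidence level, assumed to be coded 1 to 4
    multi_exon: bool
    exon_count: int
    length: int          # residues
    evalue: float        # best E-value against SWISS-PROT or nr.aa
    has_tmh: bool
    has_prosite: bool
    has_pfam: bool


def search(entries: Sequence[Entry],
           keyword: Optional[str] = None,
           chromosome: Optional[str] = None,
           data_level: Optional[int] = None,
           multi_exon: Optional[bool] = None,
           exon_count: Optional[int] = None,
           length_range: Optional[Tuple[int, int]] = None,
           max_evalue: Optional[float] = None,
           require_tmh: bool = False,
           require_prosite: bool = False,
           require_pfam: bool = False) -> List[str]:
    """Return IDs of entries that satisfy every criterion supplied (AND)."""
    checks: List[Callable[[Entry], bool]] = []
    if keyword is not None:
        kw = keyword.lower()
        checks.append(lambda e: kw in e.nr_description.lower())
    if chromosome is not None:
        checks.append(lambda e: e.chromosome == chromosome)
    if data_level is not None:
        checks.append(lambda e: e.data_level == data_level)
    if multi_exon is not None:
        checks.append(lambda e: e.multi_exon == multi_exon)
    if exon_count is not None:
        checks.append(lambda e: e.exon_count == exon_count)
    if length_range is not None:
        lo, hi = length_range
        checks.append(lambda e: lo <= e.length <= hi)
    if max_evalue is not None:
        checks.append(lambda e: e.evalue <= max_evalue)
    if require_tmh:
        checks.append(lambda e: e.has_tmh)
    if require_prosite:
        checks.append(lambda e: e.has_prosite)
    if require_pfam:
        checks.append(lambda e: e.has_pfam)
    # An entry is returned only if every active check passes.
    return [e.gene_id for e in entries if all(check(e) for check in checks)]
```

For example, passing `chromosome="17"` together with `require_pfam=True` returns only entries that pass both checks, and calling `search` with no criteria returns every entry.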

Recent Developments

This submission describes the first version of SEVENS.

REFERENCES

1 Watson, S. and Arkinstall, S. (1994) The G-protein Linked Receptor Facts Book. Academic Press, London.
2 International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409, 860-921.
3 Venter, J. C., et al. (2001) The sequence of the human genome. Science 291, 1304-1351.
4 Asai, K., Itou, K., Ueno, Y. and Yada, T. (1998) Recognition of human genes by stochastic parsing. Pacific Symposium on Biocomputing 98, 228-239.
5 Altschul, S. F., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.
6 Bateman, A., Birney, E., Durbin, R., Eddy, S. R., Howe, K. L. and Sonnhammer, E. L. (2000) The Pfam protein families database. Nucleic Acids Res. 28, 263-266.
7 Bairoch, A. (1992) PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 20, 2013-2018.
8 Hirokawa, T., Boon-Chieng, S. and Mitaku, S. (1998) SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14, 378-379.
9 Horn, F., Vriend, G. and Cohen, F. E. (2001) Collecting and harvesting biological data: the GPCRDB and NucleaRDB information systems. Nucleic Acids Res. 29, 346-349.
10 Crasto, C., Marenco, L., Miller, P. and Shepherd, G. (2002) Olfactory Receptor Database: a metadata-driven automated population from sources of gene and protein sequences. Nucleic Acids Res. 30, 354-360.
11 Hodges, P. E., Carrico, P. M., Hogan, J. D., O'Neill, K. E., Owen, J. J., Mangan, M., Davis, B. P., Brooks, J. E. and Garrels, J. I. (2002) Annotating the human proteome: the Human Proteome Survey Database (HumanPSD™) and an in-depth target database for G protein-coupled receptors (GPCR-PD™) from Incyte Genomics. Nucleic Acids Res. 30, 137-141.

Category: Protein Databases

 
