Nucleic Acids Res. OUP

RiceGAAS

http://ricegaas.dna.affrc.go.jp/

Sakata, K.1, Numa, H.1, Nagamura, Y.1, Antonio, B.A.2, Idonuma, A.2, Shimizu, Y.3, Horiuchi, I.3, Matsumoto, T.1, Sasaki, T.1, Higo, K.1

1National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602, Japan
2Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, 446-1 Ippaizuka, Kamiyokoba, Tsukuba, Ibaraki 305-0854, Japan
3Mitsubishi Space Software Co. Ltd, 1-17-15 Sengen, Tsukuba, Ibaraki 305-0047, Japan

Contact   ksakata@nias.affrc.go.jp


Database Description

An extensive effort by the International Rice Genome Sequencing Project (IRGSP) has resulted in the rapid accumulation of genome sequence, and the entire rice genome (430 Mb) is expected to be completely sequenced at the quality level of HTG (high-throughput genomic sequence) phase 2 or higher by the end of 2002. This requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence data on a regular basis. An automated annotation system and database called the Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the annotation results (1). The system has the following functional features: (i) daily collection of rice genome sequences from GenBank; (ii) execution of 15 analysis programs, including gene prediction and homology search programs; (iii) integration of results from the various analyses and automatic interpretation of coding regions using an algorithm that combines the results of several gene prediction programs with homology searches to achieve a more accurate prediction; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using a web-based graphical view; and (vi) a data submission mechanism that allows public users to perform fully automated annotation of their own sequences within 24 h. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/. As of September 2002, about 80,000 genes have been automatically predicted from 400 Mb of sequence. General statistics of the predictions, such as the number of analyzed bases, predicted genes, predicted amino acid sequence motifs and predicted LTR (long terminal repeat) pairs, are summarized and updated on the top page.
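The idea behind feature (iii), combining several gene prediction programs with homology evidence, can be illustrated with a small per-base voting sketch. This is not the RiceGAAS algorithm, whose details are given in reference (1). It is a minimal illustration under stated assumptions: the program names, the `min_votes` threshold and the rule that a homology hit adds one vote to a position already predicted by some program are all hypothetical.

```python
from collections import Counter


def consensus_coding_positions(predictions, homology_hits, min_votes=2):
    """Return the sorted positions called coding by the combined evidence.

    predictions: dict mapping a program name to a list of (start, end) exon
        intervals, 1-based and inclusive.
    homology_hits: list of (start, end) intervals covered by homology hits.
    min_votes: votes needed to call a position coding (hypothetical value).
    """
    votes = Counter()
    for exons in predictions.values():
        covered = set()
        for start, end in exons:
            covered.update(range(start, end + 1))
        # Each program contributes at most one vote per position.
        votes.update(covered)

    homology = set()
    for start, end in homology_hits:
        homology.update(range(start, end + 1))
    # Assumption: homology support adds one vote, but only to positions
    # that at least one prediction program already called coding.
    for pos in homology:
        if pos in votes:
            votes[pos] += 1

    return sorted(pos for pos, n in votes.items() if n >= min_votes)


def to_intervals(positions):
    """Merge a sorted list of positions into inclusive (start, end) intervals."""
    intervals = []
    for pos in positions:
        if intervals and pos == intervals[-1][1] + 1:
            intervals[-1][1] = pos
        else:
            intervals.append([pos, pos])
    return [tuple(iv) for iv in intervals]


# Hypothetical input: three programs and one homology hit.
example_predictions = {
    "program_a": [(10, 20)],
    "program_b": [(15, 30)],
    "program_c": [(40, 45)],
}
example_homology = [(40, 50)]
example_consensus = to_intervals(
    consensus_coding_positions(example_predictions, example_homology)
)
```

In this toy input, positions 15 to 20 are kept because two programs agree, and positions 40 to 45 are kept because one program is backed by a homology hit. A real integration step would also have to respect splice sites, reading frame and strand, which this sketch ignores.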
All of the collected and analyzed bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) clones are summarized and grouped by the chromosome on which they are anchored (http://ricegaas.dna.affrc.go.jp/rga-bin/showallclones.pl). The position of each clone on the chromosome can be viewed graphically by clicking the corresponding button in the tabulated list, for example http://ricegaas.dna.affrc.go.jp/rga-bin/SelectContig.pl?ChroNO=chro01. We are continuously evaluating the algorithm by comparing genes predicted automatically by RiceGAAS with manually predicted genes, for which the results of gene prediction programs and homology searches are edited to obtain the most plausible gene model. The evaluation indicates that about 74% of automatically and manually predicted genes are identical at the nucleotide level. Details of the comparison are available at http://ricegaas.dna.affrc.go.jp/rga-bin/col_accur.pl.
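One way to read the nucleotide-level comparison described above is sketched below. The text does not define the comparison precisely, so this sketch makes two assumptions: a gene counts as "the same" when the automated and manual models cover exactly the same coding positions, and the two models are paired by a shared gene identifier (a real evaluation would more likely pair them by genomic overlap). The function names and the example data are hypothetical.

```python
def coding_positions(exons):
    """Expand a list of inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def fraction_identical(auto_genes, manual_genes):
    """Fraction of manual gene models matched exactly by an automated model.

    Both arguments map a gene identifier to a list of (start, end) exon
    intervals. Pairing by identifier is an assumption of this sketch.
    """
    if not manual_genes:
        raise ValueError("no manual gene models to compare against")
    same = sum(
        1
        for gene_id, exons in manual_genes.items()
        if gene_id in auto_genes
        and coding_positions(auto_genes[gene_id]) == coding_positions(exons)
    )
    return same / len(manual_genes)


# Hypothetical gene models, not taken from RiceGAAS.
manual_models = {
    "g1": [(1, 10), (20, 30)],
    "g2": [(50, 60)],
    "g3": [(100, 120)],
    "g4": [(200, 210)],
}
auto_models = {
    "g1": [(1, 10), (20, 30)],      # identical
    "g2": [(50, 61)],               # one base longer, so not identical
    "g3": [(100, 110), (111, 120)], # split interval, same positions
    "g4": [(200, 210)],             # identical
}
agreement = fraction_identical(auto_models, manual_models)
```

Comparing sets of positions rather than interval lists means that two models describing the same bases with differently split intervals still count as identical, while a single-base difference at an exon boundary does not.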

Recent Developments

Recently, we have added FGENESH (monocot version, http://www.softberry.com/berry.phtml?topic=gfind) to the gene prediction programs in the system. We have also improved the sensitivity of RiceHMM (http://rgp.dna.affrc.go.jp/RiceHMM/index.html), our original prediction program trained for rice (2). The algorithm for the automatic integration and interpretation of coding regions has been modified to incorporate FGENESH and the improved RiceHMM, and the annotation of previously analyzed clones has been updated to reflect these algorithmic improvements. The amount of stored data has increased as a result of the acceleration in the rice genome sequencing efforts of IRGSP. All rice genome sequences collected by the system will be available for download by the end of October 2002.

Acknowledgements

This work was supported by grants from the Ministry of Agriculture, Forestry and Fisheries of Japan (Rice Genome Project SY-1101 and GS-1302).

REFERENCES

1 Sakata,K., Nagamura,Y., Numa,H., Antonio,B.A., Nagasaki,H., Idonuma,A., Watanabe,W., Shimizu,Y., Horiuchi,I., Matsumoto,T., Sasaki,T. and Higo,K. (2002) RiceGAAS: an automated annotation system and database for rice genome sequence. Nucleic Acids Res., 30: 98-102.
2 Sakata,K., Nagasaki,H., Idonuma,A., Watanabe,W., Kise,M. and Sasaki,T. (2000) RiceHMM: gene domain prediction program for rice genome sequence. Abstracts of 4th Annual Conference on Computational Genomics. p. 31.

Category   Genomic Databases

Go to the abstract in the NAR 2002 Database Issue.

 
