Nucl. Acids Res. OUP

AllGenes

http://www.allgenes.org

Babenko, V., Brunk, B., Crabtree, J., Diskin, S., Fischer, S., Gan, Y., Li, L., Mazzarelli, J., McWeeney, S., Pinney, D., Pizarro, A., Schug, J., Stoeckert, C.

Department of Genetics and the Center for Bioinformatics, University of Pennsylvania.

Contact   sfischer@pcbi.upenn.edu


Database Description

AllGenes is a human and mouse gene index generated by assembling publicly available EST and mRNA sequences into putative transcripts, which are integrated with public genomic sequence (NCBI Human Build 30 and MGSC Mouse Version 3). The putative transcript sequences are subjected to a suite of automated annotation steps, including: collation into conceptual genes; BLAT alignment onto the genome; protein prediction; protein function prediction (GO assignment); protein similarity and association of description; protein motif assignments; RH marker assignments; gene trap tag sequence assignments; anatomy profile; expression profile; and a mapping to GeneCards, MGI and IMAGE. The transcripts are also curated, which provides gene name and synonym assignments and confirms the automated annotation.

As of September 5, 2002, the gene index contains 229,805 human and 128,615 mouse non-singleton assemblies that cluster to 131,304 human and 90,829 mouse putative genes. 73% of the human and 73% of the mouse assemblies have been confirmed by high-quality matches to the genome. 57% of the mouse assemblies have similarity to a known protein sequence and 21% have been assigned a GO (Gene Ontology Consortium) function.

AllGenes is built on the GUS genomics database platform developed by our group. The GUS relational schema is an extensive genomics warehouse organized around the central dogma of biology (genes are transcribed to RNA, which is translated to protein). It enables powerful queries not available in many other genomics databases. The AllGenes web interface also uses GUS's boolean query and query history facilities, which allow users to compose sophisticated queries from more basic ones. A sample query finds all mouse RNAs located on chromosome 7 that are expressed in the brain and whose products are predicted to be transcription factors.
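The idea of composing a sophisticated query from basic queries held in a query history can be illustrated with a small sketch. This is not the GUS schema or the AllGenes interface: the `Rna` record, its fields, the toy data, and the `basic_query` helper are all hypothetical, chosen only to show how the sample query reduces to a boolean AND (set intersection) of three simpler result sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rna:
    """Hypothetical, simplified stand-in for an annotated transcript."""
    rna_id: str
    organism: str
    chromosome: str
    tissues: frozenset    # tissues in which expression was observed
    functions: frozenset  # predicted functions of the protein product


# Made-up toy records; real AllGenes data is not reproduced here.
RNAS = [
    Rna("R1", "mouse", "7", frozenset({"brain"}), frozenset({"transcription factor"})),
    Rna("R2", "mouse", "7", frozenset({"liver"}), frozenset({"transcription factor"})),
    Rna("R3", "mouse", "2", frozenset({"brain"}), frozenset({"kinase"})),
    Rna("R4", "human", "7", frozenset({"brain"}), frozenset({"transcription factor"})),
]


def basic_query(predicate):
    """Run one basic query and return the set of matching RNA ids."""
    return {r.rna_id for r in RNAS if predicate(r)}


# Query history: each basic result is stored under a label for later reuse.
history = {
    "q1": basic_query(lambda r: r.organism == "mouse" and r.chromosome == "7"),
    "q2": basic_query(lambda r: "brain" in r.tissues),
    "q3": basic_query(lambda r: "transcription factor" in r.functions),
}

# Boolean composition: AND the three stored results together.
result = history["q1"] & history["q2"] & history["q3"]
print(sorted(result))
```

Storing each basic result as a set of identifiers means that AND, OR and NOT map directly onto set intersection, union and difference, so a stored result can be reused in any later combination.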

Recent Developments

On-demand display of a putative transcript in the UCSC Genome Browser; display of pre-computed (consistent) BLAT alignments of transcripts against the genome; query by GeneCards gene name and MGI geneID/gene name; BLAST results included in query history so that they can be composed into advanced queries; querying for gene trap sequences generated by the German Gene Trap Consortium (GGTC) and the Skarnes and Tessier-Lavigne labs.

Acknowledgements

This work was supported by grants from the National Institutes of Health (R01HG01539) and the Department of Energy (DE-FG02-DOE00ER62893).

Category   Gene Identification and Structure

 
