Nucleic Acids Res. OUP

Human Olfactory Receptor Data Exploratorium (HORDE)

http://bioinfo.weizmann.ac.il/HORDE/

Olender, T., Glusman, G., Feldmesser, E., Khen, M., Atarot, T., Safran, M., Lancet, D.

The Weizmann Institute of Science, Rehovot, Israel

Contact   marilyn.safran@weizmann.ac.il


Database Description

Olfactory receptors (ORs) constitute the largest multi-gene family in multi-cellular organisms. Their evolutionary proliferation has been driven by the need to provide recognition capacity for millions of potential odorants with arbitrary chemical configurations. The Human Olfactory Receptor Data Exploratorium (HORDE, http://bioinfo.weizmann.ac.il/HORDE) is a database of human OR genes. It serves as a tool for studying the phylogenetics, evolution and functionality of the OR gene and protein super-family. HORDE extracts its data directly from human genome sequence resources, using a semi-automatic data-mining procedure. A collection of 900 genes and pseudogenes was published previously. Another 116 genes, reported here for the first time, resulted from re-running the data-mining procedure and curating the results. These novel genes span all 17 families of the OR repertoire, and 16 of them opened new subfamilies. Finally, to ensure completeness, we queried Celera's human genome assembly. Using 17 consensus sequences (one from each OR family) as input, we were able to detect 35 additional genes. ORs tend to be arranged in clusters, a phenomenon accounted for by an elaborate process of gene and cluster duplication, as well as gene conversion events. To provide an overview of these evolutionary processes, HORDE supplies information on the genomic localization and cluster organization of the human OR repertoire. Genomic localization was performed on the basis of the August 2001 freeze of the UCSC genome assembly, using the BLAT server at UCSC. In parallel, we determined accurate coordinates for each OR gene on the NCBI assembly (NT contigs) and in UDB. The entire repertoire was analyzed to define OR clusters, using the criterion that two consecutive ORs that are more than 0.8 Mb apart belong to different clusters. The C@M nomenclature of clusters is used, where C is the chromosome and M is the megabase coordinate on it.
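The clustering criterion and the C@M naming described above can be illustrated with a minimal Python sketch. It is not HORDE's actual procedure: the function name `cluster_ors`, the use of a single start coordinate per gene, the choice of the first gene's position (rounded to 0.1 Mb) as the M value, and the gene labels and positions in the usage example are all assumptions made for illustration.

```python
from collections import defaultdict

# 0.8 Mb threshold stated in the text, expressed in base pairs.
MAX_GAP_BP = 800_000


def cluster_ors(genes, max_gap_bp=MAX_GAP_BP):
    """Group genes into clusters and name each cluster C@M.

    genes: iterable of (symbol, chromosome, position_bp) tuples.
    Assumption: one coordinate per gene; two consecutive genes on the
    same chromosome that are more than max_gap_bp apart start a new cluster.
    """
    by_chrom = defaultdict(list)
    for symbol, chrom, pos in genes:
        by_chrom[chrom].append((pos, symbol))

    clusters = {}
    for chrom, members in by_chrom.items():
        members.sort()  # order genes along the chromosome
        groups = [[members[0]]]
        for prev, item in zip(members, members[1:]):
            if item[0] - prev[0] > max_gap_bp:
                groups.append([item])    # gap too large: open a new cluster
            else:
                groups[-1].append(item)  # same cluster as the previous gene
        for group in groups:
            # Assumption: M is the first gene's position in Mb, one decimal.
            name = f"{chrom}@{group[0][0] / 1e6:.1f}"
            clusters[name] = [symbol for _, symbol in group]
    return clusters


if __name__ == "__main__":
    # Hypothetical gene labels and coordinates, for illustration only.
    example = [
        ("gene_a", "11", 4_400_000),
        ("gene_b", "11", 4_900_000),
        ("gene_c", "11", 6_000_000),
        ("gene_d", "1", 245_000_000),
    ]
    for name, symbols in cluster_ors(example).items():
        print(name, symbols)
```

Because clusters on the same chromosome are separated by more than 0.8 Mb, rounding the start position to 0.1 Mb still gives each cluster a distinct name in this sketch.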
In the spirit of GeneCards, HORDE summarizes comprehensive information about each gene in a single OR card, which includes the HUGO symbol, gene family and subfamily, aliases, cytogenetic band, closest mouse OR gene, genomic sources, nucleic acid and protein sequences, and localization and cluster membership, with links to GeneCards, UDB, UCSC, ORDB and NCBI. An example of a HORDE OR card is shown in the supplementary information. HORDE offers several tools for data retrieval: (I) textual, e.g. to access a specific OR card using a HUGO symbol or alias; (II) a BLAST server, to search HORDE with a sequence, resulting in a report linked to HORDE cards; (III) group-oriented, most suitable for studying the evolution and phylogenetics of this huge super-family. ORs that belong to the same family and/or subfamily, or are located on the same chromosome, can be queried. The results can be further analyzed on-line using CLUSTALW [23]. Finally, the site is equipped with other useful tools, such as conceptual translation and recognition of transmembrane domains and CDR residues.
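A group-oriented query by family and subfamily can be sketched in Python as follows. This is an illustration only, not HORDE's interface: it assumes that OR symbols follow the pattern "OR", family number, subfamily letter(s), member number, and an optional trailing "P" for pseudogenes, and the function names `parse_or_symbol` and `group_by_subfamily` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed symbol layout: OR + family digits + subfamily letters
# + member digits + optional "P" marking a pseudogene.
SYMBOL_RE = re.compile(r"OR(\d+)([A-Z]+)(\d+)(P?)")


def parse_or_symbol(symbol):
    """Return (family, subfamily, member, is_pseudogene) for an OR symbol."""
    match = SYMBOL_RE.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognized OR symbol: {symbol!r}")
    family, subfamily, member, pseudo = match.groups()
    return int(family), subfamily, int(member), pseudo == "P"


def group_by_subfamily(symbols):
    """Map (family, subfamily) to the list of symbols that belong to it."""
    groups = defaultdict(list)
    for symbol in symbols:
        family, subfamily, _, _ = parse_or_symbol(symbol)
        groups[(family, subfamily)].append(symbol)
    return dict(groups)


if __name__ == "__main__":
    # Symbols used only to illustrate the assumed format.
    for key, members in group_by_subfamily(["OR51E1", "OR51E2", "OR7D4"]).items():
        print(key, members)
```

Each resulting group could then be passed to a multiple-alignment tool, in the way the text describes for CLUSTALW.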

Category   Protein Databases

Go to the abstract in the NAR 2003 Database Issue.

 
