WHO

3rd Regional Training Courses in Bioinformatics Applied to Tropical Diseases in South East Asia
July 26-Aug 6, 2004,
Faculty of Science, Mahidol University, Bangkok, Thailand

ICGEB

Glossary



C

Conserved synteny: two or more genes located on the same chromosome in different species regardless of gene order.

Conserved linkage: a group of genes conserved in synteny and order between species.

Contig: contiguous DNA sequence produced from joining overlapping raw sequence reads.
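As a minimal sketch of what "joining overlapping raw sequence reads" means, the Python below merges two reads when the end of one matches the start of the other. The function names, the example reads and the minimum overlap of 3 bases are illustrative assumptions; real assemblers also handle sequencing errors, reverse-complemented reads and base quality values.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def join_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one contiguous sequence, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None  # no usable overlap: the reads cannot be joined
    return left + right[size:]


# Hypothetical reads sharing the 4-base overlap "GTAC".
contig = join_reads("ATGCGTAC", "GTACCTTA")
print(contig)
```

A read for which `join_reads` returns None against every other read would remain unassembled.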

E

EST: expressed sequence tag generated by sequencing one end of a recombinant clone from a cDNA library. ESTs are single-pass reads and therefore prone to contain sequence errors.

F

Finished sequence: complete sequence of a genome with no gaps and an accuracy of > 99.9%.

Full shotgun coverage: genome coverage in random raw sequence required to produce finished sequence, usually 8-10 fold (‘8-10X’).

G

Genome coverage: average number of times a nucleotide is represented by a high-quality base in random raw sequence.
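Under the simplifying assumption that every base in every read is high quality, genome coverage can be estimated as (number of reads x mean read length) / genome size. The sketch below uses that approximation; the function name and the numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def genome_coverage(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Estimate fold coverage as total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * mean_read_length / genome_size


# Hypothetical project: 400,000 reads of 500 bases from a 25 Mb genome.
coverage = genome_coverage(400_000, 500, 25_000_000)
print(f"{coverage:.1f}X")
```

With these assumed numbers the estimate is 8X, which falls at the lower end of the 8-10X range given above for full shotgun coverage.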

GSS: genome survey sequence generated by sequencing one end of a recombinant clone from a genomic DNA library. The genomic DNA library can in some instances be enriched for the presence of coding regions, for example through use of mung bean nuclease digestion of genomic DNA prior to cloning.

H

Homologs: genes related to each other by descent from a common ancestral DNA sequence.

O

ORF: open reading frame, a stretch of codons in the same reading frame uninterrupted by STOP codons and calculated from a six-frame translation of DNA sequence.
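The definition can be illustrated with a short Python sketch that scans all six reading frames (three on each strand) and collects the stretches of codons that contain no STOP codon. The function names and the `min_codons` threshold are assumptions made for this example; the sketch uses the standard STOP codons TAA, TAG and TGA, does not require a start codon, and reports the codons as DNA rather than translating them to protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def open_reading_frames(seq: str, min_codons: int = 2):
    """Return (strand, frame, dna) for each stop-free codon stretch in all six frames."""
    seq = seq.upper()
    results = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run = []  # codons collected since the last STOP codon
            for pos in range(frame, len(template) - 2, 3):
                codon = template[pos:pos + 3]
                if codon in STOP_CODONS:
                    if len(run) >= min_codons:
                        results.append((strand, frame, "".join(run)))
                    run = []
                else:
                    run.append(codon)
            if len(run) >= min_codons:  # stretch running to the end of the sequence
                results.append((strand, frame, "".join(run)))
    return results


# Hypothetical sequence: ATG AAA TAG CCC in forward frame 0.
for orf in open_reading_frames("ATGAAATAGCCC"):
    print(orf)
```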

Orthologs: homologous genes generated by speciation, i.e. related to each other by vertical descent.

P

Paired reads: sequence reads determined from both ends of a cloned insert in a recombinant clone.

Paralogs: homologous genes generated by gene duplication within a genome, i.e. related to each other through a duplication event rather than a speciation event.

Partial shotgun coverage: typically 3-6X random coverage of a genome, which produces sequence data of sufficient quality to enable gene identification but is not sufficient to produce a finished genome sequence.

R

Raw sequence: unassembled sequence reads produced from sequencing of inserts from individual recombinant clones of a genomic DNA library.

S

Scaffold: a group of ordered and orientated contigs known to be physically linked to each other by paired read information.

Singleton: single sequence read that cannot be joined (‘assembled’) into a contig.

SNP: single nucleotide polymorphism, a single base position in the genome at which individuals of the same species differ.
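As a rough illustration, candidate SNP positions can be listed by comparing two sequences that have already been aligned to each other. The function name and the example sequences are hypothetical, and the sketch assumes equal-length aligned strings with "-" marking gaps. A mismatch found this way is only a candidate, since it may also reflect a sequencing error.

```python
def candidate_snp_positions(seq_a: str, seq_b: str):
    """Return (position, base_a, base_b) for each single-base mismatch between aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b and a != "-" and b != "-"  # skip alignment gaps
    ]


# Hypothetical aligned sequences differing at position 2 (0-based).
print(candidate_snp_positions("ACGTAC", "ACCTAC"))
```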

Synteny: strictly, this refers to the presence of two or more genes on the same chromosome in the same species. However, it is frequently used to mean conservation of orthologous gene location between species, i.e. the presence of orthologous genes that are syntenic in one species and also located on the same chromosome in a second species, without regard to gene order.
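The between-species usage can be sketched in Python: given the chromosome of each gene in two species and a table of orthologs, report the gene pairs that share a chromosome in the first species and whose orthologs also share a chromosome in the second, ignoring gene order. All gene names, chromosome names and the function name below are hypothetical.

```python
from itertools import combinations


def conserved_synteny_pairs(chrom_a: dict, chrom_b: dict, orthologs: dict):
    """Return gene pairs from species A whose synteny is conserved in species B.

    chrom_a / chrom_b map gene name -> chromosome; orthologs maps A gene -> B gene.
    """
    pairs = []
    for gene1, gene2 in combinations(sorted(orthologs), 2):
        same_in_a = chrom_a[gene1] == chrom_a[gene2]
        same_in_b = chrom_b[orthologs[gene1]] == chrom_b[orthologs[gene2]]
        if same_in_a and same_in_b:  # gene order is deliberately not checked
            pairs.append((gene1, gene2))
    return pairs


# Hypothetical data: a1 and a2 are syntenic in species A, a3 lies elsewhere.
chrom_a = {"a1": "chr1", "a2": "chr1", "a3": "chr2"}
chrom_b = {"b1": "chrX", "b2": "chrX", "b3": "chrX"}
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3"}
print(conserved_synteny_pairs(chrom_a, chrom_b, orthologs))
```

Testing for conserved linkage would additionally require comparing the order of the genes along each chromosome, which this sketch does not attempt.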


Last updated: Thursday, August 5, 2004 10:47 (Thailand time)