Nucl. Acids Res. OUP

ALFRED

http://alfred.med.yale.edu

Rajeevan, H.1, Osier, M.V.1, Cheung, K.-H.2, Deng, H.1, Druskin, L.2, Heinzen, R.1, Kidd, J. R.1, Stein, S.1, Pakstis, A. J.1, Tosches, N. P.2, Yen, C. -C.1, Miller, P. L.2, Kidd, K. K.1

1Department of Genetics, Yale University School of Medicine, 333 Cedar Street, PO Box 208005, New Haven, CT 06520-8005, USA
2Center for Medical Informatics, Yale University School of Medicine, 333 Cedar Street, PO Box 208009, New Haven, CT 06520-8009, USA

Contact   kidd@biomed.med.yale.edu


Database Description

ALFRED (the ALlele FREquency Database) is designed to store and disseminate frequencies of alleles at human autosomal polymorphic sites for multiple defined population samples, primarily for the population genetics and molecular anthropology communities. The focus is on allele frequencies of normal, common DNA variants, i.e., polymorphisms, in samples of anthropologically defined populations. Links are provided to molecular databases for precise definitions and locations of the polymorphisms and to anthropologic databases for linguistic, ethnographic, and demographic information on the populations sampled. References to publications are associated with the frequencies and linked to PubMed whenever possible. Many polymorphisms have links to low-tech protocols suitable for small laboratories engaged in anthropologic research. ALFRED has information on 672 polymorphic sites typed on at least one population sample and 288 populations typed for at least one polymorphism. ALFRED is accessible from http://alfred.med.yale.edu.

Recent Developments

Effort over the past year has been concentrated in three areas: (1) development of curatorial tools, (2) implementation of a more robust and sustainable Oracle database, and (3) increasing the quantity and quality of data.

Curatorial Tools

Unseen by the user, these software tools provide integrity checks and allow the curators to annotate entries more efficiently and to add web links to appropriate entries in other databases. When entering data into ALFRED, our curators use a controlled vocabulary, including official locus names and symbols, to maintain data quality. We have implemented automatic checks to enforce this. Integrity checks of the data already in ALFRED are run periodically to ensure data accuracy and consistency.

Conversion to Oracle

We have converted the entire system from its current Access database implementation to Oracle to allow for the considerable expansion of the data in the coming year. The Oracle version is currently being tested and the queries optimized. It should be the active system by early 2003.

Data Expansion and Development

With new unpublished allele frequency data from the Kidd Lab and frequency data from published articles, the number of frequency tables (one sample typed for one site) increased from 3561 (September 2001) to 6301 (September 2002). The staff are systematically extracting gene frequency data on DNA sequence variants from recent issues of major human genetics and physical anthropology journals. To ensure the high quality of our data, we perform data curation in an iterative and systematic way before importing data into ALFRED. Descriptions of a large number of loci, polymorphisms, alleles, populations and samples from the literature are nearing completion and will be loaded with the associated allele frequency tables. Additional web links to literature and public databases such as GenBank, PubMed, GDB, OMIM, LocusLink, and Ethnologue have been added for the existing entries.
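
The kind of automatic check described under Curatorial Tools can be illustrated with a minimal sketch. It is not ALFRED's actual curation code: the `FrequencyTable` record, the `check_table` function, the one-entry vocabulary, and the tolerance are all assumptions made for illustration. The sketch tests a frequency table (one sample typed for one site) against a controlled vocabulary of locus symbols and confirms that the allele frequencies are valid proportions summing to one.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; a real one would hold all official locus symbols.
OFFICIAL_LOCUS_SYMBOLS = {"CD4"}


@dataclass
class FrequencyTable:
    """One population sample typed for one polymorphic site (illustrative layout)."""
    locus_symbol: str
    site: str
    population: str
    allele_frequencies: dict  # allele name -> frequency, expected in [0, 1]


def check_table(table, tolerance=1e-3):
    """Return a list of problems found; an empty list means the table passed."""
    problems = []
    # Controlled-vocabulary check on the locus symbol.
    if table.locus_symbol not in OFFICIAL_LOCUS_SYMBOLS:
        problems.append(f"unknown locus symbol: {table.locus_symbol}")
    # Each frequency must be a valid proportion.
    for allele, freq in table.allele_frequencies.items():
        if not 0.0 <= freq <= 1.0:
            problems.append(f"frequency out of range for allele {allele}: {freq}")
    # Frequencies at one site in one sample should sum to 1 within rounding error.
    total = sum(table.allele_frequencies.values())
    if abs(total - 1.0) > tolerance:
        problems.append(f"frequencies sum to {total:.3f}, expected 1.0")
    return problems
```

Returning a list of problems rather than raising on the first one lets a curator see every issue with an entry in a single pass, which suits the iterative curation described above.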

Data Accessibility

A Document Type Definition (DTD) has been developed for importing and exporting ALFRED data in XML format. All information in ALFRED can now be put into a single compressed “data dump” file in the declared XML format. The data dump can include either all relevant information (including descriptions) or only the data relevant to statistical analyses. These files are available on request via http://alfred.med.yale.edu/alfred/feedback.asp.

Web Viewing Enhancement

New graphical overviews of the database contents have been implemented to direct users to the more extensive “comparative” aspects of the database. A “sites per population” web page (http://alfred.med.yale.edu/alfred/sitesperpop_graph.asp) shows graphically (and numerically) the number of allele frequency tables for each population. Currently the maximum is 385 for the Han. A “populations per site” web page (http://alfred.med.yale.edu/alfred/popspersite_graph.asp) similarly represents the number of allele frequency tables for each polymorphic site. Currently the maximum is 49 for the CD4 pentanucleotide repeat polymorphism.
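
The "sites per population" and "populations per site" counts could in principle be derived from an XML data dump. The sketch below shows one way to do that with Python's standard library. The element name `frequencyTable`, the attributes `population` and `site`, and the sample fragment are hypothetical, because the actual ALFRED DTD is not reproduced here and may be structured quite differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical dump fragment; element and attribute names are assumptions,
# not taken from the real ALFRED DTD.
SAMPLE_DUMP = """
<alfred>
  <frequencyTable population="PopA" site="Site1"/>
  <frequencyTable population="PopA" site="Site2"/>
  <frequencyTable population="PopB" site="Site1"/>
</alfred>
"""


def count_tables(xml_text):
    """Count frequency tables per population and per polymorphic site."""
    root = ET.fromstring(xml_text.strip())
    tables_per_population = Counter()
    tables_per_site = Counter()
    # Each frequencyTable element is one sample typed for one site.
    for table in root.iter("frequencyTable"):
        tables_per_population[table.get("population")] += 1
        tables_per_site[table.get("site")] += 1
    return tables_per_population, tables_per_site


per_population, per_site = count_tables(SAMPLE_DUMP)
# The entry with the most tables corresponds to the "maximum" shown on each overview page.
top_population = per_population.most_common(1)[0]
top_site = per_site.most_common(1)[0]
```

A compressed dump would first need to be decompressed (for example with the standard `gzip` module, if that is the compression used) before being passed to `count_tables`.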

Acknowledgements

Initial funding for ALFRED was provided by NSF grant SBR-9632509 and USPHS grants P01GM57672, R01AA09379, and T15LM07056. Ongoing funding of ALFRED is provided by NSF grant BCS0096588.

Category   Mutation Databases

Go to the abstract in the NAR 2003 Database Issue.

 
