Genetic Distance

Install Mega and ClustalX (accept the defaults).

1. Copy the sequences from this link into a file on your computer.

2. These sequences are already aligned. They are a subset of a larger alignment of HIV-1 coding sequence fragments that was downloaded from the Los Alamos database. Some of the gaps in this alignment were introduced by sequences that are not part of this subset (can you see that this is the case just by looking at the alignment?), so you should redo the alignment. Use ClustalX to do this, and select the option [reset all gaps before alignment].
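
To see what "gaps introduced by sequences outside the subset" look like, and what resetting gaps does, here is a minimal Python sketch. It assumes the sequences are stored in FASTA format (the handout does not state the format), and the three short sequences and all function names are made up for illustration. It is not what ClustalX does internally. A column that contains only gaps in every sequence of the subset can only have come from a sequence that is no longer present.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def all_gap_columns(records):
    """Return the 0-based indices of alignment columns that contain only gaps."""
    seqs = list(records.values())
    length = len(seqs[0])
    return [i for i in range(length) if all(s[i] == "-" for s in seqs)]


def reset_gaps(records):
    """Remove every gap character, giving unaligned sequences ready to realign."""
    return {name: seq.replace("-", "") for name, seq in records.items()}


# Hypothetical toy alignment (not the HIV-1 data from the practical).
example = """>seqA
ATG--CCA-T
>seqB
ATGG-CCA-T
>seqC
ATG--CAA-T
"""

records = read_fasta(example)
gap_only = all_gap_columns(records)
ungapped = reset_gaps(records)

print("Gap-only columns:", gap_only)
for name, seq in ungapped.items():
    print(name, seq)
```

In this toy example two columns are gap-only, while the gap in the fourth column is still supported by seqB and would be kept by a realignment.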

3. Start Mega. Mega requires a specific sequence format. Under the File menu of Mega you can find an option that converts files from other formats to Mega format. Use this to convert the Clustal-format alignment you have just created to Mega format.

4. Use Mega to estimate pairwise distances, and the standard errors of these distances, for the sequences in your alignment. Try each nucleotide model separately, save the output, and compare the distances between models.
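
If you want to check by hand what the simpler models compute, the following Python sketch may help. It implements the p-distance, the Jukes-Cantor distance and the Kimura 2-parameter distance for one pair of sequences, together with the usual large-sample standard errors for the first two. The two sequences are invented, the function names are hypothetical, and sites with gaps or ambiguous bases are simply skipped. Mega's own options (for example bootstrap standard errors or different gap handling) may give slightly different numbers.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions), skipping gaps and ambiguities."""
    n = ts = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return n, ts, tv


def p_distance(n, ts, tv):
    """Proportion of sites that differ."""
    return (ts + tv) / n


def jc_distance(p):
    """Jukes-Cantor distance from the proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(n, ts, tv):
    """Kimura 2-parameter distance from transition and transversion proportions."""
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


def se_p(p, n):
    """Large-sample standard error of the p-distance."""
    return math.sqrt(p * (1.0 - p) / n)


def se_jc(p, n):
    """Large-sample standard error of the Jukes-Cantor distance."""
    return math.sqrt(p * (1.0 - p) / n) / (1.0 - 4.0 * p / 3.0)


# Invented example pair: 20 sites, 2 transitions, 1 transversion.
seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "GCGTACGTATGTACGTACGA"

n, ts, tv = count_differences(seq1, seq2)
p = p_distance(n, ts, tv)
jc = jc_distance(p)
k2p = k2p_distance(n, ts, tv)

print(f"p = {p:.4f} (SE {se_p(p, n):.4f})")
print(f"JC = {jc:.4f} (SE {se_jc(p, n):.4f})")
print(f"K2P = {k2p:.4f}")
```

Both corrected distances are larger than the p-distance, because the models allow for multiple substitutions at the same site.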

5. Estimate the distances using the models that include a gamma correction for rate variation among sites. Note: Mega does not provide a method to estimate the parameter a, which determines the shape of the gamma distribution. To make sensible use of these corrected distances you would need to estimate this parameter using another programme (such as PAUP). Nonetheless, experiment with different values of a to see what effect they have on your distance estimates: start with a very low value (around 0.1) and then try higher values (say 10 or 20).
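
As a rough illustration of how the shape parameter changes a corrected distance, the sketch below applies the gamma-corrected Jukes-Cantor formula to a single, arbitrarily chosen proportion of differing sites (p = 0.15). The function names and the value of p are assumptions made for this example, and only the Jukes-Cantor model is shown. The other gamma models in Mega follow the same idea but use different formulas.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance with equal rates at all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, a):
    """Jukes-Cantor distance with gamma-distributed rates, shape parameter a."""
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


p = 0.15  # arbitrary example proportion of differing sites

print(f"no gamma correction: d = {jc_distance(p):.4f}")
for a in (0.1, 1.0, 10.0, 20.0):
    print(f"a = {a:>4}: d = {jc_gamma_distance(p, a):.4f}")
```

Small values of a correspond to strong rate variation among sites and give much larger corrected distances. As a increases, the corrected distance shrinks.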

6. Calculate the distances using a gamma correction with a shape parameter a = 1000 (this is an extreme example). How do these distances compare to the distances you obtained with no gamma correction?
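
Once you have looked at your own results, the following short derivation may help explain them. It is written for the Jukes-Cantor model only, as an illustrative case, where p is the proportion of sites that differ between two sequences and a is the gamma shape parameter. The other models behave analogously, but their formulas are not shown here.

```latex
% Gamma-corrected Jukes-Cantor distance
d_{\Gamma}(a) = \frac{3}{4}\, a \left[ \left( 1 - \frac{4p}{3} \right)^{-1/a} - 1 \right]

% For large a, write x = 1 - 4p/3 and expand x^{-1/a} = e^{-(\ln x)/a}:
x^{-1/a} = 1 - \frac{\ln x}{a} + O\!\left( \frac{1}{a^{2}} \right)

% Substituting back, the factor a cancels and the limit is the uncorrected distance:
\lim_{a \to \infty} d_{\Gamma}(a) = -\frac{3}{4} \ln\!\left( 1 - \frac{4p}{3} \right) = d_{\mathrm{JC}}
```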

7. Explore the remaining options in the distance calculations using Mega. Mega allows you to calculate pairwise distances based on synonymous and non-synonymous sites separately and to work out distances based on different coding positions.
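
To illustrate what a distance based on a single coding position means, here is a small Python sketch that computes the p-distance separately for first, second and third codon positions. It assumes the alignment is in frame (the first column is a first codon position) and uses two invented five-codon sequences. The function names are hypothetical. Synonymous and non-synonymous distances need a codon-based method and are not shown.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def codon_position_distances(seq1, seq2):
    """Return {1: d1, 2: d2, 3: d3}, the p-distance at each codon position."""
    return {
        pos + 1: p_distance(seq1[pos::3], seq2[pos::3])
        for pos in range(3)
    }


# Invented in-frame example: ATG GCT AAA GGT CTG vs ATG GCC AAG GGT TTG
seq1 = "ATGGCTAAAGGTCTG"
seq2 = "ATGGCCAAGGGTTTG"

by_position = codon_position_distances(seq1, seq2)
for position, d in by_position.items():
    print(f"codon position {position}: p = {d:.2f}")
```

In this toy pair most of the differences fall at third positions, which is the pattern you would typically expect in coding sequence.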

8. You can define groups of sequences within Mega and work out intra- and inter-group mean distances. If you have time, define two groups and work out these distances. You can also work out intra- and inter-population sequence diversities. Look in the appropriate help file to see how these are defined.
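
The group means can be sketched as follows, assuming that the within-group mean is the average of all pairwise distances inside a group and the between-group mean is the average over all pairs with one sequence from each group. The p-distance is used for simplicity, and the sequences, group names and function names are invented for this example. Check the Mega help file for the exact definitions it uses, particularly for the population diversity measures, which are not shown here.

```python
from itertools import combinations, product


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_within(seqs, group):
    """Mean pairwise distance among the sequences of one group."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)


def mean_between(seqs, group1, group2):
    """Mean distance over all pairs with one sequence from each group."""
    pairs = list(product(group1, group2))
    return sum(p_distance(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)


# Invented aligned sequences and hypothetical group assignments.
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TCGAACGTAC",
    "s4": "TCGAACGTAT",
}
group_a = ["s1", "s2"]
group_b = ["s3", "s4"]

within_a = mean_within(seqs, group_a)
within_b = mean_within(seqs, group_b)
between = mean_between(seqs, group_a, group_b)

print(f"within A: {within_a:.3f}")
print(f"within B: {within_b:.3f}")
print(f"between A and B: {between:.3f}")
```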