Multiple Sequence Alignment
 
 


Multiple sequence alignments and alignment parameters

1. Copy the Plasmodium falciparum erythrocyte membrane protein (PfEMP) sequence fragments (which are in FASTA format) from this link and paste them into a new file on your computer.

2. Copy the sequence fragments of kinetoplastid membrane proteins from Leishmania (also in FASTA format) from this link into a second new file.

3. Using a text editor (or, if you prefer, the sequence alignment editor BioEdit), align each set of sequences by hand. Which alignment was harder, and why?

4. Align both sets of sequences using ClustalX with its default parameters and compare the results with the alignments you produced by hand. Which alignments do you think are better, yours or Clustal's?

5. Test the effect of adjusting the parameters by repeating the ClustalX alignments of both sets of fragments with different settings in the multiple alignment parameters: first a large gap opening penalty (e.g. 90) with a small (or even zero) gap extension penalty, then a large gap extension penalty with a small gap opening penalty, and finally small values for both (gap opening penalty < 1, gap extension penalty < 0.1). Notice any changes in the blocks at the bottom of the screen that show the quality of the alignment.
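
Why the two penalties pull the alignment in different directions can be seen with a toy calculation. The Python sketch below assumes a generic affine gap model in which each gap costs open + extend * (length - 1); ClustalX's real scoring is more elaborate (it also adjusts penalties position by position), so this only illustrates the general idea. The function names, the example rows and the value 10 for a "large" extension penalty are hypothetical choices for illustration.

```python
def gap_runs(row: str) -> list[int]:
    """Return the length of each run of consecutive '-' characters in an aligned row."""
    runs: list[int] = []
    current = 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def affine_gap_cost(row: str, gap_open: float, gap_extend: float) -> float:
    """Total gap penalty for one aligned row under a simple (assumed) affine model:
    every gap costs gap_open + gap_extend * (length - 1)."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(row))


if __name__ == "__main__":
    # Two hypothetical placements of the same four gap characters in one aligned row.
    one_long_gap = "MKV----LLAGT"
    four_short_gaps = "M-K-V-L-LAGT"
    settings = {
        "large opening, zero extension": (90.0, 0.0),
        "small opening, large extension": (1.0, 10.0),
        "both small": (0.5, 0.05),
    }
    for label, (gop, gep) in settings.items():
        print(f"{label:32s} one long gap: {affine_gap_cost(one_long_gap, gop, gep):7.2f}"
              f"   four short gaps: {affine_gap_cost(four_short_gaps, gop, gep):7.2f}")
```

With these numbers, a large opening penalty makes one long gap (90) far cheaper than four short gaps (360), a large extension penalty reverses the preference (31 versus 4), and small values for both make any gap pattern cheap, which tends to produce gappier alignments.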
 



Profile alignments 1

Imagine that you are an HIV researcher. You have spent the last several months sequencing a portion of the gag gene from 158 samples collected from patients in your country. You have aligned these sequences using ClustalX and carefully edited the alignment with BioEdit. With this many sequences the alignment took considerable time to produce, and you are very proud of your efforts. A new sample then turns up, and after sequencing it you would like to add it to your alignment, but you really don't want to start from scratch. You can use the profile alignment method in ClustalX to add the new sequence to the alignment without disturbing the rest of the alignment, as follows:


1. Copy the HIV gag sequence alignment from here and save it to a file on your computer. This is a sequence alignment in FASTA format.

2. Copy the single HIV gag sequence from here and save it on your computer. This is the sequence that you want to add to the alignment above without disturbing that alignment.

3. Open ClustalX and switch to Profile Alignment Mode.

4. Load the HIV alignment as profile 1 (under the File menu).

5. Load the extra sequence as profile 2.

6. To add the extra sequence to the alignment, select the 'Align Sequences to Profile 1' option under the Alignment menu. Take note of the name of the file in which the results will be stored.

Note: ClustalX does not provide a good way of viewing alignments produced in profile mode, so open the resulting alignment in BioEdit.
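
The idea behind steps 4 to 6 can be illustrated with a toy Python sketch. It is not ClustalX's algorithm (ClustalX uses substitution matrices and more elaborate gap penalties); it assumes a simple match/mismatch score, a single linear gap penalty and hypothetical names (`align_to_profile`, `column_score`), and works on plain strings rather than files. What it does show is the key property of profile alignment: the existing alignment is treated as a fixed series of columns, the new sequence is aligned to those columns by dynamic programming, and the old rows can only gain columns made entirely of gaps, so their alignment relative to each other is never changed.

```python
# Toy scoring scheme (hypothetical values, not ClustalX defaults).
MATCH = 1.0
MISMATCH = -1.0
GAP = -2.0  # linear penalty per gap character


def column_score(column: str, residue: str) -> float:
    """Average score of one residue against every row of a profile column.
    Gap characters already present in the column contribute 0."""
    total = sum(MATCH if c == residue else MISMATCH for c in column if c != "-")
    return total / len(column)


def align_to_profile(rows: list[str], seq: str) -> list[str]:
    """Globally align seq to a fixed alignment (rows of equal length).
    The existing rows can only gain all-gap columns, so their alignment is preserved."""
    columns = ["".join(chars) for chars in zip(*rows)]
    n, m = len(columns), len(seq)
    # score[i][j] = best score for the first i profile columns against the first j residues
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + GAP
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]),
                score[i - 1][j] + GAP,  # gap placed in the new sequence
                score[i][j - 1] + GAP,  # new all-gap column added to the profile
            )
    # Trace back from the bottom-right cell, collecting output columns in reverse order.
    out_columns: list[str] = []
    new_row: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1])
        )
        if diagonal:
            out_columns.append(columns[i - 1])
            new_row.append(seq[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_columns.append(columns[i - 1])
            new_row.append("-")
            i -= 1
        else:
            out_columns.append("-" * len(rows))
            new_row.append(seq[j - 1])
            j -= 1
    out_columns.reverse()
    new_row.reverse()
    old_rows = ["".join(col[k] for col in out_columns) for k in range(len(rows))]
    return old_rows + ["".join(new_row)]


if __name__ == "__main__":
    # Hypothetical toy alignment of three short DNA fragments plus one new sequence.
    profile = ["ATGGGTGCG", "ATGGGCGCG", "ATG--TGCG"]
    for row in align_to_profile(profile, "ATGGGTTGCG"):
        print(row)
```

Deleting the columns that are gaps in every original row of the output gives back the original alignment unchanged, which is exactly what "without disturbing the rest of the alignment" means.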



Profile alignments 2

Over coffee with a colleague you discover that she has sequenced the complete HIV gag gene from a small number of samples. You would like to make an alignment of the entire set of sequences in order to see where on her sequences your sequences match. You realise that, strictly speaking, you are breaking one of the 'rules of thumb' of sequence alignment: her sequences are far longer than yours, and the best alignments are normally obtained from sequences of comparable length. Nonetheless, just to get a rough idea of how the two alignments compare, you decide to align her alignment against your existing alignment. You know that this can be done using ClustalX as follows:

1. Your colleague's sequences can be found here. Save them to a file.

2. With ClustalX again in Profile Alignment Mode, load your large alignment as profile 1.

3. Load your colleague's alignment as profile 2.

4. Select the 'Align Profile 2 to Profile 1' option under the Alignment menu.

Again, open the result in BioEdit to see the alignment that has been produced. Since your colleague has sequenced the entire gag gene, can you see which portion of the gene you have sequenced?
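
With 158 short sequences it can also help to locate that region programmatically. The Python sketch below assumes the combined alignment has been saved in FASTA format and that you know which entries are your short sequences and which is one of your colleague's full-length sequences; the function names and sequence names are hypothetical. It finds the first and last alignment columns in which any of your sequences has a residue, then counts the non-gap characters of the full-length sequence up to those columns to give 1-based start and end positions on that (ungapped) sequence.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequences may be split over several lines
    return records


def covered_range(alignment: dict[str, str], short_names: list[str],
                  reference: str) -> tuple[int, int]:
    """Return 1-based (start, end) positions on the ungapped reference sequence
    spanned by the aligned region of the short sequences."""
    cols = [i for name in short_names
            for i, ch in enumerate(alignment[name]) if ch != "-"]
    first, last = min(cols), max(cols)
    ref = alignment[reference]
    start = sum(1 for ch in ref[:first] if ch != "-") + 1
    end = sum(1 for ch in ref[:last + 1] if ch != "-")
    return start, end


if __name__ == "__main__":
    # Hypothetical toy alignment: one full-length sequence and two short ones.
    example = """>colleague_full
ATGGGTGCGAGAGCG
>my_seq_1
---GGTGCGAGA---
>my_seq_2
----GTGCG-GA---
"""
    aln = read_fasta(example)
    print(covered_range(aln, ["my_seq_1", "my_seq_2"], "colleague_full"))  # prints (4, 12)
```

For a real file, pass the file contents to `read_fasta` and use the names exactly as they appear after `>` in the FASTA headers.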
 


If you have time, repeat the alignment of the variable surface glycoprotein sequences using the QAlign programme (http://www.ridom.de/qalign/). You will have to download and install it yourself. This part of the exercise is optional, and most people are not expected to have time to complete it.