Sequence Alignment for Phylogenetic Analysis
From Bridges Lab Protocols
Locate Sequences and Generate FASTA File
Generating a FASTA File
- FASTA format is described here. Each sequence must start with a header line of the form >SEQUENCENAME, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
- Save the sequences in a plain-text editor such as Notepad, Notepad++ or Sublime Text (not Word) as a <FILENAME>.fasta file.
- Sequence names cannot contain spaces. Generally it's better to use a name like mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a RefSeq mRNA. If there are multiple mRNAs for the gene, name them
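The naming rule above is easy to check before uploading a file. The following is a minimal sketch, not part of the original protocol: the function names `parse_fasta` and `names_with_spaces` are hypothetical, and the second record name in the example is deliberately invalid to show what gets flagged.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def names_with_spaces(records):
    """Return the sequence names that are empty or contain whitespace."""
    return [name for name, _ in records
            if not name or any(ch.isspace() for ch in name)]


# Hypothetical two-record example; the second name breaks the no-spaces rule.
example = """>mm_Gdf15-NM_004864.4
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
>bad name
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
"""

records = parse_fasta(example)
print(names_with_spaces(records))
```

To check a real file, read it with `open("<FILENAME>.fasta").read()` and pass the text to `parse_fasta`.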
Create Multiple Sequence Alignment using CLUSTAL Omega
- CLUSTAL Omega is available at https://www.ebi.ac.uk/Tools/msa/clustalo/
- Select output format NEXUS to import into MrBayes, or PHYLIP format to import into PhyloBayes.
- Generate phylogenetic trees with PhyloBayes or MrBayes; see Using MrBayes for Phylogenetic Analysis.
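The steps above use the EBI web form. For a reproducible record of the alignment, the same step can be sketched with a locally installed Clustal Omega. This is an assumption, not part of the protocol: it presumes a `clustalo` executable on the PATH, the helper names and file names are hypothetical, and only PHYLIP output is shown because that is the format named above for PhyloBayes.

```python
import shutil
import subprocess


def clustalo_command(fasta_in, aln_out, outfmt="phy"):
    """Build the argument list for a local Clustal Omega run.

    -i / -o give the input and output files, --outfmt selects the
    alignment format, and --force overwrites an existing output file.
    """
    return ["clustalo", "-i", fasta_in, "-o", aln_out,
            f"--outfmt={outfmt}", "--force"]


def run_clustalo(fasta_in, aln_out, outfmt="phy"):
    """Run the alignment, failing early if clustalo is not installed."""
    if shutil.which("clustalo") is None:
        raise FileNotFoundError("clustalo not found on PATH")
    subprocess.run(clustalo_command(fasta_in, aln_out, outfmt), check=True)


# Print the command so it can be pasted into the lab notes (hypothetical file names).
print(" ".join(clustalo_command("sequences.fasta", "sequences.phy")))
```

Calling `run_clustalo("sequences.fasta", "sequences.phy")` would then perform the alignment. Running `clustalo --version` gives the version number to record alongside the command.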
PhyloBayes Analysis
- Mark in your notes the software version used.
- The PhyloBayes manual can be found here.