Sequence Alignment for Phylogenetic Analysis

From Bridges Lab Protocols
Revision as of 13:07, 18 April 2019 by Davebridges (Talk | contribs) (Added info about FASTA code)


Locate Sequences and Generate FASTA File

Generating a FASTA File

  • FASTA format is described here. Each sequence must start with a header line of the form >SEQUENCENAME, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

  • Save the sequences in a plain-text editor such as Notepad, Notepad++ or Sublime Text (not Word) as a <FILENAME>.fasta file.
  • Sequence names cannot contain spaces. It is generally better to use a name such as mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a RefSeq mRNA. If there are multiple mRNAs for the gene, name them
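
The formatting rules above can be checked before the file goes into an alignment. The following is a minimal sketch in Python, not part of the original protocol. The function names parse_fasta and check_records, and the set of accepted residue letters, are assumptions made for illustration; adjust the allowed characters if your sequences use other symbols.

```python
import re

# Assumed alphabet: the 20 standard amino acids plus common ambiguity
# codes (B, Z, X, U, O, J), the stop symbol (*) and the gap symbol (-).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUOJ*-")


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:]
            chunks = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_records(records):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen = set()
    for name, seq in records:
        if not name:
            problems.append("empty sequence name")
        if re.search(r"\s", name):
            problems.append(f"name contains whitespace: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
        if not seq:
            problems.append(f"no sequence for {name!r}")
        bad = sorted(set(seq.upper()) - VALID_RESIDUES)
        if bad:
            problems.append(f"unexpected characters in {name!r}: {''.join(bad)}")
    return problems


if __name__ == "__main__":
    example = ">SEQUENCE_1\nMTEITAAMVK\n>bad name\nSATVSE1\n"
    for problem in check_records(parse_fasta(example)):
        print(problem)
```

To check a saved file, read it with open("<FILENAME>.fasta").read() and pass the text to parse_fasta. Word-processor formatting characters that end up in the sequence lines would show up as unexpected characters, which is one reason to use a plain-text editor.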

Create Multiple Sequence Alignment using CLUSTAL Omega

PhyloBayes Analysis

  • Record in your notes the version of the software used.
  • The PhyloBayes manual can be found here.