Difference between revisions of "Sequence Alignment for Phylogenetic Analysis"
From Bridges Lab Protocols
Davebridges (Talk | contribs) (Added PhyloBayes information) |
Davebridges (Talk | contribs) (Added info about FASTA code) |
||
Line 1: | Line 1: | ||
== Locate Sequences and Generate FASTA File == | == Locate Sequences and Generate FASTA File == | ||
+ | === Generating a FASTA File=== | ||
+ | * FASTA format is described [https://zhanglab.ccmb.med.umich.edu/FASTA/ here], and [https://en.wikipedia.org/wiki/FASTA_format here] you need each sequence to start with a >SEQUENCENAME followed by a return and then the sequence, in this case the protein sequence. An example of a FASTA file would be: | ||
+ | |||
+ | <code> | ||
+ | >SEQUENCE_1 | ||
+ | |||
+ | MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG | ||
+ | |||
+ | LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK | ||
+ | |||
+ | IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL | ||
+ | |||
+ | MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL | ||
+ | |||
+ | >SEQUENCE_2 | ||
+ | |||
+ | SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI | ||
+ | |||
+ | ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH | ||
+ | </code> | ||
+ | |||
+ | * Save sequences in notepad, [https://notepad-plus-plus.org/ notepad++] or [https://www.sublimetext.com/ sublime] (not Word) as a <FILENAME>.fasta file. | ||
+ | * Sequence names cannot have spaces. Generally its better to name it as mm_Gdf15-NM_004864.4 where mm indicates mouse, Gdf15 is the gene name and NM indicates a [https://www.ncbi.nlm.nih.gov/refseq/ RefSeq mRNA]. If there are multiple mRNA's for the gene, name them | ||
== Create Multiple Sequence Alignment using CLUSTAL Omega == | == Create Multiple Sequence Alignment using CLUSTAL Omega == | ||
Line 8: | Line 31: | ||
* Generate phlogenetic trees with [http://megasun.bch.umontreal.ca/People/lartillot/www/download.html PhyloBayes] or Mr. Bayes [[Using Mr Bayes to For Phlyogenetic Analysis]]. | * Generate phlogenetic trees with [http://megasun.bch.umontreal.ca/People/lartillot/www/download.html PhyloBayes] or Mr. Bayes [[Using Mr Bayes to For Phlyogenetic Analysis]]. | ||
− | === PhyloBayes Analysis == | + | === PhyloBayes Analysis === |
* Mark in your notes the software version used. | * Mark in your notes the software version used. | ||
* The PhyloBayes manual can be found [http://megasun.bch.umontreal.ca/People/lartillot/www/phylobayes4.1.pdf here]. | * The PhyloBayes manual can be found [http://megasun.bch.umontreal.ca/People/lartillot/www/phylobayes4.1.pdf here]. |
Revision as of 13:07, 18 April 2019
Locate Sequences and Generate FASTA File
Generating a FASTA File
- FASTA format is described at https://zhanglab.ccmb.med.umich.edu/FASTA/ and https://en.wikipedia.org/wiki/FASTA_format. Each sequence must start with a >SEQUENCENAME line, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
- Save sequences in a plain-text editor such as Notepad, Notepad++ or Sublime Text (not Word) as a <FILENAME>.fasta file.
- Sequence names cannot have spaces. Generally it's better to use a name such as mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a RefSeq mRNA. If there are multiple mRNAs for the gene, name them
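The FASTA rules above (a >SEQUENCENAME line followed by the sequence, and no spaces in names) can be checked before uploading the file. The following is a minimal sketch in Python, not part of the original protocol; the function names parse_fasta and check_names are hypothetical, and the extra checks for duplicate names and empty sequences are assumptions about what is useful rather than requirements stated here.

```python
import re


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between sequence rows are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_names(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    seen = set()
    for name, seq in records:
        if not name:
            problems.append("empty sequence name")
        if re.search(r"\s", name):
            problems.append(f"name contains whitespace: {name!r}")
        if name in seen:
            # Assumption: duplicate names are treated as a problem.
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
        if not seq:
            # Assumption: a header with no sequence is treated as a problem.
            problems.append(f"no sequence for: {name!r}")
    return problems
```

To use it, read the .fasta file into a string, pass it to parse_fasta, and pass the result to check_names; an empty list means none of these problems were found.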
Create Multiple Sequence Alignment using CLUSTAL Omega
- CLUSTAL Omega is available at https://www.ebi.ac.uk/Tools/msa/clustalo/
- Select output format NEXUS to import into Mr Bayes or PHYLIP format to import into PhyloBayes
- Generate phylogenetic trees with PhyloBayes or Mr. Bayes (see Using Mr Bayes to For Phlyogenetic Analysis).
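Before importing a PHYLIP alignment, it can help to confirm that it contains the number of sequences you expect. A PHYLIP file begins with a line giving the number of sequences and the number of alignment columns. The sketch below reads that line; the function name phylip_dimensions is hypothetical and this check is a suggestion, not a step from the original protocol.

```python
def phylip_dimensions(text):
    """Return (n_sequences, n_columns) from the first non-blank line of PHYLIP text."""
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip leading blank lines
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            raise ValueError(f"not a PHYLIP header line: {line!r}")
        return int(fields[0]), int(fields[1])
    raise ValueError("no header line found")
```

Comparing the first number with the count of sequences in the original .fasta file is a quick way to catch a sequence that was dropped or duplicated during alignment.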
PhyloBayes Analysis
- Mark in your notes the software version used.
- The PhyloBayes manual can be found here.