Sequence Alignment for Phylogenetic Analysis
From Bridges Lab Protocols
Davebridges (Added info about FASTA code)
Davebridges (Added details about BLAST search)
Latest revision as of 13:16, 18 April 2019
Locate Sequences and Generate FASTA File
- The easiest way to find sequences is to start with a seed sequence, then run BLAST searches restricted to RefSeq and the species of interest.
- To find a seed sequence, start with NCBI Gene, find the first RefSeq mRNA (its accession should start with NM_), then click on it and find the protein (its accession should start with NP_).
- Paste the protein sequence into your FASTA file (see next section) and name it accordingly.
- Paste that sequence or its NP_ accession into NCBI Protein BLAST.
- Set the parameters to:
  - Database: Reference Proteins (refseq_protein)
  - Organism: Start with mouse (Mus musculus) or human (Homo sapiens). Depending on your goal, consider adding zebrafish (Danio rerio), Drosophila melanogaster, chicken (Gallus gallus) and Caenorhabditis elegans.
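If you would rather script the retrieval of a seed protein than copy it from the browser, the following is a minimal sketch using the NCBI E-utilities efetch endpoint. It is not part of the original protocol. The accession NP_000000.0 is a placeholder, not a real record, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns one RefSeq protein record as FASTA text."""
    # The protocol expects RefSeq protein accessions, which start with NP_.
    if not accession.startswith("NP_"):
        raise ValueError(f"expected an NP_ accession, got {accession!r}")
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(accession):
    """Download the FASTA record for one accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # NP_000000.0 is a placeholder; substitute the NP_ accession of your seed protein.
    print(efetch_fasta_url("NP_000000.0"))
```

The header line returned by NCBI contains spaces, so rename it following the naming convention in the next section before aligning.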
Generating a FASTA File
- FASTA format is described here and here. Each sequence must start with a >SEQUENCENAME header line, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
- Save sequences in a plain-text editor such as Notepad, Notepad++ or Sublime Text (not Word) as a <FILENAME>.fasta file.
- Sequence names cannot have spaces. Generally it's better to use a name such as mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a RefSeq mRNA. If there are multiple mRNAs for the gene, name them
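The formatting rules above can be checked before uploading the file. The sketch below is one possible check, not part of the original protocol. The function names parse_fasta, check_names and make_name are hypothetical, and the duplicate-name and empty-sequence checks are extra assumptions beyond the rules stated here.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_names(records):
    """Return a list of human-readable problems found in the records."""
    problems = []
    seen = set()
    for name, sequence in records:
        if not name:
            problems.append("empty sequence name")
        elif any(ch.isspace() for ch in name):
            problems.append(f"name contains whitespace: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
        if not sequence:
            problems.append(f"no sequence for: {name!r}")
    return problems


def make_name(species, gene, accession):
    """Build a name in the species_gene-accession style used in this protocol."""
    return f"{species}_{gene}-{accession}"


if __name__ == "__main__":
    example = ">mm_Gdf15-NM_004864.4\nMTEIT\nAAMVK\n>bad name\nSATVS\n"
    for problem in check_names(parse_fasta(example)):
        print(problem)
```

An empty list from check_names means the file follows the rules checked here.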
Create Multiple Sequence Alignment using CLUSTAL Omega
- CLUSTAL Omega is available at https://www.ebi.ac.uk/Tools/msa/clustalo/
- Select output format NEXUS to import into MrBayes, or PHYLIP format to import into PhyloBayes.
- Generate phylogenetic trees with PhyloBayes or MrBayes (see Using Mr Bayes for Phylogenetic Analysis).
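To illustrate what the PHYLIP output contains, the sketch below writes already-aligned sequences in a simple sequential PHYLIP layout: a header line with the number of taxa and the alignment length, then one line per sequence. It is an illustration rather than a replacement for the CLUSTAL Omega output. The function name to_phylip is hypothetical, and the long-name ("relaxed") layout is an assumption, so check the PhyloBayes manual for the exact format it accepts.

```python
def to_phylip(records):
    """Format (name, aligned_sequence) pairs as sequential PHYLIP text."""
    if not records:
        raise ValueError("no sequences given")
    # Every row of an alignment must have the same length (gaps included).
    lengths = {len(sequence) for _, sequence in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; is this an alignment?")
    for name, _ in records:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid sequence name: {name!r}")
    # Pad names to a common width so the sequences line up in one column.
    width = max(len(name) for name, _ in records)
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, sequence in records:
        lines.append(f"{name.ljust(width)}  {sequence}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy alignment; '-' marks a gap.
    print(to_phylip([("SEQUENCE_1", "MK-T"), ("SEQUENCE_2", "MKLT")]), end="")
```

The length check is also a quick way to confirm that a file really is an alignment before starting a long tree-building run.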
PhyloBayes Analysis
- Mark in your notes the software version used.
- The PhyloBayes manual can be found here.