Difference between revisions of "Statistical Analysis"

From Bridges Lab Protocols
m (typo for comparison)
(added section about accounting for other effects within the Interaction section)
 
(One intermediate revision by the same user not shown)

Latest revision as of 18:10, 26 June 2012


This is based on using either Excel or R for the analysis. To get data into R, the easiest way is to enter the data in Excel, save it as a CSV file, then import it into R with this command:

dataset <- read.csv("filename.csv") #generates a table called dataset with your values
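As a rough sketch of the import step, the following writes a small made-up table to a temporary CSV file and reads it back. The file contents and the column names group and values are invented for illustration; in practice you would point read.csv at the file saved from Excel.

```r
# Hypothetical example data; replace csv.file with the path to your own CSV file
csv.file <- tempfile(fileext = ".csv")
writeLines(c("group,values",
             "WT,4.1", "WT,3.8", "WT,4.4",
             "KO,5.9", "KO,6.3", "KO,6.1"), csv.file)

dataset <- read.csv(csv.file)            # one row per measurement
dataset$group <- factor(dataset$group)   # treat the group labels as a factor
str(dataset)                             # check the column names and types
table(dataset$group)                     # check the number of samples per group
```

Checking the structure right after import is a cheap way to catch mistyped group labels or numbers that were read in as text.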

Single Comparisons

Don't forget to adjust these p-values for multiple comparisons if you are doing more than one test.
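One possible way to make this adjustment in R is the p.adjust function from the stats package. The p-values below are made up, and which correction method is appropriate depends on your experiment, so treat this as a sketch rather than a recommendation.

```r
# Made-up raw p-values from three separate tests
p.values <- c(0.010, 0.030, 0.200)

bonferroni <- p.adjust(p.values, method = "bonferroni")  # multiplies by the number of tests, capped at 1
bh         <- p.adjust(p.values, method = "BH")          # Benjamini-Hochberg false discovery rate

data.frame(raw = p.values, bonferroni = bonferroni, BH = bh)
```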

If you have 2 groups you want to compare

Use a Student's T-Test

  • Using Excel, for unpaired samples. Unless you are comparing paired samples (e.g. left leg insulin, right leg control), always use this command. This is a heteroscedastic unpaired test, which means that the two groups can have unequal variances. For more information see http://office.microsoft.com/en-us/excel-help/ttest-HP005209325.aspx
=TTEST(GROUPRANGE1, GROUPRANGE2, 2 ,3)

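  • Using R (for more details see http://www.statmethods.net/stats/ttest.html and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html):

If you have two lists of numbers that are not in a table, then you can test them directly: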

t.test(group1, group2) #this compares two vectors of numbers

If you have a table named dataset with columns named values and group, where the group column contains 2 different values (for example WT and KO), use the formula form. If the group column has more than 2 values, go to #If you are testing one variable with more than two groups(One Way ANOVA).

t.test(values ~ group, data=dataset) #this compares the values column between the two levels of the group column.  It will not work if there are more than 2 groups
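A small worked example of the table form is sketched here. The measurements are invented, and the base R function is spelled t.test; by default it runs the unequal-variance (Welch) test, which corresponds to the Excel command above.

```r
# Made-up measurements for two genotypes
dataset <- data.frame(
  group  = factor(rep(c("WT", "KO"), each = 5)),
  values = c(4.1, 3.8, 4.4, 4.0, 4.2,
             5.9, 6.3, 6.1, 5.7, 6.0)
)

result <- t.test(values ~ group, data = dataset)  # Welch test by default
result$p.value    # p-value of the comparison
result$estimate   # mean of each group
```

Storing the result in an object makes it easy to pull out the p-value later, for example to collect several tests before adjusting for multiple comparisons.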

If you have one group you want to compare to a number

For example you might want to test if the mean of a series of numbers is >1

t.test(group1, mu=1, alternative="greater") #this tests the alternative hypothesis that the mean of the numbers in group1 is > 1
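For illustration, here is the one-sample test run on a made-up vector of fold changes; the numbers are hypothetical.

```r
# Made-up fold changes relative to a control value of 1
group1 <- c(1.2, 1.5, 0.9, 1.4, 1.3, 1.6)

one.sample <- t.test(group1, mu = 1, alternative = "greater")
one.sample$p.value   # small values support a mean greater than 1
one.sample$conf.int  # one-sided interval, so the upper limit is Inf
```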

Multiple Comparisons

If you are testing one variable with more than two groups(One Way ANOVA)

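Use this not when you are comparing 2 groups to a control, but when you are comparing three or more groups with each other. For example this might be Normal Diet, High Fat Diet, High Protein Diet. Note that if you do this with just two groups, the result should be the same as an equal-variance t-test (in R, t.test with var.equal=TRUE).

  • Using R, provided the data is formatted in a data frame named dataset with columns group and values (see http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aov.html). The first step is to do an ANOVA, then, if the result of this comparison is significant, move on to post-hoc tests such as TukeyHSD: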

fit.aov <- aov(values ~ group, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA.  If the p-value is greater than your alpha (usually 0.05) stop and declare no significant difference.  If it is < 0.05 go on to the next test.
TukeyHSD(fit.aov) #this does a Tukey HSD test
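The two-step logic (ANOVA first, post-hoc test only if it is significant) can be sketched as follows. The diet data are invented, and extracting the p-value from the summary table is just one way of scripting the decision; reading it off the printed summary works equally well.

```r
# Made-up body weights for three diets
dataset <- data.frame(
  group  = factor(rep(c("Normal", "HighFat", "HighProtein"), each = 4)),
  values = c(20.1, 19.5, 21.0, 20.4,
             27.8, 29.1, 28.4, 27.2,
             21.9, 22.5, 20.8, 21.6)
)

fit.aov <- aov(values ~ group, data = dataset)
p.anova <- summary(fit.aov)[[1]][["Pr(>F)"]][1]  # overall ANOVA p-value (first row is group)

if (p.anova < 0.05) {
  print(TukeyHSD(fit.aov))  # pairwise comparisons with adjusted p-values
} else {
  message("No significant difference between groups; stop here.")
}
```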

If you are testing two variables simultaneously

For example this could be the effects of diet and genotype. It does not matter how many levels (distinct values) each variable has. If one of the variables is not a factor (and is instead a continuous variable like age) then look below for #Correlations:

  • Using R, provided the data is formatted in a data frame named dataset with columns genotype, diet and values (see AOV). The first step is to do an ANOVA, then, depending on the results, move on to post-hoc tests such as TukeyHSD or separate your dataset:
fit.aov <- aov(values ~ genotype*diet, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA.

At this stage you will get an output such as this:

              Df Sum Sq Mean Sq F value   Pr(>F)    
genotype       1  25.23   25.23  43.942 0.000164 ***
diet           1 141.45  141.45 246.363 2.71e-07 ***
genotype:diet  1   1.92    1.92   3.344 0.104853    
Residuals      8   4.59    0.57  
  • First look at the genotype:diet row. If this p-value is <0.05 then you have a significant interaction between genotype and diet. If this is the case move on to #Interaction to separate out your groups. If this value is >0.05 then there is no significant interaction, so check whether the p-value for either of your variables is significant. If it is (and there is no interaction) then go ahead to #Main Effect. In the above example there is no interaction, but there are two main effects:

Main Effect

If there is no interaction, but there is a significant effect for one or both variables, then you can go on to post-hoc tests such as TukeyHSD

TukeyHSD(fit.aov)

This will generate all possible pairwise comparisons between your groups.
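The decision described in this section can be sketched on invented data as follows. The values are made up so that both variables have an effect but do not interact, and pulling the p-values out of the summary table by row name is only one way to script the check.

```r
# Made-up 2x2 design with 3 replicates per genotype/diet combination
dataset <- data.frame(
  genotype = factor(rep(c("WT", "KO"), each = 6)),
  diet     = factor(rep(rep(c("Normal", "HighFat"), each = 3), times = 2)),
  values   = c(10.2,  9.8, 10.5, 17.1, 16.4, 17.6,   # WT: Normal, then HighFat
               13.0, 12.4, 13.3, 19.9, 20.6, 19.5)   # KO: Normal, then HighFat
)

fit.aov <- aov(values ~ genotype * diet, data = dataset)
anova.table <- summary(fit.aov)[[1]]
p.values <- setNames(anova.table[["Pr(>F)"]], trimws(rownames(anova.table)))

p.values["genotype:diet"]            # check the interaction first
p.values[c("genotype", "diet")]      # then the main effects
TukeyHSD(fit.aov, "diet")            # post-hoc comparison for one main effect
```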

Interaction

If there is an interaction, you will need to separate out your groups and compare them separately. For example this will subset out just "WT" genotypes and analyse those.

wt.dataset <- subset(dataset, genotype=="WT")
wt.fit <- aov(values ~ diet, data=wt.dataset) #fit the model on the WT subset only
summary(wt.fit) #at this point you can go on to a TukeyHSD if you have >2 diet values and a significant ANOVA
TukeyHSD(wt.fit)

This will tell you, separate from the interaction, whether each pairwise comparison is significant. You will have to repeat this by re-running subset for each genotype and diet value as needed. Alternatively, you can account for the effect of one variable on the significance of the other. For example, if diet and genotype interact, you may want to know what the effect of diet is, controlling for the effect of genotype. This is done like this:

TukeyHSD(fit.aov, "diet") #This does pairwise comparisons of diet, while accounting for the effect of genotype
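Rather than re-typing the subset step for each genotype, the per-genotype ANOVAs can be run in one pass. This is only a sketch on invented data (diet has a clear effect in WT and little effect in KO); split and lapply are base R functions, and the variable names are hypothetical.

```r
# Made-up data where the effect of diet depends on genotype
dataset <- data.frame(
  genotype = factor(rep(c("WT", "KO"), each = 6)),
  diet     = factor(rep(rep(c("Normal", "HighFat"), each = 3), times = 2)),
  values   = c(10.2,  9.8, 10.5, 17.1, 16.4, 17.6,   # WT: Normal, then HighFat
               13.0, 12.4, 13.3, 13.9, 13.1, 13.6)   # KO: Normal, then HighFat
)

# One ANOVA of values ~ diet per genotype
fits <- lapply(split(dataset, dataset$genotype), function(subset.data) {
  aov(values ~ diet, data = subset.data)
})

lapply(fits, summary)  # full ANOVA table for each genotype
diet.p <- sapply(fits, function(fit) summary(fit)[[1]][["Pr(>F)"]][1])
diet.p                 # p-value for diet within each genotype
```

With only two diets per genotype the ANOVA p-value is already the pairwise comparison; TukeyHSD on each fit is only needed when there are more than 2 diet values.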

Correlations

Use this when both variables are continuous, rather than one of them being discrete.
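This section does not yet name a specific test. One common option in R is cor.test from the stats package, sketched here on made-up age and body weight values; the choice of function and of Pearson versus Spearman correlation is an assumption, not part of the protocol above.

```r
# Made-up continuous measurements
age    <- c(8, 10, 12, 14, 16, 18, 20, 22)                    # weeks
weight <- c(21.0, 22.4, 24.1, 24.9, 27.2, 27.8, 30.1, 30.9)   # grams

result <- cor.test(age, weight)  # Pearson correlation by default
result$estimate                  # correlation coefficient
result$p.value                   # p-value for the null hypothesis of no correlation

cor.test(age, weight, method = "spearman")  # rank-based alternative
```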