Difference between revisions of "Statistical Analysis"
Davebridges (Talk | contribs) (wrote most of initial stats page) |
Revision as of 17:30, 26 June 2012
This is based on using either Excel or R for the analysis. To get data into R, the easiest way is to enter the data in Excel, save it as a CSV file, then import it into R with this command:
dataset <- read.csv("filename.csv") #generates a table called dataset with your values
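As a quick check of the import step, the sketch below writes a small made-up table to a temporary CSV file and reads it back. The column names group and values and all of the numbers are illustrative assumptions, chosen to match the layout used further down this page.

```r
# Illustrative data only: a temporary CSV standing in for a file saved from Excel
csv.path <- tempfile(fileext = ".csv")
writeLines(c("group,values",
             "Control,1.2",
             "Control,0.9",
             "Treated,2.1",
             "Treated,1.8"), csv.path)

dataset <- read.csv(csv.path)           # the first row of the file becomes the column names
str(dataset)                            # check that values was read in as numeric
dataset$group <- factor(dataset$group)  # treat group as a categorical variable
```

Running str() right after the import is a cheap way to catch a numeric column that was read as text because of a stray character in the spreadsheet.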
Single Comparisons
Don't forget to adjust these p-values for multiple comparisons if you are doing more than one test.
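One possible way to do this adjustment in R is the built-in p.adjust function. The sketch below uses three made-up p-values; which correction method is appropriate depends on your experiment, so the two methods shown are examples rather than a recommendation.

```r
# Hypothetical raw p-values from three separate tests
p.raw <- c(0.01, 0.04, 0.03)

# Bonferroni: multiplies each p-value by the number of tests (capped at 1)
p.bonferroni <- p.adjust(p.raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative
p.bh <- p.adjust(p.raw, method = "BH")

print(p.bonferroni)
print(p.bh)
```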
If you have 2 groups you want to compare
Use a Student's T-Test
- Using Excel, for unpaired samples. Unless you are comparing paired samples (e.g. left leg insulin, right leg control), always use this command. It runs a two-tailed, heteroscedastic unpaired test, which means that the two groups are allowed to have unequal variances. For more information see http://office.microsoft.com/en-us/excel-help/ttest-HP005209325.aspx
=TTEST(GROUPRANGE1, GROUPRANGE2, 2, 3)
- Using R (for more details see http://www.statmethods.net/stats/ttest.html and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html):
t.test(group1, group2) #this compares two vectors of numbers
t.test(values ~ group, data=dataset) #this compares the values column between the two levels found in the group column. It will not work if there are more than 2 groups
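The sketch below runs these tests on simulated data, so the numbers and the names group1, group2 and dataset are assumptions for illustration only. By default R's t.test performs Welch's test, which allows unequal variances like the Excel command above; paired = TRUE switches to a paired test for designs such as left leg versus right leg.

```r
set.seed(1)  # simulated data, for illustration only
group1 <- rnorm(8, mean = 10, sd = 1)
group2 <- rnorm(8, mean = 12, sd = 2)

# Unpaired test; the default is Welch's test (unequal variances allowed)
unpaired <- t.test(group1, group2)

# Paired test: element i of group1 is matched with element i of group2
paired <- t.test(group1, group2, paired = TRUE)

# The same unpaired test using a data frame and a formula
dataset <- data.frame(values = c(group1, group2),
                      group = rep(c("A", "B"), each = 8))
by.formula <- t.test(values ~ group, data = dataset)

unpaired$p.value
paired$p.value
by.formula$p.value  # identical to the unpaired p-value
```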
If you have one group you want to compare to a number
For example you might want to test whether the mean of a series of numbers is greater than 1
t.test(group1, mu=1, alternative="greater") #this tests the alternative hypothesis that the mean of the numbers in group1 is > 1
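As a small worked sketch of the one-sample case, the values below are made-up fold changes; the object returned by t.test can be inspected piece by piece rather than read off the printed summary.

```r
# Hypothetical fold-change values
group1 <- c(1.3, 1.1, 1.6, 0.9, 1.4, 1.2)

result <- t.test(group1, mu = 1, alternative = "greater")

result$estimate  # sample mean of group1
result$p.value   # one-sided p-value for mean > 1
result$conf.int  # one-sided confidence interval for the mean (upper limit is Inf)
```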
Multiple Comparisons
If you are testing one variable with more than one value (One Way ANOVA)
Use this when you are comparing three or more groups with each other, not when you are comparing two groups to a control. For example this might be Normal Diet, High Fat Diet, High Protein Diet. Note that if you do this with just two groups, the result should be the same as an equal-variance t-test (t.test with var.equal=TRUE).
- Using R, provided the data is formatted in a dataframe named dataset with columns group and values (see http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aov.html). The first step is to do an ANOVA, then, if the result of this comparison is significant, move on to post-hoc tests such as TukeyHSD:
fit.aov <- aov(values ~ group, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA. If the p-value is greater than your alpha (usually 0.05) stop and declare no significant difference. If it is < 0.05 go on to the next test.
TukeyHSD(fit.aov) #this does a Tukey HSD test
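The workflow above can be sketched end to end as follows. The data are simulated and the group means are invented for illustration; the only assumption about layout is a dataframe named dataset with columns group and values, as described above.

```r
set.seed(42)  # simulated data, for illustration only
dataset <- data.frame(
  group = factor(rep(c("Normal", "HighFat", "HighProtein"), each = 6)),
  values = c(rnorm(6, mean = 25), rnorm(6, mean = 30), rnorm(6, mean = 26))
)

fit.aov <- aov(values ~ group, data = dataset)
anova.table <- summary(fit.aov)[[1]]   # the ANOVA table as a data frame
p.anova <- anova.table[["Pr(>F)"]][1]  # the first row is the group term

alpha <- 0.05
if (p.anova < alpha) {
  print(TukeyHSD(fit.aov))             # pairwise comparisons with adjusted p-values
} else {
  print("No significant difference between groups")
}
```

Pulling the p-value out of the summary table makes the "stop or continue" decision explicit, which helps when the analysis is rerun on new data.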
If you are testing two variables simultaneously
For example this could be the effects of diet and genotype. It does not matter how many levels each variable has. If one of the variables is not a factor (and is instead a continuous variable like age) then see Correlations below.
- Using R, provided the data is formatted in a dataframe named dataset with columns genotype, diet and values (see AOV). The first step is to do an ANOVA, then, depending on the results, move on to post-hoc tests such as TukeyHSD or separate your dataset:
fit.aov <- aov(values ~ genotype*diet, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA.
At this stage you will get an output such as this:
              Df Sum Sq Mean Sq F value   Pr(>F)
genotype       1  25.23   25.23  43.942 0.000164 ***
diet           1 141.45  141.45 246.363 2.71e-07 ***
genotype:diet  1   1.92    1.92   3.344 0.104853
Residuals      8   4.59    0.57
- First look at the genotype:diet row. If this p-value is <0.05 then you have a significant interaction between genotype and diet. If this is the case, move on to No Main Effect to separate out your groups. If this value is >0.05 then there is no interaction, so check whether the p-value for either of your variables is significant. If it is (and there is no interaction) then go ahead to Main Effect. In the above example there is no interaction, but there are two main effects.
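The decision described above can also be written out in code. This is a sketch on simulated data (a 2 x 2 design with three replicates per cell, with invented effect sizes), not the dataset that produced the output shown; the names p.interaction and anova.table are introduced here for illustration.

```r
set.seed(7)  # simulated data, for illustration only
dataset <- expand.grid(rep = 1:3,
                       genotype = c("WT", "KO"),
                       diet = c("Normal", "HighFat"))
dataset$values <- rnorm(nrow(dataset), mean = 20) +
  ifelse(dataset$diet == "HighFat", 5, 0) +
  ifelse(dataset$genotype == "KO", 2, 0)

fit.aov <- aov(values ~ genotype * diet, data = dataset)
anova.table <- as.data.frame(summary(fit.aov)[[1]])
rownames(anova.table) <- trimws(rownames(anova.table))  # summary() pads the term names with spaces

p.interaction <- anova.table["genotype:diet", "Pr(>F)"]

if (p.interaction < 0.05) {
  message("Significant interaction: analyse the groups separately")
} else {
  message("No interaction: look at the main effects")
  print(anova.table[c("genotype", "diet"), "Pr(>F)", drop = FALSE])
}
```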
Main Effect
If there is no interaction, but there is a significant effect for one or both variables, then you can go on to look at post-hoc tests such as TukeyHSD:
TukeyHSD(fit.aov)
This will generate all possible pairwise comparisons between your groups
No Main Effect
If there is an interaction, you will need to separate out your groups and compare them separately. For example this will subset out just "WT" genotypes and analyse those.
wt.dataset <- subset(dataset, genotype=="WT") #keeps only the rows where genotype is WT
wt.fit <- aov(values ~ diet, data=wt.dataset) #fits the ANOVA on the subset, not the full dataset
summary(wt.fit) #at this point you can go on to a TukeyHSD if you have >2 diet values and a significant ANOVA
TukeyHSD(wt.fit)
This will tell you, separate from the interaction, whether each pairwise comparison is significant. You will have to repeat this by re-doing the subset with each genotype and diet value as needed.
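Rather than re-typing the subset for every genotype, the repetition can be sketched as a loop. The data here are simulated with no real effect built in, and the names g.dataset, g.fit and p.values are made up for this example. Because this runs one test per genotype, the resulting p-values are themselves multiple comparisons.

```r
set.seed(7)  # simulated data, for illustration only
dataset <- expand.grid(rep = 1:3,
                       genotype = c("WT", "KO"),
                       diet = c("Normal", "HighFat", "HighProtein"))
dataset$values <- rnorm(nrow(dataset), mean = 20)

p.values <- numeric(0)
for (g in levels(dataset$genotype)) {
  g.dataset <- subset(dataset, genotype == g)    # rows for this genotype only
  g.fit <- aov(values ~ diet, data = g.dataset)  # one-way ANOVA of diet within the genotype
  p.values[g] <- summary(g.fit)[[1]][["Pr(>F)"]][1]
  if (p.values[g] < 0.05) {
    print(TukeyHSD(g.fit))                       # post-hoc test only after a significant ANOVA
  }
}
p.values  # one ANOVA p-value per genotype
```

The same loop works over diet values by swapping the roles of genotype and diet.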
Correlations
Coming later... This is for when two variables are correlated, rather than one of them being discrete