Statistical Analysis

Revision as of 17:33, 26 June 2012 by Davebridges (Talk | contribs) (typo for comparison)


This is based on using either Excel or R for the analysis. To get data into R, the easiest way is to enter the data in Excel, save it as a CSV file, then import it into R with this command:

dataset <- read.csv("filename.csv") #generates a table called dataset with your values
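As a rough sketch of the layout the commands on this page expect, the data can be arranged with one row per measurement, a column of measured values and a column naming the group. The column names, the group labels and the numbers below are made up for illustration, and a temporary file stands in for your own CSV:

```r
# Hypothetical example data: one row per measurement
example <- data.frame(
  group  = rep(c("Control", "Treated"), each = 3),
  values = c(1.2, 0.9, 1.1, 1.8, 2.1, 1.7)
)

# Write it out as a CSV (a temporary file stands in for "filename.csv")
csv.file <- tempfile(fileext = ".csv")
write.csv(example, csv.file, row.names = FALSE)

# Read it back in, as you would with a file exported from Excel
dataset <- read.csv(csv.file)
dataset$group <- factor(dataset$group) # treat group as a categorical variable
str(dataset) # check the column names and types before running any tests
```

Converting the grouping column to a factor is optional for most of the tests here, but it makes it obvious which columns R treats as categories.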

Single Comparisons

Don't forget to adjust these p-values for multiple comparisons if you are doing more than one test.
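One way to do this adjustment in R is the built-in p.adjust function. The p-values below are made up, and which correction method is appropriate depends on your experiment, so treat this as a sketch rather than a recommendation:

```r
# Made-up p-values from four separate tests
p.values <- c(0.001, 0.02, 0.04, 0.30)

# Bonferroni: multiplies each p-value by the number of tests (capped at 1)
p.bonferroni <- p.adjust(p.values, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative
p.bh <- p.adjust(p.values, method = "BH")

print(p.bonferroni)
print(p.bh)
```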

If you have 2 groups you want to compare

Use a Student's T-Test

  • Using Excel, for unpaired samples. Unless you are comparing paired samples (i.e. left leg insulin, right leg control), always use this command. This is a two-tailed, heteroscedastic, unpaired test, which means the two groups are allowed to have unequal variances. For more information see http://office.microsoft.com/en-us/excel-help/ttest-HP005209325.aspx
=TTEST(GROUPRANGE1, GROUPRANGE2, 2, 3)
  • Using R, the function is t.test, which also defaults to an unpaired test with unequal variances:
t.test(group1, group2) #this compares two vectors of numbers
t.test(values ~ group, data=dataset) #this compares the values column between the two levels of the group column.  It will not work if there are more than 2 groups
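As a small illustration of the R side, the sketch below runs the unpaired (unequal variance) test and, for comparison, a paired test. The two groups are simulated numbers, not real data, and the paired call only makes sense if the measurements really are matched pairs:

```r
set.seed(1) # make the simulated numbers reproducible
group1 <- rnorm(8, mean = 10, sd = 1) # simulated control measurements
group2 <- rnorm(8, mean = 12, sd = 2) # simulated treated measurements

# Unpaired test; t.test assumes unequal variances unless var.equal=TRUE
welch <- t.test(group1, group2)
print(welch$method)
print(welch$p.value)

# Paired test, only for matched samples (e.g. left leg vs right leg)
paired <- t.test(group1, group2, paired = TRUE)
print(paired$method)
print(paired$p.value)
```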

If you have one group you want to compare to a number

For example you might want to test if a series of numbers are >1

t.test(group1, mu=1, alternative="greater") #this tests the alternative hypothesis that the mean of the numbers in group1 is > 1
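A short sketch of the one-sample test, using made-up numbers, shows how the one-sided version relates to the default two-sided test. When the sample mean is above the reference value, the one-sided p-value is half of the two-sided one:

```r
# Made-up measurements, e.g. fold changes relative to a control
group1 <- c(1.3, 1.1, 1.6, 0.9, 1.4, 1.2)

# One-sided: is the mean of group1 greater than 1?
one.sided <- t.test(group1, mu = 1, alternative = "greater")

# Two-sided (the default): is the mean of group1 different from 1?
two.sided <- t.test(group1, mu = 1)

print(one.sided$p.value)
print(two.sided$p.value)
```

Decide which alternative you are testing before looking at the data, since switching to the one-sided test afterwards makes the result look stronger than it is.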

Multiple Comparisons

If you are testing one variable with more than one value (One Way ANOVA)

Use this when you are comparing three or more groups with each other, not when you are only comparing two groups to a control. For example, the groups might be Normal Diet, High Fat Diet and High Protein Diet. Note that if you do this with just two groups, the result should be the same as a t-test that assumes equal variances.

fit.aov <- aov(values ~ group, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA.  If the p-value is greater than your alpha (usually 0.05) stop and declare no significant difference.  If it is < 0.05 go on to the next test.
TukeyHSD(fit.aov) #this does a Tukey HSD test
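The whole sequence can be sketched on simulated data as follows. The group names and numbers are invented, and pulling the p-value out of the summary table by its column name is just one convenient way to do it:

```r
set.seed(42) # make the simulated numbers reproducible
dataset <- data.frame(
  group  = factor(rep(c("Normal", "HighFat", "HighProtein"), each = 6)),
  values = c(rnorm(6, mean = 20, sd = 2),
             rnorm(6, mean = 28, sd = 2),
             rnorm(6, mean = 21, sd = 2))
)

fit.aov <- aov(values ~ group, data = dataset)

# The first row of the summary table holds the p-value for the group term
anova.p <- summary(fit.aov)[[1]][["Pr(>F)"]][1]
print(anova.p)

# Only go on to the post-hoc test if the overall ANOVA is significant
if (anova.p < 0.05) {
  print(TukeyHSD(fit.aov))
} else {
  message("No significant difference between groups")
}
```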

If you are testing two variables simultaneously

For example this could be the effects of diet and genotype. It does not matter how many levels each variable has. If one of the variables is not a factor (and is instead a continuous variable like age) then look below at #Correlations:

  • Using R, provided the data is formatted in a data frame named dataset with columns genotype, diet and values (see AOV). The first step is to do an ANOVA; then, depending on the results, move on to post-hoc tests such as TukeyHSD or separate your dataset:
fit.aov <- aov(values ~ genotype*diet, data=dataset) #generates an object named fit.aov
summary(fit.aov) #tests for significance of the ANOVA.  

At this stage you will get an output such as this:

              Df Sum Sq Mean Sq F value   Pr(>F)    
genotype       1  25.23   25.23  43.942 0.000164 ***
diet           1 141.45  141.45 246.363 2.71e-07 ***
genotype:diet  1   1.92    1.92   3.344 0.104853    
Residuals      8   4.59    0.57  
  • First look at the genotype:diet row. If this p-value is <0.05 then you have a significant interaction between genotype and diet. If this is the case, move on to [#No Main Effect] to separate out your groups. If this value is >0.05 then there is no significant interaction, so check whether the p-value for either of your variables is significant. If it is (and there is no interaction) then go ahead to [#Main Effect]. In the above example there is no interaction, but there are two main effects.
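The decision described above can also be read off the ANOVA table in code. This is only a sketch: the data are simulated with a genotype effect and a diet effect but no built-in interaction, and the level names ("WT", "KO", "Normal", "HighFat") are placeholders:

```r
set.seed(7) # make the simulated numbers reproducible

# 3 replicates for each genotype/diet combination (12 rows in total)
dataset <- expand.grid(replicate = 1:3,
                       genotype  = c("WT", "KO"),
                       diet      = c("Normal", "HighFat"))
dataset$values <- 10 +
  2 * (dataset$genotype == "KO") +
  5 * (dataset$diet == "HighFat") +
  rnorm(nrow(dataset), sd = 0.8)

fit.aov <- aov(values ~ genotype*diet, data = dataset)
aov.table <- summary(fit.aov)[[1]]

# Row names in the table are padded with spaces, so trim them before use
p.values <- setNames(aov.table[["Pr(>F)"]], trimws(rownames(aov.table)))
print(p.values)

if (p.values["genotype:diet"] < 0.05) {
  message("Significant interaction: analyse the groups separately")
} else {
  message("No significant interaction: look at the main effects")
}
```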

Main Effect

If there is no interaction, but there is a significant effect for one or both variables, then you can go on to look at post-hoc tests such as TukeyHSD

TukeyHSD(fit.aov)

This will generate all possible pairwise comparisons between your groups
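If you only want the comparisons for one variable, TukeyHSD accepts a which argument naming the term to test. A minimal sketch on simulated data (all names and numbers are placeholders):

```r
set.seed(11) # make the simulated numbers reproducible

# 4 replicates for each genotype/diet combination
dataset <- expand.grid(replicate = 1:4,
                       genotype  = c("WT", "KO"),
                       diet      = c("Normal", "HighFat", "HighProtein"))
dataset$values <- 10 + 3 * (dataset$diet == "HighFat") +
  rnorm(nrow(dataset), sd = 1)

fit.aov <- aov(values ~ genotype*diet, data = dataset)

# Restrict the post-hoc test to the pairwise comparisons between diets
diet.posthoc <- TukeyHSD(fit.aov, which = "diet")
print(diet.posthoc)
```

The "p adj" column in the output is already adjusted for the number of pairwise comparisons within that term.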

No Main Effect

If there is an interaction, you will need to separate out your groups and compare them separately. For example this will subset out just "WT" genotypes and analyse those.

wt.dataset <- subset(dataset, genotype=="WT")
wt.fit <- aov(values ~ diet, data=wt.dataset) #uses only the WT subset, not the full dataset
summary(wt.fit) #at this point you can go on to a TukeyHSD if you have >2 diet values and a significant ANOVA
TukeyHSD(wt.fit)

This will tell you, separate from the interaction, whether each pairwise comparison is significant. You will have to repeat this by re-doing subset with each genotype and diet value as needed.
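Rather than repeating the subset by hand, the same analysis can be run once per genotype with split and lapply. This is a sketch on simulated data with placeholder level names. The Holm adjustment at the end is an added suggestion, since running one ANOVA per genotype means you are doing more than one test:

```r
set.seed(3) # make the simulated numbers reproducible

# 4 replicates for each genotype/diet combination
dataset <- expand.grid(replicate = 1:4,
                       genotype  = c("WT", "KO"),
                       diet      = c("Normal", "HighFat", "HighProtein"))
dataset$values <- rnorm(nrow(dataset), mean = 10)

# Fit a separate one-way ANOVA of diet within each genotype
fits <- lapply(split(dataset, dataset$genotype),
               function(d) aov(values ~ diet, data = d))

# Collect the diet p-value from each fit
p.by.genotype <- sapply(fits, function(fit) summary(fit)[[1]][["Pr(>F)"]][1])
print(p.by.genotype)

# Adjust for having run one test per genotype
print(p.adjust(p.by.genotype, method = "holm"))
```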

Correlations

Coming later... This is for when two continuous variables are correlated, rather than one of them being discrete