Using Bioconductor To Analyse Beadarray Data

From Bridges Lab Protocols
Revision as of 18:04, 21 August 2009 by Davebrid (talk | contribs)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.


Software Requirements

  • R, get from [CRAN]
  • Bioconductor, get from [Bioconductor]
  • Bioconductor packages. Install as needed:
    • beadarray
    • limma
    • annotation data for the array (normally illuminaMousev2BeadID.db)
source("http://www.bioconductor.org/biocLite.R")
biocLite("PACKAGE")

Loading Data

  • At a minimum you need the Probe Profile data (normally a txt file).
  • For all R procedures first change directory to your working directory then next create a new script, and save all executed lines in that script file.
  • Load the beadarray library, indictate dataFile (required), sampleSheet (normally a xls or csv file) and control set (Control Probe, normally a txt file)
data = "FinalReport_SampleProbe.txt"
controls = "ControlProbe.txt"
samplesheet = "Proj_54_12Aug09_WGGEX_SS_name.csv"
BSData = readBeadSummaryData(dataFile = data, qcFile= controls, sampleSheet=samplesheet)
  • You may need to alter either the ProbeID or ControlID to fit the illuminaprobe column from the sampleprobe or controlprobe datasets.
  • This fits the data into the BSData dataframe. Phenotype data can be accessed by pData(BSData) and expression data can be accessed by exprs(BSData).

Data Normalisation

  • Microarray data is typically quantile normalised and log2 transformed:
BSData.quantile = normaliseIllumina(BSData, method="quantile", transform="log2")
  • To examine the effects of normalisation on the dataset use boxplots:
boxplot(as.data.frame(log2(exprs(BSData))),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")
boxplot(as.data.frame(exprs(BSData.quantile)),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")
  • Save these boxplots as postscript files.

Clustering Analysis

  • This analysis will generate a euclidean distance matrix then a cluster analysis of that matrix and will show the distribution between replicates. Ideally similar treatments will cluster together.
d = dist(t(exprs(BSData.quantile)))
plot(hclust(d)

Differential Expression Analysis

  • Normalised data can be analysed using the limma package for statistical differences
  • First define groups for each treatment. If a samplesheet was provided correctly and had this information:
samples = pData(BSData)$Sample_Group
  • Otherwise define these groups manually in the order that they were entered (check by looking at pData(BSData)
samples = c("Control", "Control", "Treatment1", "Treatment1, "Treatment2"...)
  • Next the groups are used to set up a statistical design:
library(limma)
samples = as.factor(samples)
design = model.matrix(~0 + samples)
colnames(design) = levels(samples)
fit = lmFit(exprs(BSData.quantile), design)
  • Now set up contrast matrices to define how you want the data analyses. For example you may want to compare some treatments to a control, as well as between some treaments. See the limma user guide for more information about specific analyses. When defining the contrast matrix use the sample group names as defined above.
cont.matrix = makeContrasts(Treatment1vsControl = Treatment1 - Control, Treatment2vsControl = Treatment2 - Control, Treatment1vsTreatment2 = Treatment1 - Treatment2, levels = design)
fit.cont = contrasts.fit(fit, cont.matrix)
ebFit = eBayes(fit.cont)

Generating a Venn Diagram for Differential Expression

  • First define a cutoff criteria for inclusion. One option is to use the decideTests function:
results = decideTests(ebFit)
  • The relevant options are for method and adjust.method
    • method
      • default is "global", which allows for p-value comparasons
    • adjust.method, this defines the false-discovery rate adjustment:
      • default is "BH" for Benjami and Hochberg
      • other options are "none", "fdr" (same as BH), "holm" and "BY"
  • Now use that classification to generate the Venn Diagram. The following will include both up and downregulated genes and color the numbers accordingly:
vennDiagram(results, include="both", col=c("red","green")

Annotation of Expression Sets and Fitted Data

  • To see all possible annotation criteria use:
library(illuminaMousev2BeadID.db)
illuminaMousev2BeadID()
  • Normally you want to annotate with at least the gene symbol and gene name. Add other criteria as required
ids = rownames(exprs(BSData))
GeneName = mget(ids, illuminaMousev2BeadIDGENENAME, ifnotfound = NA)
symbol = mget(ids, illuminaMousev2BeadIDSYMBOL, ifnotfound = NA)
anno = cbind(GeneSymbol = as.character(symbol), GeneName = as.character(GeneName))
  • To add this annotation to the data analysis file:
ebFit$genes = anno
write.fit = (ebFit, file = "Filename.csv", adjust="BH")
  • This example includes a false discovery rate ("BH") adjusted p.value.
  • This function writes a tab-delimited text file containing for each gene (1) the average log-intensity, (2) the log-ratios, (3) moderated t-statistics, (4) t-statistic P-values, (5) F-statistic if available, (6) F-statistic P-values if available, (7) classification if available.
  • To add this annotation data to the expression set:
data = exprs(BSData.quantile)
data = cbind(anno,data)
write.csv = (data, file = "Filename.csv")