Using Bioconductor To Analyse Beadarray Data: Difference between revisions

Latest revision as of 15:34, 2 September 2009

Software Requirements

R, get from CRAN
Bioconductor, get from Bioconductor
Bioconductor packages. Install as needed:
- beadarray
- limma
- annotation data for the array (normally illuminaMousev2BeadID.db)
To install bioconductor packages use:

source("http://www.bioconductor.org/biocLite.R")
biocLite("PACKAGE")

Loading Data

At a minimum you need the Probe Profile data (normally a txt file).
For all R procedures first change directory to your working directory then next create a new script, and save all executed lines in that script file.
Load the beadarray library, indictate dataFile (required), sampleSheet (normally a xls or csv file) and control set (Control Probe, normally a txt file)

data = "FinalReport_SampleProbe.txt"
controls = "ControlProbe.txt"
samplesheet = "Proj_54_12Aug09_WGGEX_SS_name.csv"
BSData = readBeadSummaryData(dataFile = data, qcFile= controls, sampleSheet=samplesheet)

You may need to alter either the ProbeID or ControlID to fit the illuminaprobe column from the sampleprobe or controlprobe datasets.
This fits the data into the BSData dataframe. Phenotype data can be accessed by pData(BSData) and expression data can be accessed by exprs(BSData).

Data Normalisation

Microarray data is typically quantile normalised and log2 transformed:

BSData.quantile = normaliseIllumina(BSData, method="quantile", transform="log2")

To examine the effects of normalisation on the dataset use boxplots:

boxplot(as.data.frame(log2(exprs(BSData))),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")
boxplot(as.data.frame(exprs(BSData.quantile)),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")

Save these boxplots as postscript files.

Clustering Analysis

This analysis will generate a euclidean distance matrix then a cluster analysis of that matrix and will show the distribution between replicates. Ideally similar treatments will cluster together.

d = dist(t(exprs(BSData.quantile)))
plot(hclust(d)

Differential Expression Analysis

Normalised data can be analysed using the limma package for statistical differences
First define groups for each treatment. If a samplesheet was provided correctly and had this information:

samples = pData(BSData)$Sample_Group

Otherwise define these groups manually in the order that they were entered, check by looking at pData(BSData)

samples = c("Control", "Control", "Treatment1", "Treatment1, "Treatment2"...)

Next the groups are used to set up a statistical design:

library(limma)
samples = as.factor(samples)
design = model.matrix(~0 + samples)
colnames(design) = levels(samples)
fit = lmFit(exprs(BSData.quantile), design)

Now set up contrast matrices to define how you want the data analyses. For example you may want to compare some treatments to a control, as well as between some treaments. See the limma user guide for more information about specific analyses. When defining the contrast matrix use the sample group names as defined above.

cont.matrix = makeContrasts(Treatment1vsControl = Treatment1 - Control, Treatment2vsControl = Treatment2 - Control, Treatment1vsTreatment2 = Treatment1 - Treatment2, levels = design)
fit.cont = contrasts.fit(fit, cont.matrix)
ebFit = eBayes(fit.cont)

Generating a Venn Diagram for Differential Expression

First define a cutoff criteria for inclusion. One option is to use the decideTests function:

results = decideTests(ebFit)

The relevant options are for method and adjust.method
- method
  - default is "global", which allows for p-value comparasons
- adjust.method, this defines the false-discovery rate adjustment:
  - default is "BH" for Benjami and Hochberg
  - other options are "none", "fdr" (same as BH), "holm" and "BY"
Now use that classification to generate the Venn Diagram. The following will include both up and downregulated genes and color the numbers accordingly:

vennDiagram(results, include="both", col=c("red","green")

Annotation of Expression Sets and Fitted Data

To see all possible annotation criteria use:

library(illuminaMousev2BeadID.db)
illuminaMousev2BeadID()

Normally you want to annotate with at least the gene symbol and gene name. Add other criteria as required

ids = rownames(exprs(BSData))
GeneName = mget(ids, illuminaMousev2BeadIDGENENAME, ifnotfound = NA)
symbol = mget(ids, illuminaMousev2BeadIDSYMBOL, ifnotfound = NA)
anno = cbind(GeneSymbol = as.character(symbol), GeneName = as.character(GeneName))

To add this annotation to the data analysis file:

ebFit$genes = anno
write.fit = (ebFit, file = "Filename.csv", adjust="BH")

This example includes a false discovery rate ("BH") adjusted p.value.
This function writes a tab-delimited text file containing for each gene (1) the average log-intensity, (2) the log-ratios, (3) moderated t-statistics, (4) t-statistic P-values, (5) F-statistic if available, (6) F-statistic P-values if available, (7) classification if available.
To add this annotation data to the expression set:

data = exprs(BSData.quantile)
data = cbind(anno,data)
write.csv = (data, file = "Filename.csv")

Remember that the expression set is Log2 adjusted, so to look at absolute expression levels use 2^value.

@@ Line 1: / Line 1: @@
 [[Category: R]]
 [[Category: Bioinformatics]]
+[[Category: Bioconductor]]
 ==Software Requirements==
-*R, get from [[http://cran.r-project.org/ CRAN]]
+*R, get from [http://cran.r-project.org/ CRAN]
-*Bioconductor, get from [[http://www.bioconductor.org/download Bioconductor]]
+*Bioconductor, get from [http://www.bioconductor.org/download Bioconductor]
 *Bioconductor packages.  Install as needed:
 **beadarray
 **limma
+**annotation data for the array (normally illuminaMousev2BeadID.db)
+*To install bioconductor packages use:
 <pre>
 source("http://www.bioconductor.org/biocLite.R")
@@ Line 21: / Line 24: @@
 controls = "ControlProbe.txt"
 samplesheet = "Proj_54_12Aug09_WGGEX_SS_name.csv"
-BSData = readBeadSummaryData(dataFile = data, qcFile= controls, sampleSheet=samplesheet, controlID="TargetID")
+BSData = readBeadSummaryData(dataFile = data, qcFile= controls, sampleSheet=samplesheet)
 </pre>
+*You may need to alter either the ProbeID or ControlID to fit the illuminaprobe column from the sampleprobe or controlprobe datasets.
+*This fits the data into the BSData dataframe.  Phenotype data can be accessed by pData(BSData) and expression data can be accessed by exprs(BSData).
+==Data Normalisation==
+*Microarray data is typically quantile normalised and log2 transformed:
+<pre>BSData.quantile = normaliseIllumina(BSData, method="quantile", transform="log2")</pre>
+*To examine the effects of normalisation on the dataset use boxplots:
+<pre>
+boxplot(as.data.frame(log2(exprs(BSData))),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")
+boxplot(as.data.frame(exprs(BSData.quantile)),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")
+</pre>
+*Save these boxplots as postscript files.
+==Clustering Analysis==
+*This analysis will generate a euclidean distance matrix then a cluster analysis of that matrix and will show the distribution between replicates.  Ideally similar treatments will cluster together.
+<pre>
+d = dist(t(exprs(BSData.quantile)))
+plot(hclust(d)
+</pre>
+==Differential Expression Analysis==
+*Normalised data can be analysed using the limma package for statistical differences
+*First define groups for each treatment.  If a samplesheet was provided correctly and had this information:
+<pre>samples = pData(BSData)$Sample_Group</pre>
+*Otherwise define these groups manually in the order that they were entered, check by looking at pData(BSData)
+<pre>samples = c("Control", "Control", "Treatment1", "Treatment1, "Treatment2"...)</pre>
+*Next the groups are used to set up a statistical design:
+<pre>
+library(limma)
+samples = as.factor(samples)
+design = model.matrix(~0 + samples)
+colnames(design) = levels(samples)
+fit = lmFit(exprs(BSData.quantile), design)
+</pre>
+*Now set up contrast matrices to define how you want the data analyses. For example you may want to compare some treatments to a control, as well as between some treaments.  See the limma user guide for more information about specific analyses.  When defining the contrast matrix use the sample group names as defined above.
+<pre>
+cont.matrix = makeContrasts(Treatment1vsControl = Treatment1 - Control, Treatment2vsControl = Treatment2 - Control, Treatment1vsTreatment2 = Treatment1 - Treatment2, levels = design)
+fit.cont = contrasts.fit(fit, cont.matrix)
+ebFit = eBayes(fit.cont)
+</pre>
+===Generating a Venn Diagram for Differential Expression===
+*First define a cutoff criteria for inclusion.  One option is to use the decideTests function:
+<pre>results = decideTests(ebFit)</pre>
+*The relevant options are for method and adjust.method
+**method
+***default is "global", which allows for p-value comparasons
+**adjust.method, this defines the false-discovery rate adjustment:
+***default is "BH" for Benjami and Hochberg
+***other options are "none", "fdr" (same as BH), "holm" and "BY"
+*Now use that classification to generate the Venn Diagram.  The following will include both up and downregulated genes and color the numbers accordingly:
+<pre>vennDiagram(results, include="both", col=c("red","green")</pre>
+==Annotation of Expression Sets and Fitted Data==
+*To see all possible annotation criteria use:
+<pre>
+library(illuminaMousev2BeadID.db)
+illuminaMousev2BeadID()
+</pre>
+*Normally you want to annotate with at least the gene symbol and gene name.  Add other criteria as required
+<pre>
+ids = rownames(exprs(BSData))
+GeneName = mget(ids, illuminaMousev2BeadIDGENENAME, ifnotfound = NA)
+symbol = mget(ids, illuminaMousev2BeadIDSYMBOL, ifnotfound = NA)
+anno = cbind(GeneSymbol = as.character(symbol), GeneName = as.character(GeneName))
+</pre>
+*To add this annotation to the data analysis file:
+<pre>
+ebFit$genes = anno
+write.fit = (ebFit, file = "Filename.csv", adjust="BH")
+</pre>
+*This example includes a false discovery rate ("BH") adjusted p.value.
+*This function writes a tab-delimited text file containing for each gene (1) the average log-intensity, (2) the log-ratios, (3) moderated t-statistics, (4) t-statistic P-values, (5) F-statistic if available, (6) F-statistic P-values if available, (7) classification if available.
+*To add this annotation data to the expression set:
+<pre>
+data = exprs(BSData.quantile)
+data = cbind(anno,data)
+write.csv = (data, file = "Filename.csv")
+</pre>
+*Remember that the expression set is Log2 adjusted, so to look at absolute expression levels use 2^value.

Using Bioconductor To Analyse Beadarray Data: Difference between revisions

Latest revision as of 15:34, 2 September 2009

Contents

Software Requirements

Loading Data

Data Normalisation

Clustering Analysis

Differential Expression Analysis

Generating a Venn Diagram for Differential Expression

Annotation of Expression Sets and Fitted Data

Navigation menu

Using Bioconductor To Analyse Beadarray Data: Difference between revisions

Latest revision as of 15:34, 2 September 2009

Software Requirements

Loading Data

Data Normalisation

Clustering Analysis

Differential Expression Analysis

Generating a Venn Diagram for Differential Expression

Annotation of Expression Sets and Fitted Data

Navigation menu

Search