Using Bioconductor To Analyse Beadarray Data

4,495 bytes added, 15:34, 2 September 2009

[[Category: R]]

[[Category: Bioinformatics]]

[[Category: Bioconductor]]

==Software Requirements==

*R, get from [http://cran.r-project.org/ CRAN]

*Bioconductor, get from [http://www.bioconductor.org/download Bioconductor]

*Bioconductor packages. Install as needed:

**beadarray

**limma

**annotation data for the array (normally illuminaMousev2BeadID.db)

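*To install the Bioconductor packages and then load the summary data use:

<pre>

# Install the required packages (on current Bioconductor releases, BiocManager::install() replaces biocLite())

source("http://www.bioconductor.org/biocLite.R")

biocLite(c("beadarray", "limma", "illuminaMousev2BeadID.db"))

# Load the summary data; the file name assigned to 'data' is hypothetical, substitute your own sample probe file

library(beadarray)

data = "SampleProbeProfile.txt"

controls = "ControlProbe.txt"

samplesheet = "Proj_54_12Aug09_WGGEX_SS_name.csv"

BSData = readBeadSummaryData(dataFile = data, qcFile = controls, sampleSheet = samplesheet, controlID = "TargetID")

</pre>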

*You may need to change the ProbeID or controlID argument so that it matches the name of the probe identifier column in the sample probe or control probe file.

*This loads the data into the BSData object (an ExpressionSetIllumina object rather than a plain data frame). Phenotype data can be accessed with pData(BSData) and expression data with exprs(BSData).

==Data Normalisation==

*Microarray data is typically quantile normalised and log2 transformed:

<pre>BSData.quantile = normaliseIllumina(BSData, method="quantile", transform="log2")</pre>
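To illustrate what quantile normalisation does, here is a minimal base R sketch on a made-up matrix. It is not the beadarray implementation; the function name quantile_normalise and the toy values are assumptions used only for this example.

<pre>
# Toy intensities: rows are probes, columns are arrays (made-up values)
x = matrix(c(5, 2, 3, 4,
             4, 1, 3, 2,
             3, 4, 6, 8),
           nrow = 4,
           dimnames = list(paste0("probe", 1:4), c("A", "B", "C")))

# Hypothetical helper: force every array to share the same distribution
quantile_normalise = function(x) {
  # Reference distribution: mean of the sorted values across arrays
  target = rowMeans(apply(x, 2, sort))
  # Rank of each probe within its own array
  ranks = apply(x, 2, rank, ties.method = "first")
  # Replace each value with the reference value of the same rank
  normalised = apply(ranks, 2, function(r) target[r])
  dimnames(normalised) = dimnames(x)
  normalised
}

x.norm = quantile_normalise(x)
print(x.norm)
</pre>

After this step every column contains the same set of values, only in a different probe order, which is why the boxplots of normalised arrays line up.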

*To examine the effects of normalisation on the dataset use boxplots:

<pre>

boxplot(as.data.frame(log2(exprs(BSData))),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")

boxplot(as.data.frame(exprs(BSData.quantile)),las=2,outline=FALSE, ylab="Intensity (Log2 Scale)")

</pre>

*Save these boxplots as postscript files.
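One way to save a boxplot as a postscript file is to open a postscript() graphics device before plotting and close it afterwards. The sketch below uses simulated intensities and a temporary output path, both of which are assumptions; substitute exprs(BSData) and your own file name.

<pre>
# Simulated raw intensities for six arrays (stand-in for exprs(BSData))
set.seed(1)
toy = matrix(2^rnorm(60, mean = 8), ncol = 6,
             dimnames = list(NULL, paste0("Sample", 1:6)))

# Hypothetical output location; use any path you like
outfile = file.path(tempdir(), "boxplot_raw.ps")

postscript(outfile)   # open the postscript device
boxplot(as.data.frame(log2(toy)), las = 2, outline = FALSE,
        ylab = "Intensity (Log2 Scale)")
dev.off()             # close the device so the file is written
</pre>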

==Clustering Analysis==

*This analysis generates a Euclidean distance matrix between the samples and then clusters that matrix hierarchically, showing how the replicates group. Ideally, samples that received the same treatment cluster together.

<pre>

d = dist(t(exprs(BSData.quantile)))

plot(hclust(d))

</pre>
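The same two calls can be tried on simulated data to see what a clean separation looks like. In this sketch the sample names, the number of probes and the size of the treatment shift are all made up.

<pre>
# Simulated log2 expression: 100 probes, 3 control and 3 treated arrays
set.seed(42)
toy = matrix(rnorm(600, mean = 8), nrow = 100,
             dimnames = list(NULL, c("Control_1", "Control_2", "Control_3",
                                     "Treated_1", "Treated_2", "Treated_3")))
toy[, 4:6] = toy[, 4:6] + 5   # large artificial treatment effect

d = dist(t(toy))              # Euclidean distance between arrays
hc = hclust(d)                # hierarchical clustering (complete linkage by default)
plot(hc)

# Cut the tree into two groups to check which arrays fall together
groups = cutree(hc, k = 2)
print(groups)
</pre>

With real data the separation is rarely this clear, so a replicate that clusters with the wrong group is worth checking before continuing.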

==Differential Expression Analysis==

*Normalised data can be tested for differential expression using the limma package.

*First define a group for each sample. If the sample sheet was loaded correctly and contains a Sample_Group column:

<pre>samples = pData(BSData)$Sample_Group</pre>

*Otherwise define the groups manually, in the order in which the samples were entered. Check the order by looking at pData(BSData):

<pre>samples = c("Control", "Control", "Treatment1", "Treatment1", "Treatment2"...)</pre>

*Next the groups are used to set up a statistical design:

<pre>

library(limma)

samples = as.factor(samples)

design = model.matrix(~0 + samples)

colnames(design) = levels(samples)

fit = lmFit(exprs(BSData.quantile), design)

</pre>

*Now set up a contrast matrix to define which comparisons you want to make. For example, you may want to compare some treatments to a control, as well as compare some treatments with each other. See the limma user guide for more information about specific analyses. When defining the contrast matrix, use the sample group names as defined above.

<pre>

cont.matrix = makeContrasts(Treatment1vsControl = Treatment1 - Control, Treatment2vsControl = Treatment2 - Control, Treatment1vsTreatment2 = Treatment1 - Treatment2, levels = design)

fit.cont = contrasts.fit(fit, cont.matrix)

ebFit = eBayes(fit.cont)

</pre>
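To make the design and contrast steps concrete, the following base R sketch builds the same kind of design matrix for a made-up set of six samples and works out one contrast by hand for a single probe. It only illustrates the arithmetic; it does not reproduce limma's moderated statistics, and the expression values are invented.

<pre>
# Made-up group labels, two arrays per group
samples = factor(c("Control", "Control", "Treatment1",
                   "Treatment1", "Treatment2", "Treatment2"))

# One column per group, one row per array
design = model.matrix(~0 + samples)
colnames(design) = levels(samples)
print(design)

# Invented log2 expression values for one probe across the six arrays
y = c(7.1, 6.9, 8.4, 8.6, 7.3, 7.1)

# With this design the fitted coefficients are simply the group means
group.means = lm.fit(design, y)$coefficients

# Treatment1 - Control, written as a contrast vector
contrast = c(Control = -1, Treatment1 = 1, Treatment2 = 0)
log.ratio = sum(contrast * group.means[names(contrast)])
print(log.ratio)
</pre>

Each column of the contrast matrix built by makeContrasts plays the role of the contrast vector here, applied to every probe at once.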

===Generating a Venn Diagram for Differential Expression===

*First define a cutoff criterion for inclusion. One option is to use the decideTests function:

<pre>results = decideTests(ebFit)</pre>

*The relevant options are method and adjust.method:

**method

***default is "separate", which tests each contrast on its own

***"global" adjusts across all contrasts and probes together, so that p-value cutoffs are comparable between contrasts

**adjust.method, which defines the multiple-testing adjustment:

***default is "BH" for Benjamini and Hochberg

***other options are "none", "fdr" (same as BH), "holm" and "BY"

*Now use that classification to generate the Venn Diagram. The following will include both up and downregulated genes and color the numbers accordingly:

<pre>vennDiagram(results, include="both", col=c("red","green"))</pre>
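The adjustment methods listed for adjust.method can be compared directly with base R's p.adjust function. The sketch below uses invented p-values and only illustrates how the adjustments differ; it does not show how decideTests applies them internally.

<pre>
# Invented raw p-values for six probes
p = c(0.001, 0.008, 0.039, 0.041, 0.2, 0.6)

# One column of adjusted p-values per method
methods = c("none", "BH", "fdr", "holm", "BY")
adjusted = sapply(methods, function(m) p.adjust(p, method = m))
print(round(adjusted, 3))

# How many probes pass a 0.05 cutoff under each method
print(colSums(adjusted < 0.05))
</pre>

The "BH" and "fdr" columns are identical, while "holm" and "BY" are more conservative and pass fewer probes at the same cutoff.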

==Annotation of Expression Sets and Fitted Data==

*To see all possible annotation criteria use:

<pre>

library(illuminaMousev2BeadID.db)

illuminaMousev2BeadID()

</pre>

*Normally you want to annotate with at least the gene symbol and gene name. Add other criteria as required:

<pre>

ids = rownames(exprs(BSData))

GeneName = mget(ids, illuminaMousev2BeadIDGENENAME, ifnotfound = NA)

symbol = mget(ids, illuminaMousev2BeadIDSYMBOL, ifnotfound = NA)

anno = cbind(GeneSymbol = as.character(symbol), GeneName = as.character(GeneName))

</pre>

*To add this annotation to the fitted model and write the results to a file:

<pre>

ebFit$genes = anno

write.fit(ebFit, file = "Filename.csv", adjust = "BH")

</pre>

*This example includes a p-value adjusted for the false discovery rate by the Benjamini and Hochberg ("BH") method.

*This function writes a tab-delimited text file containing for each gene (1) the average log-intensity, (2) the log-ratios, (3) moderated t-statistics, (4) t-statistic P-values, (5) F-statistic if available, (6) F-statistic P-values if available, (7) classification if available.

*To add this annotation data to the expression set:

<pre>

data = exprs(BSData.quantile)

data = cbind(anno,data)

write.csv(data, file = "Filename.csv")

</pre>

*Remember that the expression set is log2 transformed, so to obtain expression levels on the original scale use 2^value.
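One detail worth knowing when combining annotation with expression values: cbind on a character matrix and a numeric matrix returns a character matrix, so the expression values become text. The sketch below, using invented probes and values, shows a data.frame alternative that keeps the numbers numeric and then back-transforms one column.

<pre>
# Invented annotation and log2 expression values for two probes
anno = cbind(GeneSymbol = c("Actb", "Gapdh"),
             GeneName = c("actin, beta",
                          "glyceraldehyde-3-phosphate dehydrogenase"))
expr = matrix(c(10, 12, 11, 13), nrow = 2,
              dimnames = list(c("probe1", "probe2"), c("Sample1", "Sample2")))

combined.matrix = cbind(anno, expr)   # everything is coerced to character
combined.frame = data.frame(anno, expr, stringsAsFactors = FALSE)

print(is.character(combined.matrix))
print(sapply(combined.frame, class))  # expression columns stay numeric

# Back-transform log2 values to the original intensity scale
absolute = 2^combined.frame$Sample1
print(absolute)
</pre>

Either object can be passed to write.csv; the data frame is simply easier to work with if the file is read back into R.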

Davebridges

Bridges Lab Protocols