Difference between revisions of "Using Bioconductor To Analyse Microarray Data"

Latest revision as of 15:34, 2 September 2009

Software Requirements

R, available from [CRAN]
Bioconductor, available from [Bioconductor]
Bioconductor packages (install as needed):
- Biobase
- GEOquery - [http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html]
- limma
- affyPLM

source("http://www.bioconductor.org/biocLite.R")
biocLite("PACKAGE")  #replace PACKAGE with the package name, e.g. biocLite("GEOquery")
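The biocLite.R script was the Bioconductor installer when this page was written. Current Bioconductor releases install packages through the BiocManager package from CRAN instead. The sketch below assumes a recent R version. The helper missing_packages() is hypothetical and only reports which of the packages used on this page are absent. The install step runs only in an interactive session.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Packages used on this page.
required <- c("Biobase", "GEOquery", "limma", "affyPLM")
to_install <- missing_packages(required)
print(to_install)

# On current R releases, BiocManager replaces the old biocLite.R script.
# Installation is only attempted in an interactive session.
if (length(to_install) > 0 && interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(to_install)
}
```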

Obtaining GEO Datasets

Open an R terminal.
Load the Biobase and GEOquery packages:

library(Biobase)
library(GEOquery)

getGEO() can load four kinds of GEO record:
- datasets - GDS
- measurements - GSM
- platforms - GPL
- series - GSE

gds <- getGEO("GDS2946")  #load GDS2946 dataset
Meta(gds)  #show extracted meta data
Table(gds)[1:10,]  #show first ten rows of dataset
eset <- GDS2eSet(gds, do.log2=TRUE)  #convert to an ExpressionSet; platform (GPL) annotation is obtained by default, and do.log2=TRUE applies a log2 transformation
pData(eset)  #phenotype data
sampleNames(eset)  #sample names (GSM)

See [Peter Cock's Page] or [GEOquery Documentation] for more information.

Microarray Analysis

Set up the design matrix, using a different integer for each treatment group. The following example is for a contrast between the first seven samples and the last eight samples. For details on other design matrices, see chapter 8 of the [limma User Guide].
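This minimal base-R sketch shows what the design matrix and the obese-minus-lean contrast encode. It uses simulated log2 values for a single hypothetical gene and does not call limma. With the intercept removed, each coefficient is a group mean. The contrast is the weighting c(1, -1) on those coefficients, which is the same weighting makeContrasts(Obese.vs.Lean=obese-lean, ...) sets up.

```r
# Group labels as in the text: first 7 samples = group 1, last 8 = group 2.
group <- factor(c(rep(1, 7), rep(2, 8)))

# "~ 0 + group" (same as "~ -1 + group") drops the intercept, so each
# column is an indicator for one treatment group.
design <- model.matrix(~ 0 + group)
colnames(design) <- c("obese", "lean")

# Simulated (hypothetical) log2 expression values for one gene.
set.seed(1)
y <- c(rnorm(7, mean = 8), rnorm(8, mean = 7))

# Least-squares fit; with this design each coefficient is a group mean.
coefs <- qr.coef(qr(design), y)

# Contrast obese - lean = weights c(1, -1) on the two coefficients.
contrast <- c(obese = 1, lean = -1)
obese_vs_lean <- sum(contrast * coefs)

print(design)
print(coefs)
print(obese_vs_lean)
```

limma fits the same kind of linear model to every probe at once and then moderates the resulting statistics with eBayes().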

library(limma)  #load limma package
library(affyPLM)  #load affyPLM package
eset.norm <- normalize.ExpressionSet.quantiles(eset)  #normalize expression set by quantile method
pData(eset)  #to see phenotype annotation data
design <- model.matrix(~ -1 + factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)))  #set design matrix: 7 obese then 8 lean samples, in the same order as sampleNames(eset)
colnames(design) <- c("obese","lean")  # give names to the treatment groups
design  #check the design matrix
fit <- lmFit(eset.norm, design)  #Fit data to linear model
cont.matrix <- makeContrasts(Obese.vs.Lean=obese-lean, levels=design)
fit.cont <- contrasts.fit(fit, cont.matrix)
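fit.cont.eb <- eBayes(fit.cont)  #Empirical Bayes moderated statistics for the contrast
results <- topTable(fit.cont.eb, coef="Obese.vs.Lean", number=Inf)  #all probes, ranked, with adjusted p-values
write.csv(results, file="filename.csv")  #write to CSV file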
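In the pipeline above, normalize.ExpressionSet.quantiles() from affyPLM performs the normalization. The hand-written base-R sketch below illustrates the idea behind the quantile method on a simulated matrix. It is not affyPLM's implementation, and handling ties with ties.method = "first" is a simplifying assumption. Each value is replaced by the mean, across arrays, of the values at the same rank. As a result every array ends up with an identical distribution.

```r
# Sketch of quantile normalization (columns = arrays, rows = probes).
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted_means <- rowMeans(apply(x, 2, sort))        # mean of each rank across arrays
  out <- apply(ranks, 2, function(r) sorted_means[r])
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical data: 100 probes x 4 arrays with different offsets and spreads.
set.seed(2)
x <- sapply(1:4, function(j) rnorm(100, mean = 6 + j, sd = j))
colnames(x) <- paste0("array", 1:4)

xn <- quantile_normalize(x)
print(apply(xn, 2, quantile))  # identical quantiles in every column
```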

Clustering Analysis

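Samples can be clustered hierarchically with base R's dist() and hclust(). exprs() extracts the expression matrix from the ExpressionSet, and t() transposes it so that the samples (columns), rather than the genes, are clustered: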

hc <- hclust(dist(t(exprs(eset.norm))))
plot(hc)
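The next sketch applies the same dist()/hclust() call to simulated data, so it runs without an ExpressionSet. The gene and sample names and the shift of 8 units are made up. cutree() then cuts the tree into two groups to check that the seven and eight samples separate.

```r
# Hypothetical data: 50 genes x 15 samples; the last 8 samples are shifted
# upwards on the first 20 genes.
set.seed(3)
expr <- matrix(rnorm(50 * 15), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("sample", 1:15)))
expr[1:20, 8:15] <- expr[1:20, 8:15] + 8

# dist() computes distances between rows, so transpose to compare samples.
hc <- hclust(dist(t(expr)))     # Euclidean distance, complete linkage (defaults)
clusters <- cutree(hc, k = 2)   # assign each sample to one of two clusters
print(clusters)

if (interactive()) plot(hc)
```

Distance and linkage are both defaults here. dist(method = ...) and hclust(method = ...) accept other choices, and those choices can change the tree.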

@@ Line 1: / Line 1: @@
 [[Category:R]]
 [[Category:Bioinformatics]]
+[[Category: Bioconductor]]
 ==Software Requirements==
@@ Line 8: / Line 9: @@
 **Biobase
 **GEOquery - [http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html]
+**Limma
 <pre>
 source("http://www.bioconductor.org/biocLite.R")
@@ Line 26: / Line 28: @@
 **series - '''GSE'''
 <pre>
-gds <- getGEO("GDS162")  #load GDS162 dataset
+gds <- getGEO("GDS2946")  #load GDS162 dataset
 Meta(gds)  #show extracted meta data
 table(gds)[1:10,]  #show first ten rows of dataset
@@ Line 38: / Line 40: @@
 *set up design matrix.  Use a different integer for each treatment group.  The following example is for a contrast between the first seven groups and the last eight groups.  For details on other design matrices see chapter 8 of [[http://www.bioconductor.org/packages/2.3/bioc/vignettes/limma/inst/doc/usersguide.pdf limma User Guide]]
 <pre>
+library(limma)  #load limma package
+library(affyPLM)  #load affyPLM package
+eset.norm <- normalize.ExpressionSet.quantiles(eset)  #normalize expression set by quantile method
 pData(eset)  #to see phenotype annotation data
-design <- model.matrix(~(c(1,1,1,1,1,1,1,0,0,0,0,0,0,0,0)),eset)  #for four replicates of each treatment group,
+design=model.matrix(~ -1+factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)  #set design matirx
-colnames(design) <- c("resistant","sensitive")  # give names to the treatment groups
+colnames(design) <- c("obese","lean")  # give names to the treatment groups
 design  #check the design matrix
-fit <- lmFit(eset,design)
+fit <- lmFit(eset.norm, design)  #Fit data to linear model
-fit.eb <- eBayes(fit)
+cont.matrix <- makeContrasts(Obese.vs.Lean=obese-lean, levels=design)
+fit.cont <- contrasts.fit(fit, cont.matrix)
+fit.cont.eb <- eBayes(fit.norm)  #Empirical Bayes
+write.csv(fit.cont.eb, file="filename.csv")  #write to CSV file
+</pre>
+==Clustering Analysis==
+Bioconductor packages can calculate distance matrices:
+<pre>
+hc <- hclust(dist(t(exprs(eset.norm))))
+plot(hc)
 </pre>