Using Bioconductor To Analyse Microarray Data

From Bridges Lab Protocols
Latest revision as of 15:34, 2 September 2009


Software Requirements

  • R, get from [CRAN]
  • Bioconductor, get from [Bioconductor]
  • Bioconductor packages. Install as needed:
    • Biobase
    • GEOquery - http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html
    • limma
source("http://www.bioconductor.org/biocLite.R")
biocLite("PACKAGE")
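
The biocLite() script above reflects the 2009 state of this page; current Bioconductor releases install packages through the BiocManager package from CRAN instead. The following is a minimal sketch of that route, not part of the original protocol. The helper names missing_packages and install_bioc are made up for illustration.

```r
# Return the subset of `pkgs` that is not installed yet.
# (hypothetical helper, not part of Bioconductor)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only the missing packages through BiocManager.
# (hypothetical helper; needs network access when something is missing)
install_bioc <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) return(invisible(character(0)))
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(todo)
  invisible(todo)
}

# Usage for the packages used on this page:
# install_bioc(c("Biobase", "GEOquery", "limma", "affyPLM"))
```

Checking first with requireNamespace avoids reinstalling packages that are already present.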

Obtaining GEO Datasets

  • Open an R terminal
  • Load Biobase and GEOquery packages
library(Biobase)
library(GEOquery)
  • getGEO can load the following GEO record types:
    • datasets - GDS
    • samples (individual measurements) - GSM
    • platforms - GPL
    • series - GSE
gds <- getGEO("GDS2946")  #load the GDS2946 dataset
Meta(gds)  #show extracted metadata
Table(gds)[1:10,]  #show first ten rows of dataset
eset <- GDS2eSet(gds, do.log2=TRUE)  #convert to an expression set with log2 transformation; annotation (GPL) data is obtained by default
pData(eset)  #phenotype data
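sampleNames(eset)  #sample names (GSM)
  • See Peter Cock's page (http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/r/geo/) or the GEOquery documentation (http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html) for more information.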
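
The accession prefix tells you which kind of GEO record getGEO will fetch. As a small illustration, the sketch below maps an accession to the record types listed above. The function geo_record_type is a hypothetical helper written for this page; it is not part of GEOquery.

```r
# Map a GEO accession to its record type using the three-letter prefix.
# (hypothetical helper, not a GEOquery function)
geo_record_type <- function(accession) {
  prefix <- toupper(substr(accession, 1, 3))
  types <- c(GDS = "dataset", GSM = "sample", GPL = "platform", GSE = "series")
  unname(types[prefix])  # NA for an unknown prefix
}

geo_record_type("GDS2946")
```

An NA result signals an accession that does not belong to any of the four record types.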

Microarray Analysis

  • Set up the design matrix, using a different integer for each treatment group. The following example contrasts the first seven samples with the last eight samples. For details on other design matrices see chapter 8 of the limma User Guide: http://www.bioconductor.org/packages/2.3/bioc/vignettes/limma/inst/doc/usersguide.pdf
library(limma)  #load limma package
library(affyPLM)  #load affyPLM package
eset.norm <- normalize.ExpressionSet.quantiles(eset)  #normalize expression set by quantile method
pData(eset)  #to see phenotype annotation data
design <- model.matrix(~ -1 + factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)))  #set design matrix
colnames(design) <- c("obese","lean")  # give names to the treatment groups
design  #check the design matrix
fit <- lmFit(eset.norm, design)  #Fit data to linear model
cont.matrix <- makeContrasts(Obese.vs.Lean=obese-lean, levels=design)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont.eb <- eBayes(fit.cont)  #Empirical Bayes moderation of the contrast fit
write.csv(fit.cont.eb, file="filename.csv")  #write to CSV file
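
Typing the integer codes by hand is error-prone, so the same design matrix can be built from a labelled factor. The sketch below uses base R only and assumes the sample order implied above (seven obese samples followed by eight lean samples); check this against pData(eset) for your own data.

```r
# Group labels for the 15 samples: 7 obese followed by 8 lean (assumed order).
group <- factor(rep(c("obese", "lean"), times = c(7, 8)),
                levels = c("obese", "lean"))

# One column per group, no intercept (equivalent to ~ -1 + factor(...)).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

colSums(design)  # number of samples assigned to each group

# With limma loaded and the fit computed as above, a ranked gene table for the
# contrast could be written out with something like:
# write.csv(topTable(fit.cont.eb, coef = "Obese.vs.Lean", number = Inf),
#           file = "filename.csv")
```

Setting the factor levels explicitly keeps the column order (obese, lean) independent of alphabetical sorting, so the contrast obese-lean keeps its intended sign.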

Clustering Analysis

Base R's dist and hclust functions can compute a distance matrix between samples from the normalized expression values and cluster them hierarchically:

hc <- hclust(dist(t(exprs(eset.norm))))
plot(hc)
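
To try the clustering step without downloading any data, here is a sketch on a simulated matrix that stands in for exprs(eset.norm). The matrix size, sample names, and group shift are assumptions chosen only for illustration.

```r
set.seed(1)

# Simulated stand-in for exprs(eset.norm): 100 genes (rows) x 6 samples (columns).
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(NULL, paste0("S", 1:6)))
expr[, 4:6] <- expr[, 4:6] + 3  # shift the second group of samples

d  <- dist(t(expr))  # Euclidean distance between samples (transpose: samples as rows)
hc <- hclust(d)      # complete linkage, the hclust default

groups <- cutree(hc, k = 2)  # cut the tree into two clusters
groups
# plot(hc)  # draw the dendrogram
```

The transpose matters: dist compares rows, so without t() the genes would be clustered instead of the samples.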