Using Bioconductor To Analyse Microarray Data
From Bridges Lab Protocols
Latest revision as of 15:34, 2 September 2009
Software Requirements
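- R, available from CRAN
- Bioconductor, available from the Bioconductor project (http://www.bioconductor.org)
- Bioconductor packages. Install as needed:
  - Biobase
  - GEOquery (http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html)
  - limma
  - affyPLM
source("http://www.bioconductor.org/biocLite.R") #load the Bioconductor installer script
biocLite("PACKAGE") #replace PACKAGE with the package name, e.g. "GEOquery"
#note: current Bioconductor releases replace biocLite() with BiocManager::install("PACKAGE")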
Obtaining GEO Datasets
- Open an R terminal
- Load Biobase and GEOquery packages
library(Biobase)
library(GEOquery)
- GEOquery can load the following GEO record types, identified by their accession prefix:
  - datasets - GDS
  - measurements (samples) - GSM
  - platforms - GPL
  - series - GSE
gds <- getGEO("GDS2946") #load the GDS2946 dataset
Meta(gds) #show extracted metadata
Table(gds)[1:10, ] #show the first ten rows of the dataset
eset <- GDS2eSet(gds, do.log2=TRUE) #convert to an ExpressionSet with log2 transformation; platform (GPL) annotation is downloaded by default
pData(eset) #phenotype data
sampleNames(eset) #sample names (GSM)
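- See Peter Cock's page (http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/r/geo/) or the GEOquery documentation (http://www.bioconductor.org/packages/1.8/bioc/html/GEOquery.html) for more information.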
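An ExpressionSet bundles two tables. The first is an expression matrix, returned by exprs(), with probes in rows and samples in columns. The second is a phenotype table, returned by pData(), with one row per sample. The base R sketch below mimics that layout so you can see what the do.log2=TRUE step does and why the sample order in the two tables must agree. It does not use Biobase, and all values and sample names are made up.

```r
# Toy stand-in for an ExpressionSet; all values and names are made up.
set.seed(1)
raw <- matrix(2^runif(12, min = 4, max = 12), nrow = 4, ncol = 3,
              dimnames = list(paste0("probe", 1:4),
                              paste0("sample", 1:3)))

# Phenotype table: one row per sample, rows named after the matrix columns.
pheno <- data.frame(group = c("obese", "obese", "lean"),
                    row.names = colnames(raw))

# Similar in spirit to do.log2 = TRUE: log2-transform the intensities.
logged <- log2(raw)

# Downstream steps (design matrices, contrasts) assume this alignment.
stopifnot(identical(colnames(logged), rownames(pheno)))
print(round(logged, 2))
```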
Microarray Analysis
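- Normalize the expression set, then set up the design matrix. Use a different integer for each treatment group. The following example contrasts the first seven samples (obese) with the last eight samples (lean). For details on other design matrices, see chapter 8 of the limma User's Guide (http://www.bioconductor.org/packages/2.3/bioc/vignettes/limma/inst/doc/usersguide.pdf).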
library(limma) #load limma package
library(affyPLM) #load affyPLM package
eset.norm <- normalize.ExpressionSet.quantiles(eset) #normalize expression set by the quantile method
pData(eset) #check the phenotype annotation to confirm the sample order
design <- model.matrix(~ -1 + factor(c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2))) #set design matrix: samples 1-7 are group 1, samples 8-15 are group 2
colnames(design) <- c("obese","lean") #give names to the treatment groups
design #check the design matrix
fit <- lmFit(eset.norm, design) #fit a linear model for each gene
cont.matrix <- makeContrasts(Obese.vs.Lean=obese-lean, levels=design) #define the obese vs lean comparison
fit.cont <- contrasts.fit(fit, cont.matrix) #estimate the contrast for each gene
fit.cont.eb <- eBayes(fit.cont) #empirical Bayes moderated statistics
results <- topTable(fit.cont.eb, coef="Obese.vs.Lean", number=Inf) #table of all genes, ranked by evidence of differential expression
write.csv(results, file="filename.csv") #write to CSV file
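The sketch below runs the same steps in base R on simulated data, to show what the quantile normalization and the no-intercept design above actually compute. It makes a few assumptions:

- quantile_normalize() is a hypothetical, simplified helper. It breaks ties by order rather than averaging them, so it is not identical to the affyPLM/preprocessCore implementation.
- The per-gene least-squares fit stands in for lmFit(). It omits the empirical Bayes moderation that eBayes() adds.

With a cell-means design, each coefficient is simply a group mean. The obese - lean contrast is therefore the per-gene difference in group means, which is the log2 fold change on log-scale data.

```r
# Hypothetical helper: simplified quantile normalization (ties broken by order).
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank within each sample
  target <- rowMeans(apply(x, 2, sort))              # mean of each quantile across samples
  out <- apply(ranks, 2, function(r) target[r])      # give every sample the shared quantiles
  dimnames(out) <- dimnames(x)
  out
}

# Simulated log2 data: 100 genes x 15 samples (7 obese, then 8 lean).
set.seed(42)
n.genes <- 100
expr <- matrix(rnorm(n.genes * 15, mean = 8), nrow = n.genes)
expr[1:5, 1:7] <- expr[1:5, 1:7] + 2  # make genes 1-5 higher in the obese samples
norm <- quantile_normalize(expr)

# Same no-intercept design as above: one column per group.
group <- factor(c(rep("obese", 7), rep("lean", 8)), levels = c("obese", "lean"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Least-squares fit for every gene at once: B = (X'X)^-1 X' Y.
coefs <- t(solve(crossprod(design)) %*% t(design) %*% t(norm))
colnames(coefs) <- colnames(design)

# Contrast obese - lean, i.e. the log2 fold change per gene.
contrast <- c(obese = 1, lean = -1)
logFC <- drop(coefs %*% contrast)
print(round(head(logFC, 8), 2))
```

In the real pipeline, eBayes() goes further. It shrinks the per-gene variances towards a common value before computing t-statistics, which makes the tests more stable when only a few arrays are available.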
Clustering Analysis
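Hierarchical clustering of the samples can be done with the base R functions dist() and hclust(), applied to the normalized expression matrix:
hc <- hclust(dist(t(exprs(eset.norm)))) #transpose so samples are rows, compute Euclidean distances between samples, then cluster
plot(hc) #draw the dendrogram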
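The clustering step can be tried on simulated data before running it on a real dataset. In the sketch below, the values and sample names are made up, and a block of genes is shifted upward in three of the six samples. Clustering the transposed matrix, so that samples are the rows passed to dist(), should then separate the two groups. cutree() (base R) assigns each sample to one of k clusters.

```r
# Simulated log2 expression: 200 genes x 6 samples; names are hypothetical.
set.seed(7)
n.genes <- 200
expr <- matrix(rnorm(n.genes * 6, mean = 8), nrow = n.genes,
               dimnames = list(NULL, c(paste0("obese", 1:3), paste0("lean", 1:3))))
expr[1:40, 1:3] <- expr[1:40, 1:3] + 3  # genes 1-40 higher in the "obese" samples

d <- dist(t(expr))            # Euclidean distances between samples
hc <- hclust(d)               # complete linkage (the hclust default)
groups <- cutree(hc, k = 2)   # cut the dendrogram into two clusters
print(groups)
```

Other linkage methods (for example hclust(d, method = "average")) or a correlation-based distance such as as.dist(1 - cor(expr)) can produce different trees. It is worth checking whether the grouping holds up under more than one choice.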