Introduction to Microarray Analysis


Uma Chandran PhD, MSIS
Department of Biomedical Informatics
412-623-7841
12/17/08
My Background
Bioinformatics Analysis Service
UPCI
Department of Biomedical Informatics
Clinical Genomics Facility
Runs expression, SNP and microRNA microarrays
Bioinformatics tightly integrated with data analysis
Expression, SNP, proteomic, integration of proteomic and genomic data
Workshop Objectives
Introduction to microarray analysis
Understand general principles
BRB Array Tools from NCI
HSLS also offers Array Assist, Genespring GX
Not an advanced analysis course – offered through DBMI and Biostatistics
Not a statistics course
Will discuss some statistical issues
Should consult literature, statistician to understand methods in detail



What is a microarray
Probes on chips
Detect target RNA in samples
High throughput
10000s of specific probes
Measure global gene expression
Glass beads, chips, slides


Bioinformatic approaches for analysis
Measuring 10000s of data points simultaneously
High dimensional data
10 experiments x 50K probes = 500K data points
How to find real differences over the noise
Statistical approaches

Bioinformatic approaches for analysis
Class Comparison
Which genes are up or down in tumors v normal, untreated v treated
Class Discovery
Within the tumor samples, are there subgroups that have a specific expression profile?
Class prediction, pathway analysis etc
Challenges in microarray analysis
Different platforms
Illumina, Affymetrix, Agilent, ...
Many file types, many data formats
Need to learn platform dependent methods and software required
Analysis
How to get started?
Which methods? Which software? Many freely available tools. Some commercial
How to interpret results



Public databases
Many sources for public data – labs, consortia, government
Publications require that data files including raw files be made public
GEO - http://www.ncbi.nlm.nih.gov/geo/
ArrayExpress - http://www.ebi.ac.uk/arrayexpress/#ae-main[0]

What tools to use
Gene Spring GX (HSLS)
Today’s exercise uses BRB-ArrayTools from NCI
Excel interface
First install the R statistical package and the required Bioconductor packages
Fairly easy to use if you don’t have access to commercial tools
Analysis is robust, display and graphics are minimal
Learn concepts using BRB
GEO data for Exercise
Raetz et al;
Characterization of T-ALL, T-LL, B-ALL
T-ALL and T-LL are morphologically indistinguishable
Are there expression differences?
Class comparison, class discovery, prediction
Files
C:DesktopscmcclassHSLSclass
Treeview, Cluster
Files may also be under C:

Hands on #1
Google GEO
Query Raetz et al;
Open cel files for class
.cel files
Affymetrix files
Affymetrix produces several file types, including .dat, .cel and .chp
Need Affy software to open these files
Freely downloadable
Microarray analysis – Data Preprocessing
Objective
Convert an image of thousands of signals to a signal value for each gene or probe set
Multiple steps
Image analysis
Background and noise subtraction
Normalization
Expression value for a gene or probe set
Image analysis and background/noise subtraction are usually done by proprietary software
Gene 1        100
Gene 2        150
Gene 3        75
...
Gene 10000    500
Normalization
Corrects for variation in hybridization etc
Assumes no global change in gene expression between chips
Without normalization
Intensity value for gene will be lower on Chip B
Many genes will appear to be downregulated when in reality they are not

Gene          Treated   Control
Gene 1        100       50
Gene 2        150       75
Gene 3        75        32
...
Gene 10000    500       250
How to normalize?
Many methods – Affy MAS5.0
Median scaling – median intensity for all chips should be the same
Known genes, house keeping, invariant genes
Quantile - RMA
Normalization method may differ depending on platform
Illumina – cubic spline
Affymetrix
Choose method
.cel to .chp file
Which method to choose?
Know the biology
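
The median scaling and quantile ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAS5.0 or RMA implementation: the function names, the choice of target median, the toy intensities and the simple tie handling are all made up for the example.

```python
from statistics import median

def median_scale(chips):
    """Scale each chip so that all chips share the same median intensity."""
    medians = [median(chip) for chip in chips]
    target = sum(medians) / len(medians)  # assumed target: mean of the chip medians
    return [[v * target / m for v in chip] for chip, m in zip(chips, medians)]

def quantile_normalize(chips):
    """Give every chip the same intensity distribution (ties broken by position)."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    result = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # gene indices, low to high
        normalized = [0.0] * n
        for rank, gene in enumerate(order):
            normalized[gene] = reference[rank]
        result.append(normalized)
    return result

# Toy data: chip B hybridized at roughly half the intensity of chip A
chip_a = [100.0, 150.0, 75.0, 500.0]
chip_b = [50.0, 75.0, 32.0, 250.0]

scaled = median_scale([chip_a, chip_b])
quantiled = quantile_normalize([chip_a, chip_b])
print(scaled)
print(quantiled)
```

After either step the two chips are on a common scale, so a gene no longer looks downregulated just because one chip hybridized more weakly. Both methods rely on the assumption that there is no global change in expression.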




BRB – Array tools
Website
Excel plug-in; uses R and Fortran
Import, choose correct format
From .cel files
Process using GCRMA or MAS5.0
Or directly from processed files
Attaches annotation
Create experiment labels
Illumina format
Hands on #2
Open Excel
Click on Array Tools
Look at data import options
Wizard
General format: Affy, non Affy
Affy data
Import .cel files or already normalized files
Various normalization options
Clicking OK will import data – don’t do this now because time and memory intensive
Already normalized files: look at MAS5.0.txt
For the next step, we will work with data that has already been imported into BRB



Hands on continued
Look at a normalized file MAS5.0.txt
Open this file in Excel
Absent, Present calls – unique to Affy
This already normalized file can also be imported into BRB
Quality control
Will not go into detail here because platform specific
Read the literature for platform
Open the file labeled MAS5.0 report (folder: cel file to import)
Scale factor
% P calls
Should be at least ~40%
If RNA quality poor, then fewer present calls
Control probes
GAPDH has 3’, middle and 5’ probes
Ratio of 3’/5’
Other spike-in probes for cDNA to cRNA synthesis
For hybridization
Background/noise
All platforms will have control probes and quality metrics
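
Two of the quality metrics above, the percentage of Present calls and the 3’/5’ ratio, are simple to compute once the values are read from the report. The sketch below is hypothetical: the detection calls and GAPDH signals are invented, and only the ~40% guideline comes from the slide. MAS5.0 detection calls are P (present), M (marginal) and A (absent).

```python
def percent_present(calls):
    """Percentage of probe sets with a Present ('P') detection call."""
    return 100.0 * sum(1 for c in calls if c == "P") / len(calls)

def three_to_five_ratio(signal_3prime, signal_5prime):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    return signal_3prime / signal_5prime

# Hypothetical detection calls for ten probe sets on one chip
calls = ["P", "A", "P", "M", "P", "A", "P", "A", "P", "P"]
pct_present = percent_present(calls)
passes_guideline = pct_present >= 40.0  # ~40% guideline from the slide

# Hypothetical GAPDH signals; a ratio far above 1 suggests degraded RNA
gapdh_ratio = three_to_five_ratio(1200.0, 1000.0)

print(pct_present, passes_guideline, gapdh_ratio)
```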


BRB import
Spot filters
For Affy array, check off
For other arrays, could exclude if negative
Set to threshold
Normalization
If importing already normalized data, as with MAS5.0, check off
If RMA, already normalized, check off
For other methods that do not normalize automatically, choose a method here
Gene Filters
For now, leave 20%
Rest, check off

Hands on
Go to BRB analysis folder
BRB analysisProjectRaetz.xls file
This file shows what an imported project looks like
Experiment Descriptor is the class label for each experiment

Data Analysis
Part 2- Data analysis
Class discovery
Class comparison
Class prediction
Biological annotation
Pathway analysis

Class Discovery
Objective?
Can data tell us which classes are similar?
Are there subgroups?
Do T-ALL, T-LL, B-ALL fall into distinct groups?
Methods
Hierarchical clustering
K-means, SOM etc
These are Unsupervised Methods
Class IDs are not known to the algorithm
For example, it does not know which sample is cancer or non-cancer
Do the expression values differentiate the classes? Does the method discover new classes?
Hands on – Class discovery
Multidimensional scaling in BRB
Raetz.xls
Choose defaults in BRB
Eisen’s Cluster
Filter
Accept
Adjust
Cluster
Different clustering metrics will give different results
Not a very robust method but very popular
Use as exploratory tool


Multidimensional scaling - MDS
Hands on - Hierarchical Clustering
Eisen Cluster and Treeview
Import data
Filter
Filter or not to filter, %P calls, SD etc
Accept filter
Adjust data
Log transform (important), center, normalize
Clustering
Cluster array or genes
Clustering genes is computationally intensive
Choose distance metric
.cdt file created
Open with Treeview
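
The adjust and cluster steps above (log transform, center, choose a distance metric, cluster arrays) can be sketched in plain Python. This is not the algorithm as implemented in Cluster or BRB. It assumes log2 with median centering per gene, a Pearson correlation distance and average linkage, and the expression values and array names are invented.

```python
import math
from statistics import median

def log2_center(rows):
    """Log2-transform each gene (row) and subtract that gene's median."""
    out = []
    for row in rows:
        logged = [math.log2(v) for v in row]
        m = median(logged)
        out.append([v - m for v in logged])
    return out

def correlation_distance(x, y):
    """1 - Pearson correlation (assumes neither profile is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles, names):
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    clusters = {name: [i] for i, name in enumerate(names)}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                pairs = [(p, q) for p in clusters[a] for q in clusters[b]]
                d = sum(correlation_distance(profiles[p], profiles[q])
                        for p, q in pairs) / len(pairs)
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append(best)
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Hypothetical intensities: 4 genes (rows) x 4 arrays (columns)
names = ["T1", "T2", "B1", "B2"]
genes = [
    [800.0, 900.0, 100.0, 120.0],
    [50.0, 60.0, 400.0, 380.0],
    [200.0, 210.0, 190.0, 205.0],
    [1000.0, 950.0, 300.0, 310.0],
]
centered = log2_center(genes)
arrays = [list(col) for col in zip(*centered)]  # one profile per array
merges = average_linkage(arrays, names)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Swapping `correlation_distance` for another metric (for example Euclidean distance) can change the tree, which is why clustering is best treated as an exploratory tool.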

Class comparison – differential expression analysis
Which genes are up- or downregulated between control and test, or across multiple test conditions?
Normal v tumor
Treated v untreated
Fold change
Not sufficient, need statistics
Statistics
t test, non-parametric tests, FDR

Class comparison
Many analysis methods
May produce different results
Different underlying statistics and methods
t test
t test with permutations
SAM
Empirical Bayes
Depends on underlying assumptions about data
High throughput data with many rows and few samples
What is the distribution
Variance from gene to gene
Save raw data files to try different methods and compare results
Fold change does not take variation into account
Modified from madB
http://nciarray.nci.nih.gov/
Hypothesis Testing
[Figure: Normal and Tumor expression distributions with means mean1 and mean2 and the difference d between them; labels: null hypothesis, alternative hypotheses]
Statistical power
t test
Test hypothesis that the two means are not statistically different
Adding “confidence” to the fold change value
Mean
Standard deviation
Sample size
Calculates statistic
You choose cutoff or threshold
Give me gene list at a cutoff of p <0.05
At p < 0.05, a difference this large between the control and treated means would be expected less than 5% of the time if there were no true difference
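
To make the t test concrete for a single gene, the sketch below computes a log2 fold change, a t statistic from the mean, standard deviation and sample size, and a permutation p value. It is an illustration only, not what BRB does internally. It assumes Welch's form of the t statistic and an exact permutation over all relabelings, and the log2 expression values are invented.

```python
import math
from itertools import combinations
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t: difference in means divided by its standard error."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def permutation_p_value(a, b):
    """Two-sided p value from every relabeling of the pooled samples."""
    pooled = a + b
    observed = abs(t_statistic(a, b))
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(a)):
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point round-off
        if abs(t_statistic(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical log2 expression values for one gene
control = [7.1, 6.9, 7.3, 7.0]
treated = [8.9, 9.2, 8.8, 9.1]

log2_fold_change = mean(treated) - mean(control)
t = t_statistic(treated, control)
p = permutation_p_value(treated, control)
print(log2_fold_change, t, p)
```

The fold change alone ignores the spread within each group. The t statistic divides by that spread, so a noisy gene with the same fold change gets a smaller statistic and a larger p value.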
Experimental Design – Very important!!!
Sample size
How many samples in test and control
Will depend on many factors such as whether tissue culture or tissue sample
Power analysis
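
A rough power calculation for one gene can be sketched with the standard normal-approximation formula for comparing two group means, n = 2 * ((z_alpha + z_power) * sigma / delta)^2 per group. The function name and the example numbers are assumptions; the approximation tends to underestimate n for very small groups, so treat the output as a starting point to discuss with a statistician.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples per group to detect a mean difference delta.

    delta and sigma are on the log2 scale; alpha is two-sided.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical: detect a 2-fold change (1 log2 unit) with a standard deviation of 0.5
n_single_test = samples_per_group(delta=1.0, sigma=0.5, alpha=0.05)
# A stricter alpha, as used when testing many genes, needs more samples
n_stringent = samples_per_group(delta=1.0, sigma=0.5, alpha=0.001)
print(n_single_test, n_stringent)
```

More heterogeneous samples (larger sigma) or smaller expected differences (smaller delta) push the required sample size up quickly, since n grows with the square of sigma / delta.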

Replicates
Technical v biological
Biological replicates are more important for more heterogeneous samples
Need replicates for statistical analysis



To pool or not to pool
Depends on objective

Sample acquisition or extraction
Laser captured or gross dissected

All experimental steps from sample acquisition to hybridization
Microarray experiments are very expensive. So, plan experiments carefully


t tests
Results might look like
At p < 0.05, 300 genes are upregulated and 200 genes are downregulated
For each of these genes, a difference this large would occur by chance less than 5% of the time if the two group means were equal
At a p < 0.05, x genes up and y genes down with a fold change of at least 3.0

Multiple comparison
Microarrays have multiple comparison problem
A cutoff of p <= 0.05 means that about 5% of genes with no real difference will still pass by chance
5% of 10000 is 500
Up to 500 genes are picked up by chance
Suppose the t test selects 1000 genes at a p of 0.05
500/1000: approximately 50% of the selected genes will be false positives
Very high false discovery rate; need more confidence
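
The arithmetic above, plus a small simulation of what "picked up by chance" means, can be sketched as follows. The simulation assumes the worst case in which no gene truly differs, so every p value is uniform between 0 and 1. The seed and variable names are arbitrary choices for the example.

```python
import random

n_genes = 10000
alpha = 0.05
selected = 1000  # genes the t test reports at p < alpha (from the slide)

expected_by_chance = alpha * n_genes              # 5% of 10000
false_fraction = expected_by_chance / selected    # share of the list that may be false

# Simulation assumption: no gene truly differs, so p values are uniform on [0, 1]
random.seed(0)
null_p_values = [random.random() for _ in range(n_genes)]
passed_by_chance = sum(1 for p in null_p_values if p < alpha)

print(expected_by_chance, false_fraction, passed_by_chance)
```

Even with no real biology in the data, roughly 500 genes pass the 0.05 cutoff, which is why a raw p value alone is not enough.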
How to correct?
Correction for multiple comparison
p value and a corrected p value

Corrections for multiple comparisons
Adjust the p value so that the corrected p value is higher than the raw p value
Bonferroni
Benjamini-Hochberg
Significance Analysis of Microarrays
Tusher et al. at Stanford
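
The Bonferroni and Benjamini-Hochberg corrections listed above can be sketched as follows (SAM uses a different, permutation-based approach and is not shown). The raw p values are invented, and BRB's own output may be organized differently.

```python
def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR-adjusted p values (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values never decrease as the raw p values increase
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for five genes
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, b, f in zip(raw, bonf, bh):
    print(p, round(b, 5), round(f, 5))
```

Bonferroni controls the chance of even one false positive and is very strict for thousands of genes. Benjamini-Hochberg controls the expected fraction of false positives in the reported list, so it usually keeps more genes.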
Hands on BRB

Class comparison
Choose comparison
Which tests are available?
P value cutoff
How is multiple testing correction being done?
Stringent p value, fdr
How is the output reported?
Can you figure out how many genes are regulated at different p values and different cutoffs
How to interpret results
Look at gene lists generated by our analysis v those generated in the paper



BRB – Hands on
Check Experiment desc file
Set up Class Comparison
T-ALL v T-LL
Choose p value
Random variance
Options
Save file
Run
BRB – Class Comparison
Output folder
Check the .html file
Look at results
P value
Fold change
Annotation
Click on annotation
Cut and paste save into Excel
Many studies, many methods
Dupuy and Simon, JNCI; 2007
How to manipulate Gene lists
Create gene lists
Venn Diagram
Can be done even though study done on different platforms
Compare MAS and RMA
Venn Diagram
Compare B-ALL v T-LL and T-LL v B-ALL

Venn Diagram
http://www.pangloss.com/seidel/Protocols/venn.cgi

http://ncrr.pnl.gov/software/VennDiagramPlotter.stm
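
The overlap that a Venn diagram displays can also be computed directly with set operations. The gene lists below are hypothetical placeholders, not results from the Raetz data. When comparing across platforms, the lists need a shared identifier such as the gene symbol.

```python
# Hypothetical gene lists from two analyses of the same comparison
mas5_genes = {"CD3E", "CD19", "LCK", "PAX5"}
rma_genes = {"CD3E", "LCK", "CD79A"}

in_both = mas5_genes & rma_genes      # center of the Venn diagram
only_mas5 = mas5_genes - rma_genes    # left-only region
only_rma = rma_genes - mas5_genes     # right-only region

print(sorted(in_both), sorted(only_mas5), sorted(only_rma))
```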
Conclusion
Other analysis
Class prediction
Gene list from class comparison can be used in pathway analysis
HSLS pathway workshops on Ingenuity, DAVID, Pathway Architect
Future:
Integrate expression data with other data such as SNP or microRNA
GEO has some data analysis features

