Investigation of the effects of MAS5, RMA and GCRMA preprocessing methods on an affymetrix zebrafish genechip dataset using statistical and network parameters
Author
Öztürk, Ahmet Raşit
Advisor
Konu, Özlen
Date
2010
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Microarray data preprocessing is an important determinant of the accuracy and
repeatability of expression profiling studies. Recent studies have focused on
comparison of preprocessing methodologies using differential expression analysis of
spike-in datasets and qRT-PCR confirmations. Other approaches include comparison
of array-wise and probe-wise correlation and of selected gene network parameters.
However, zebrafish GeneChip datasets have not been used in such comparisons;
furthermore, upregulated and downregulated gene sets have not been well
characterized with respect to known network parameters across different
preprocessing methodologies. In this study, we re-analyzed a public zebrafish hypoxia
microarray dataset (GSE4989; Marques et al. 2008) using MAS5, RMA, and gcRMA
methods. Comparisons were made in terms of differentially expressed gene sets and
defined network parameters, namely, clustering coefficient, degree distribution, and
betweenness centrality. Our findings indicated that gcRMA and RMA were more similar
to each other than either was to MAS5 in terms of differentially expressed genes and network
parameters. In addition, the network analysis demonstrated that upregulated and
downregulated gene sets had distinct network structures; downregulated probesets had
greater clustering coefficients and degree distributions for positively correlated
probesets in all three preprocessing methods. However, gcRMA and RMA methods
accentuated this difference further than MAS5 did, suggesting that preprocessing
methods differ in their modulation of gene expression network structure. A selected
group of probesets that showed invariant network structure parameters across RMA,
gcRMA and MAS5 was determined and analyzed functionally for the zebrafish
hypoxia dataset. The results of this thesis suggest that preprocessing methods may
alter the network structure of datasets differently for upregulated and
downregulated gene sets. Accordingly, it might be beneficial to filter for differentially
expressed genes that are robust to such network topology modulation in order to increase the
repeatability of gene sets.
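
As an illustration of two of the network parameters named in the abstract, the following minimal sketch computes node degree and the local clustering coefficient on a network of positively correlated probesets. It is not the thesis's implementation: the probeset names, the toy expression values, the use of Pearson correlation, and the 0.9 correlation threshold are all assumptions made for the example, and betweenness centrality is omitted for brevity.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_network(profiles, threshold):
    """Link two probesets when their correlation is at least `threshold`.

    Only positive correlations are linked here (an assumption of this sketch).
    """
    adj = {name: set() for name in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree(adj, node):
    """Number of probesets directly linked to `node`."""
    return len(adj[node])


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbours; reported as 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical probesets with made-up expression values across four arrays.
profiles = {
    "probe_a": [1.0, 2.0, 3.0, 4.0],
    "probe_b": [2.0, 4.0, 6.0, 8.0],
    "probe_c": [1.0, 2.0, 3.0, 5.0],
    "probe_d": [4.0, 3.0, 2.0, 1.0],  # anti-correlated with the others
}

adj = build_network(profiles, threshold=0.9)  # threshold is an assumption
for name in profiles:
    print(name, degree(adj, name), clustering_coefficient(adj, name))
```

Running the same calculation on expression matrices produced by each preprocessing method, separately for upregulated and downregulated probesets, would give per-method values that could then be compared in the spirit of the analysis described above.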