Investigation of the effects of MAS5, RMA and GCRMA preprocessing methods on an affymetrix zebrafish genechip dataset using statistical and network parameters
Öztürk, Ahmet Raşit
Item Usage Stats
Microarray data preprocessing is an important determinant of the accuracy and repeatability of expression profiling studies. Recent studies have focused on comparison of preprocessing methodologies using differential expression analysis of spike-in datasets and qRT-PCR confirmations. Other approaches include comparison of array-wise and probe-wise correlation and of selected gene network parameters. However, zebrafish GeneChip datasets have not been used in such comparisons; furthermore, detailed analysis of upregulated and downregulated gene sets with respect to known network parameters are not well characterized across different preprocessing methodologies. In this study we re-analyzed a public zebrafish hypoxia microarray dataset (GSE4989; Marques et al. 2008) using MAS5, RMA, and gcRMA methods. Comparisons were made in terms of differentially expressed gene sets and defined network parameters, namely, clustering coefficient, degree distribution, and betwenness centrality. Our findings indicated that gcRMA and RMA exhibited greater similarity to each other in terms of differentially expressed genes, and network parameters. In addition, the network analysis demonstrated that upregulated and downregulated gene sets had distinct network structures; downregulated probesets had greater clustering coefficients and degree distributions for positively correlated probesets in all three preprocessing methods. However, gcRMA and RMA methods accentuated this difference further than MAS5 did, suggesting that preprocessing methods differ in their modulation of gene expression network structure. A selected group of probesets that showed invariant network structure parameters across RMA, gcRMA and MAS5 was determined and analyzed functionally for the zebrafish hypoxia dataset. The results of this thesis suggest that preprocessing methods may alter network structure of the datasets differentially with respect to upregulated and downregulated gene sets. Accordingly, it might be beneficial to filter differentially expressed genes that are robust to such network topology modulation to increase the repeatability of gene sets.