Survival analysis and its applications in identifying genes, signatures, and pathways in human cancers
The cancer literature relies on survival analyses of gene expression based on univariable or multivariable regression. However, it remains unclear whether a) incorporating exon- or isoform-level expression would improve survival estimates in cancer patients; and b) applying multivariable regression to gene sets would yield independent, cancer-specific gene signatures. Differential usage of individual exons and of transcripts is common in cancerous tissue compared with normal tissue. The glioblastoma (GBM), liver cancer (LIHC), stomach adenocarcinoma (STAD), and breast carcinoma (BRCA) datasets from The Cancer Genome Atlas (TCGA) were investigated to identify individual exons and transcripts with a transcriptome-wide significant impact on survival. Aggregating the exon-level results revealed genes important for survival in each dataset, including GNA12 in STAD, AKAP13 in LIHC, and RBMXL1 and CARS1 in BRCA. Gene set enrichment analysis (GSEA) was applied to gene sets formed from the exon-based analysis, revealing distinct enrichment profiles for each dataset as well as overlaps for certain GO terms and KEGG pathways. In the second part of this thesis, multivariable analyses of gene sets, with expression data obtained from UCSC Xena, were used to create two Shiny applications: one for dataset-specific analyses and one for analyses across TCGA-PANCAN. The dataset-specific SmulTCan application performs Cox regression analyses on the expression of input genes of the user's choice, and additionally offers model validation, best subset selection, and prognostic analyses. The ClusterHR application clusters Cox regression results and can also be used to identify and compare biclusters. The axon-guidance ligand-receptor gene sets Slit-Robo, netrins-receptors, and Semas-receptors were used to demonstrate the applications.
From the input gene sets, several hazard ratio signatures and best subsets that differentiate between prognostic outcomes were identified, along with ligand-receptor pairs of prognostic significance.
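
As an illustration of the kind of Cox regression that underlies a hazard ratio, the following minimal sketch fits a univariable Cox model for a single expression feature using Newton-Raphson on the Breslow partial likelihood. It is not the implementation used in this work (the applications described here are built with Shiny); the function name `cox_univariable` and the toy survival times, event indicators, and expression values are hypothetical and chosen only to make the example run.

```python
import math

def cox_univariable(times, events, x, n_iter=25, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) for one covariate.

    Newton-Raphson on the Breslow partial likelihood.
    Returns (beta, standard_error). Illustrative only.
    """
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only via risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            grad += x[i] - mean            # score contribution
            info += s2 / s0 - mean * mean  # observed information contribution
        if info <= 0.0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info) if info > 0.0 else float("inf")
    return beta, se

# Hypothetical toy data: follow-up time, event indicator (1 = death,
# 0 = censored), and expression of one exon or gene per patient.
times = [5, 8, 12, 15, 20, 22, 30, 35]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [2.1, 0.4, 1.8, 1.5, 0.9, 1.7, 0.3, 0.6]

beta, se = cox_univariable(times, events, expr)
hazard_ratio = math.exp(beta)
print(f"beta={beta:.3f}, HR={hazard_ratio:.3f}, SE={se:.3f}")
```

In a multivariable setting the scalar `beta` becomes a coefficient vector with one entry per gene in the set, and the hazard ratios across genes and datasets are the quantities that analyses such as clustering would operate on; an established survival library would normally be preferred over a hand-written fit like this one.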