A weakly supervised clustering method for cancer subgroup identification

buir.advisor: Okan, Öznur Taştan
dc.contributor.author: Özçelik, Duygu
dc.date.accessioned: 2016-08-25T10:22:14Z
dc.date.available: 2016-08-25T10:22:14Z
dc.date.copyright: 2016-07
dc.date.issued: 2016-07
dc.date.submitted: 2016-08-18
dc.department: Department of Computer Engineering [en_US]
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2016. [en_US]
dc.description: Includes bibliographical references (leaves 86-95). [en_US]
dc.description.abstract: Each cancer type is a heterogeneous disease consisting of subtypes, which may be distinguished at the molecular, histopathological, and clinical levels. Identifying the patient subtypes of a cancer type is critically important, as the unique molecular characteristics of a particular patient subgroup reveal distinct disease states and open up possibilities for targeted therapeutic regimens. Traditionally, unsupervised clustering techniques are applied to the genomic data of the tumor samples, and the patient clusters are found to be of interest if they can be associated with a clinical outcome variable such as the survival of patients. In lieu of this unsupervised framework, we propose a weakly supervised clustering framework, WS-RFClust, in which the clustering partitions are guided by the clinical outcome of interest. In WS-RFClust, a random forest is trained to classify the patients based on a categorical clinical variable of interest. We use the partitions of patients on the tree ensemble to construct a patient similarity matrix, which is then used as input to a clustering algorithm. WS-RFClust inherently uses the nonlinear subspace of the original features that is learned in the classification step for clustering. In this study, we demonstrate the effectiveness of WS-RFClust on handwritten digit datasets, where it captures salient structural similarities between digit pairs. Finally, we employ WS-RFClust to find breast cancer subtypes using mRNA, protein, and microRNA expression as features. Our results on the breast cancer subtype identification problem show that WS-RFClust can identify patient subgroups more effectively than the commonly used unsupervised clustering methods. [en_US]
dc.description.degree: M.S. [en_US]
dc.description.statementofresponsibility: by Duygu Özçelik. [en_US]
dc.format.extent: xix, 95 leaves : charts (some color) [en_US]
dc.identifier.itemid: B153989
dc.identifier.uri: http://hdl.handle.net/11693/32162
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Clustering [en_US]
dc.subject: Weakly supervised clustering [en_US]
dc.subject: Subspace clustering [en_US]
dc.subject: Cancer subtype identification [en_US]
dc.subject: Patient subgroup identification [en_US]
dc.title: A weakly supervised clustering method for cancer subgroup identification [en_US]
dc.title.alternative: Kanser alt gruplarının keşfi için zayıf gözetimli bir kümeleme metodu [en_US]
dc.type: Thesis [en_US]
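
The abstract describes building a patient similarity matrix from the partitions that a trained tree ensemble induces on the patients. The record does not give the exact formula, so the following is only a minimal sketch under an assumed definition: the common random-forest proximity, where the similarity of two samples is the fraction of trees in which they reach the same leaf. The function name `leaf_cooccurrence_similarity` and the example leaf assignments are hypothetical. In practice the leaf indices would come from an already trained forest (for instance, scikit-learn's `RandomForestClassifier.apply` returns one leaf index per sample and tree).

```python
from typing import List, Sequence


def leaf_cooccurrence_similarity(
    leaf_ids: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Return an n x n similarity matrix from per-tree leaf assignments.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    similarity[i][j] is the fraction of trees in which samples i and j
    share a leaf (an assumed definition, not taken from the thesis).
    """
    n = len(leaf_ids)
    if n == 0:
        return []
    n_trees = len(leaf_ids[0])
    if n_trees == 0 or any(len(row) != n_trees for row in leaf_ids):
        raise ValueError("every sample needs exactly one leaf index per tree")

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a sample always shares every leaf with itself
        for j in range(i + 1, n):
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            sim[i][j] = sim[j][i] = shared / n_trees
    return sim


# Hypothetical leaf assignments for 4 samples in a 3-tree ensemble.
example_leaves = [
    [0, 2, 1],
    [0, 2, 3],
    [1, 0, 3],
    [1, 0, 1],
]
example_similarity = leaf_cooccurrence_similarity(example_leaves)
```

The resulting symmetric matrix could then be passed to any clustering algorithm that accepts precomputed similarities; which algorithm the thesis uses is not stated in this record.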
Files
Original bundle
Name: 10121991.pdf
Size: 7.87 MB
Format: Adobe Portable Document Format
Description: Full printable version