Cluster labeling improvement by utilizing data fusion and Wikipedia

buir.advisor: Can, Fazlı
dc.contributor.author: Ayduğan, Gökçe
dc.date.accessioned: 2017-08-25T11:32:05Z
dc.date.available: 2017-08-25T11:32:05Z
dc.date.copyright: 2017-07
dc.date.issued: 2017-07
dc.date.submitted: 2017-08-15
dc.department: Department of Computer Engineering [en_US]
dc.description: Cataloged from PDF version of article. [en_US]
dc.description: Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2017. [en_US]
dc.description: Includes bibliographical references (leaves 53-57). [en_US]
dc.description.abstract: A cluster is a set of related documents. Cluster labeling is the process of assigning descriptive labels to clusters. This study investigates several cluster labeling approaches and presents novel methods. The first uses the clusters themselves and extracts important terms, which distinguish clusters from each other, with different statistical feature selection methods. It then applies different data fusion methods to combine their outcomes. Our results show that although this approach provides statistically significantly better results in some cases, it is not a stable and reliable labeling method. This can be explained by the fact that a good label may not occur in the cluster at all. The second exploits Wikipedia as an external resource and uses its anchor texts and categories to enrich the label pool. Labeling with Wikipedia anchor text fails because the suggested labels tend to focus on minor topics. Although the minor topics are related to the main topic, they do not exactly describe it. After this observation, we use the categories of Wikipedia pages to improve our label pool in two ways. The first fuses important terms and Wikipedia categories with rank-based fusion methods. The second looks at the relatedness of Wikipedia pages to the clusters and uses only the categories of related pages. The experimental results show that both methods provide statistically significantly better results than the other cluster labeling approaches that we examine in this study. [en_US]
dc.description.degree: M.S. [en_US]
dc.description.statementofresponsibility: by Gökçe Ayduğan. [en_US]
dc.format.extent: xiii, 69 leaves : charts (some color) ; 29 cm. [en_US]
dc.identifier.itemid: B156101
dc.identifier.uri: http://hdl.handle.net/11693/33553
dc.language.iso: English [en_US]
dc.publisher: Bilkent University [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Cluster Labeling [en_US]
dc.subject: Data Fusion [en_US]
dc.subject: Wikipedia [en_US]
dc.title: Cluster labeling improvement by utilizing data fusion and Wikipedia [en_US]
dc.title.alternative: Veri birleştirme ve Wikipedia kullanarak küme etiketlemenin iyileştirilmesi [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name:
gokce_tez.pdf
Size:
2.95 MB
Format:
Adobe Portable Document Format
Description:
Full printable version
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission