dc.contributor.advisor | Can, Fazlı | |
dc.contributor.author | Ayduğan, Gökçe | |
dc.date.accessioned | 2017-08-25T11:32:05Z | |
dc.date.available | 2017-08-25T11:32:05Z | |
dc.date.copyright | 2017-07 | |
dc.date.issued | 2017-07 | |
dc.date.submitted | 2017-08-15 | |
dc.identifier.uri | http://hdl.handle.net/11693/33553 | |
dc.description | Cataloged from PDF version of article. | en_US |
dc.description | Thesis (M.S.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2017. | en_US |
dc.description | Includes bibliographical references (leaves 53-57). | en_US |
dc.description.abstract | A cluster is a set of related documents. Cluster labeling is the process of assigning
descriptive labels to clusters. This study investigates several cluster labeling
approaches and presents novel methods. The first uses clusters themselves and
extracts important terms, which distinguish clusters from each other, with different
statistical feature selection methods. Then it applies different data fusion
methods for combining their outcomes. Our results show that although it provides
statistically significantly better results for some cases, it is not a stable and
reliable labeling method. This can be explained by the fact that a good label
may not occur in the cluster at all. The second exploits Wikipedia as an external
resource and uses its anchor texts and categories to enrich the label pool. Labeling
with Wikipedia anchor text fails because the suggested labels tend to focus
on minor topics. Although the minor topics are related to the main topic, they
do not exactly describe it. After this observation, we use categories of Wikipedia
pages to improve our label pool in two ways. The first fuses important terms and
Wikipedia categories with rank-based fusion methods. The second looks at the relatedness
of Wikipedia pages to the clusters and uses only categories of related pages.
The experimental results show that both methods provide statistically significantly
better results than the other cluster labeling approaches that we examine
in this study. | en_US |
dc.description.statementofresponsibility | by Gökçe Ayduğan. | en_US |
dc.format.extent | xiii, 69 leaves : charts (some color) ; 29 cm. | en_US |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Cluster Labeling | en_US |
dc.subject | Data Fusion | en_US |
dc.subject | Wikipedia | en_US |
dc.title | Cluster labeling improvement by utilizing data fusion and Wikipedia | en_US |
dc.title.alternative | Veri birleştirme ve Wikipedia kullanarak küme etiketlemenin iyileştirilmesi | en_US |
dc.type | Thesis | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.publisher | Bilkent University | en_US |
dc.description.degree | M.S. | en_US |
dc.identifier.itemid | B156101 | |