dc.contributor.advisor | Can, Fazlı | |
dc.contributor.author | Türel, Anıl | |
dc.date.accessioned | 2016-01-08T18:15:40Z | |
dc.date.available | 2016-01-08T18:15:40Z | |
dc.date.issued | 2011 | |
dc.identifier.uri | http://hdl.handle.net/11693/15255 | |
dc.description | Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2011. | en_US |
dc.description | Thesis (Master's) -- Bilkent University, 2011. | en_US |
dc.description | Includes bibliographical references (leaves 58-62). | en_US |
dc.description.abstract | Search engines present query results as a long ordered list of web snippets divided
into several pages. Post-processing of information retrieval results for easier access
to the desired information is an important research problem. One such post-processing
technique is to cluster search results by topic and to label each cluster to reflect
its topic. In this thesis, we present a novel search result clustering
approach to split the long list of documents returned by search engines into
meaningfully grouped and labeled clusters. Our method emphasizes clustering
quality by using cover coefficient and sequential k-means clustering algorithms.
Cluster labeling is crucial because meaningless or confusing labels may mislead
users into checking the wrong clusters for the query and wasting time. Additionally,
labels should accurately reflect the contents of the documents within the cluster. To
label clusters effectively, we introduce a new cluster labeling method based on term
weighting. We also present a new metric that employs precision and
recall to assess the success of cluster labeling. We adopt a comparative evaluation
strategy to measure the performance of the proposed method relative
to two prominent search result clustering methods: Suffix Tree Clustering
and Lingo. We perform the experiments using the publicly available
Ambient and ODP-239 datasets. Experimental results show that the proposed
method performs both the clustering and labeling tasks successfully. | en_US |
dc.description.statementofresponsibility | Türel, Anıl | en_US |
dc.format.extent | xiv, 67 leaves | en_US |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Search result clustering | en_US |
dc.subject | Cluster labeling | en_US |
dc.subject | Web information retrieval | en_US |
dc.subject | Clustering evaluation | en_US |
dc.subject | Labeling evaluation | en_US |
dc.subject.lcc | TK5105.884 .T87 2011 | en_US |
dc.subject.lcsh | Search engines--Programming. | en_US |
dc.subject.lcsh | Web search engines--Mathematical models. | en_US |
dc.subject.lcsh | Information storage and retrieval systems. | en_US |
dc.subject.lcsh | Information retrieval. | en_US |
dc.subject.lcsh | Internet searching. | en_US |
dc.title | A new approach to search result clustering and labeling | en_US |
dc.type | Thesis | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.publisher | Bilkent University | en_US |
dc.description.degree | M.S. | en_US |