A new approach to search result clustering and labeling

buir.advisor: Can, Fazlı
dc.contributor.author: Türel, Anıl
dc.department: Department of Computer Engineering
dc.description: Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2011.
dc.description: Thesis (Master's) -- Bilkent University, 2011.
dc.description: Includes bibliographical references (leaves 58-62).
dc.description.abstract: Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing retrieval results so that users can reach the desired information more easily is an important research problem. One such post-processing technique is to cluster search results by topic and label each cluster so that the label reflects its topic. In this thesis, we present a novel search result clustering approach that splits the long list of documents returned by a search engine into meaningful, labeled clusters. Our method emphasizes clustering quality by using the cover coefficient and sequential k-means clustering algorithms. Cluster labeling is crucial because meaningless or confusing labels may mislead users into checking the wrong clusters for their query and wasting time. In addition, labels should accurately reflect the contents of the documents within each cluster. To label clusters effectively, we introduce a new cluster labeling method based on term weighting. We also present a new metric that employs precision and recall to assess the quality of cluster labeling. We adopt a comparative evaluation strategy to measure the performance of the proposed method relative to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. The experiments are performed on the publicly available Ambient and ODP-239 datasets. Experimental results show that the proposed method successfully performs both the clustering and the labeling tasks.
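The abstract names the cover coefficient (CC) concept without defining it. As a rough illustration only, the sketch below follows the CC formulation commonly given for the C3M clustering methodology: for an m x n document-term matrix d, c_ij = alpha_i * sum_k d_ik * beta_k * d_jk, where alpha_i is the reciprocal of document i's row sum and beta_k is the reciprocal of term k's column sum. The diagonal entries (decoupling coefficients) sum to an estimate of the number of clusters, and a seed power p_i = delta_i * (1 - delta_i) * sum_j d_ij ranks candidate cluster seeds. How the thesis combines this with sequential k-means, and which term weights it uses, are not stated in this record; the function names and the toy binary matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def cover_coefficients(d: Matrix) -> Matrix:
    """Return the m x m cover coefficient matrix for an m x n document-term matrix.

    Hypothetical helper: c[i][j] = alpha_i * sum_k d[i][k] * beta_k * d[j][k].
    Assumes every document (row) contains at least one term.
    """
    m, n = len(d), len(d[0])
    # alpha_i: reciprocal of the row sum (total weight of document i)
    alpha = [1.0 / sum(row) for row in d]
    # beta_k: reciprocal of the column sum (total weight of term k); unused terms get 0
    col_sums = [sum(d[i][k] for i in range(m)) for k in range(n)]
    beta = [1.0 / s if s > 0 else 0.0 for s in col_sums]
    return [
        [alpha[i] * sum(d[i][k] * beta[k] * d[j][k] for k in range(n)) for j in range(m)]
        for i in range(m)
    ]


def estimated_cluster_count(c: Matrix) -> float:
    # n_c = sum of decoupling coefficients delta_i = c[i][i]
    return sum(c[i][i] for i in range(len(c)))


def seed_powers(d: Matrix, c: Matrix) -> List[float]:
    # p_i = delta_i * psi_i * (row sum of d), with coupling psi_i = 1 - delta_i
    return [c[i][i] * (1.0 - c[i][i]) * sum(d[i]) for i in range(len(d))]


if __name__ == "__main__":
    # Toy corpus: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3
    d = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    c = cover_coefficients(d)
    print(estimated_cluster_count(c))  # 2.0
    print(seed_powers(d, c))           # [0.5, 0.5, 0.5, 0.5]
```

Each row of the CC matrix sums to 1, so c_ij can be read as the fraction of document i that is "covered" by document j; the two disjoint topic groups in the toy matrix yield an estimate of two clusters.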
dc.description.statementofresponsibility: Türel, Anıl
dc.format.extent: xiv, 67 leaves
dc.publisher: Bilkent University
dc.subject: Search result clustering
dc.subject: cluster labeling
dc.subject: web information retrieval
dc.subject: clustering evaluation
dc.subject: labeling evaluation
dc.subject.lcc: TK5105.884 .T87 2011
dc.subject.lcsh: Search engines--Programming.
dc.subject.lcsh: Web search engines--Mathematical models.
dc.subject.lcsh: Information storage and retrieval systems.
dc.subject.lcsh: Information retrieval.
dc.subject.lcsh: Internet searching.
dc.title: A new approach to search result clustering and labeling
Original bundle
Adobe Portable Document Format, 805.7 KB