A new approach to search result clustering and labeling

Date
2011
Source Title
Information Retrieval Technology
Print ISSN
0302-9743
Publisher
Springer, Berlin, Heidelberg
Volume
7097
Pages
283 - 292
Language
English
Type
Conference Paper
Abstract

Search engines present query results as a long ordered list of web snippets divided into several pages. Post-processing retrieval results to give easier access to the desired information is an important research problem. In this paper, we present a novel search result clustering approach that splits the long list of documents returned by search engines into meaningful, labeled clusters. Our method emphasizes clustering quality by using cover coefficient-based and sequential k-means clustering algorithms. We also introduce a cluster labeling method based on term weighting that produces labels reflecting cluster contents. In addition, we present a new metric that uses precision and recall to assess the success of cluster labeling. We compare the performance of the proposed method against two prominent search result clustering methods: Suffix Tree Clustering and Lingo. Experimental results on the publicly available AMBIENT and ODP-239 datasets show that our method successfully performs both the clustering and the labeling tasks. © 2011 Springer-Verlag Berlin Heidelberg.
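The abstract names cover coefficient-based clustering and sequential k-means without giving details, so the following is only a minimal sketch of the textbook forms of those two ideas, not the paper's implementation. It assumes the C3M formulation of Can and Ozkarahan: the cover coefficient c_ij = α_i Σ_k d_ik β_k d_jk (α_i and β_k are reciprocals of the document and term sums), the diagonal sum Σ_i c_ii as an estimate of the number of clusters, and seed power p_i = δ_i(1 − δ_i) Σ_j d_ij for choosing seeds. These are followed by one MacQueen-style sequential k-means pass with a running-mean centroid update. The duplicate-seed test, the binary toy vectors, the Euclidean distance, and all function names are hypothetical choices made for illustration. The paper's labeling method and evaluation metric are not reproduced.

```python
"""Illustrative sketch (not the paper's implementation): C3M-style cover
coefficients for seed selection, then one sequential k-means pass."""


def cover_coefficients(d):
    """m x m cover coefficient matrix for document-term matrix d.

    c[i][j] = alpha_i * sum_k d[i][k] * beta_k * d[j][k], where alpha_i and
    beta_k are reciprocals of the row and column sums. Assumes no all-zero
    rows or columns. Each row of the result sums to 1.
    """
    m, n = len(d), len(d[0])
    alpha = [1.0 / sum(row) for row in d]
    beta = [1.0 / sum(d[i][k] for i in range(m)) for k in range(n)]
    return [[alpha[i] * sum(d[i][k] * beta[k] * d[j][k] for k in range(n))
             for j in range(m)] for i in range(m)]


def estimate_num_clusters(c):
    """n_c = sum of decoupling coefficients delta_i = c[i][i], rounded."""
    return max(1, round(sum(c[i][i] for i in range(len(c)))))


def select_seeds(d, c, k, tol=1e-9):
    """Pick up to k seeds by seed power delta_i * (1 - delta_i) * sum_j d[i][j].

    Simplified duplicate test (a hypothetical choice): skip candidate i if it
    covers an existing seed s as much as it covers itself (c[i][s] == c[i][i]).
    """
    power = [c[i][i] * (1.0 - c[i][i]) * sum(row) for i, row in enumerate(d)]
    seeds = []
    for i in sorted(range(len(d)), key=lambda i: (-power[i], i)):
        if all(abs(c[i][s] - c[i][i]) > tol for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds


def sequential_kmeans(docs, seeds):
    """Single pass: assign each document to its nearest centroid (squared
    Euclidean distance) and update that centroid as a running mean."""
    centroids = [[float(v) for v in docs[s]] for s in seeds]
    counts = [1] * len(seeds)
    seed_cluster = {s: j for j, s in enumerate(seeds)}
    labels = []
    for i, x in enumerate(docs):
        if i in seed_cluster:  # a seed already initialised its own centroid
            labels.append(seed_cluster[i])
            continue
        j = min(range(len(centroids)),
                key=lambda t: sum((a - b) ** 2 for a, b in zip(x, centroids[t])))
        counts[j] += 1
        # running-mean update: centroid <- centroid + (x - centroid) / n
        centroids[j] = [cj + (xj - cj) / counts[j]
                        for cj, xj in zip(centroids[j], x)]
        labels.append(j)
    return labels, centroids


if __name__ == "__main__":
    # Toy binary document-term matrix: two disjoint topics.
    docs = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
    c = cover_coefficients(docs)
    k = estimate_num_clusters(c)
    seeds = select_seeds(docs, c, k)
    labels, _ = sequential_kmeans(docs, seeds)
    print(k, seeds, labels)  # 2 [0, 2] [0, 0, 1, 1]
```

On the toy matrix every decoupling coefficient is 0.5, so the diagonal sums to 2 and one seed is picked per topic. A real system would more likely use weighted terms and iterate k-means to convergence. The abstract does not specify those choices.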

Keywords
Cluster labeling, Search result clustering, Web information retrieval, Cluster content, Clustering approach, Clustering methods, Clustering quality, Data sets, K-Means clustering algorithm, Post processing, Precision and recall, Query results, Relative performance, Research problems, Search results, Suffix-trees, Term weighting, Clustering algorithms, Content based retrieval, Information retrieval, World Wide Web, Search engines