A new approach to search result clustering and labeling

Türel, Anıl

Files

0006009.pdf (805.7 KB)

Date

2011

Authors

Türel, Anıl

Advisor

Can, Fazlı

Abstract

Search engines present query results as a long ordered list of web snippets spread over several pages. Post-processing these retrieval results so that users can reach the desired information more easily is an important research problem. One such post-processing technique is to cluster search results by topic and to label each cluster so that the label reflects its topic. In this thesis, we present a novel search result clustering approach that splits the long list of documents returned by a search engine into meaningfully grouped and labeled clusters. Our method emphasizes clustering quality by using cover coefficient and sequential k-means clustering algorithms. Cluster labeling is crucial because meaningless or confusing labels can lead users to inspect the wrong clusters for their query and waste time. Labels should also accurately reflect the contents of the documents within the cluster. To label clusters effectively, we introduce a new cluster labeling method based on term weighting. We also present a new metric that employs precision and recall to assess the success of cluster labeling. We adopt a comparative evaluation strategy to measure the performance of the proposed method relative to two prominent search result clustering methods: Suffix Tree Clustering and Lingo. The experiments use the publicly available Ambient and ODP-239 datasets. Experimental results show that the proposed method can successfully achieve both clustering and labeling tasks.
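
The abstract names term-weighting-based labeling and a precision/recall-based labeling metric but does not give their formulas. The sketch below is therefore only a generic illustration of the two ideas, not the thesis's method: it assumes a simple tf-idf-style weight computed across clusters, and it assumes that label quality is scored by term overlap with a reference label. The function names `label_clusters` and `label_precision_recall`, and the toy snippets, are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def label_clusters(clusters, top_n=2):
    """Pick the top_n highest-weighted terms of each cluster as its label.

    clusters: dict mapping a cluster id to a list of snippet strings.
    Weight (an assumption, not the thesis's formula):
        term count in the cluster * log(1 + n_clusters / clusters containing the term)
    """
    term_counts = {
        cid: Counter(t for snippet in snippets for t in tokenize(snippet))
        for cid, snippets in clusters.items()
    }
    n_clusters = len(clusters)
    # For each term, the number of clusters in which it occurs.
    cluster_freq = Counter(t for counts in term_counts.values() for t in counts)

    labels = {}
    for cid, counts in term_counts.items():
        weights = {
            t: c * math.log(1 + n_clusters / cluster_freq[t])
            for t, c in counts.items()
        }
        # Sort by descending weight; break ties alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        labels[cid] = ranked[:top_n]
    return labels


def label_precision_recall(predicted_terms, reference_terms):
    """Precision and recall of predicted label terms against reference terms."""
    predicted, reference = set(predicted_terms), set(reference_terms)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


toy_clusters = {
    "a": ["jaguar car dealer", "jaguar car price"],
    "b": ["jaguar cat habitat", "jaguar cat rainforest"],
}
toy_labels = label_clusters(toy_clusters, top_n=1)
```

In this toy input, the query term "jaguar" occurs in both clusters and is down-weighted, so terms specific to one cluster rank higher. How the thesis actually weights terms and combines precision and recall is described in the full text, not in this abstract.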

Keywords

Search result clustering, cluster labeling, web information retrieval, clustering evaluation, labeling evaluation

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Permalink

http://hdl.handle.net/11693/15255

Collections

Graduate School of Engineering and Science

Language

English

Type

Thesis
