Incremental cluster-based retrieval using compressed cluster-skipping inverted files

We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size. © 2008 ACM.
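The query-evaluation loop described above can be illustrated with a small sketch. It is not the authors' implementation. The posting-list layout (one block per cluster, each carrying a centroid weight followed by that cluster's postings), the additive scoring, and the names `INDEX`, `incremental_cbr`, `num_best_clusters` and `top_docs` are all assumptions made for illustration. Compression is omitted. Documents scored under an earlier best-cluster set keep their partial scores here, which is a simplification; the abstract does not say how the paper treats them.

```python
from collections import defaultdict
import heapq

# Hypothetical cluster-skipping layout (an assumption, not the paper's format):
# term -> list of blocks, one block per cluster that contains the term.
# Each block is (cluster_id, centroid_weight, [(doc_id, term_weight), ...]).
INDEX = {
    "cluster": [
        (0, 0.5, [(1, 2.0), (2, 1.0)]),
        (1, 0.4, [(7, 1.0)]),
    ],
    "index": [
        (1, 0.6, [(7, 1.0), (8, 2.0)]),
        (2, 0.3, [(9, 3.0)]),
    ],
}


def incremental_cbr(index, query_terms, num_best_clusters, top_docs):
    """Rank documents while re-selecting the best clusters after every term."""
    cluster_acc = defaultdict(float)  # running query-centroid scores
    doc_acc = defaultdict(float)      # running query-document scores

    for term in query_terms:
        blocks = index.get(term, [])

        # Update cluster scores from the centroid information stored
        # alongside this term's postings.
        for cluster_id, centroid_weight, _ in blocks:
            cluster_acc[cluster_id] += centroid_weight

        # Recompute the best clusters; the set can change from term to term.
        best = set(
            heapq.nlargest(num_best_clusters, cluster_acc, key=cluster_acc.get)
        )

        # Score only the blocks that belong to a best cluster; skip the rest.
        for cluster_id, _, postings in blocks:
            if cluster_id not in best:
                continue
            for doc_id, weight in postings:
                doc_acc[doc_id] += weight

    # Highest score first; ties broken by document id for determinism.
    ranked = sorted(doc_acc.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_docs]


result = incremental_cbr(INDEX, ["cluster", "index"], num_best_clusters=1, top_docs=3)
print(result)
```

In this toy index the best cluster is 0 after the first term and 1 after the second, so the block for cluster 1 is skipped while processing "cluster" and the block for cluster 2 is skipped while processing "index". In a compressed index, each skipped block is a run of postings that never needs to be decoded, which is the likely source of the CPU time savings the abstract reports.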

Source Title

ACM Transactions on Information Systems

Publisher

Association for Computing Machinery

Keywords

Best match, Cluster-based retrieval (CBR), Cluster-skipping inverted index structure (CS-IIS), Full search (FS), Index compression, Inverted index structure (IIS), CPU time efficiency, Efficiency improvements, Inverted files, Query evaluation, Query lengths, Single structure, Storage overhead, Bits, Data storage equipment, Query processing, Information retrieval systems

Permalink

http://hdl.handle.net/11693/23106

Published Version (Please cite this version)

http://dx.doi.org/10.1145/1361684.1361688

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
