Incremental cluster-based retrieval using compressed cluster-skipping inverted files

Altingovde, I. S.; Demir, E.; Can, F.; Ulusoy, Özgür

buir.contributor.author	Ulusoy, Özgür
dc.citation.epage	15:36	en_US
dc.citation.issueNumber	3	en_US
dc.citation.spage	15:1	en_US
dc.citation.volumeNumber	26	en_US
dc.contributor.author	Altingovde, I. S.	en_US
dc.contributor.author	Demir, E.	en_US
dc.contributor.author	Can, F.	en_US
dc.contributor.author	Ulusoy, Özgür	en_US
dc.date.accessioned	2016-02-08T10:08:59Z
dc.date.available	2016-02-08T10:08:59Z
dc.date.issued	2008-06	en_US
dc.department	Department of Computer Engineering	en_US
dc.description.abstract	We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size. © 2008 ACM.	en_US
dc.identifier.doi	10.1145/1361684.1361688	en_US
dc.identifier.issn	1046-8188	en_US
dc.identifier.uri	http://hdl.handle.net/11693/23106	en_US
dc.language.iso	English	en_US
dc.publisher	Association for Computing Machinery	en_US
dc.relation.isversionof	http://dx.doi.org/10.1145/1361684.1361688	en_US
dc.source.title	ACM Transactions on Information Systems	en_US
dc.subject	Best match	en_US
dc.subject	Cluster-based retrieval (CBR)	en_US
dc.subject	Cluster-skipping inverted index structure (CS-IIS)	en_US
dc.subject	Full search (FS)	en_US
dc.subject	Index compression	en_US
dc.subject	Inverted index structure (IIS)	en_US
dc.subject	CPU time efficiency	en_US
dc.subject	Efficiency improvements	en_US
dc.subject	Inverted files	en_US
dc.subject	Query evaluation	en_US
dc.subject	Query lengths	en_US
dc.subject	Single structure	en_US
dc.subject	Storage overhead	en_US
dc.subject	Bits	en_US
dc.subject	Data storage equipment	en_US
dc.subject	Query processing	en_US
dc.subject	Information retrieval systems	en_US
dc.title	Incremental cluster-based retrieval using compressed cluster-skipping inverted files	en_US
dc.type	Article	en_US

Files

Original bundle

Name: Incremental cluster-based retrieval using compressed cluster-skipping inverted files.pdf
Size: 504.02 KB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Scholarly Publications - Computer Engineering