Incremental cluster-based retrieval using compressed cluster-skipping inverted files

buir.contributor.author: Ulusoy, Özgür
dc.citation.epage: 15:36 [en_US]
dc.citation.issueNumber: 3 [en_US]
dc.citation.spage: 15:1 [en_US]
dc.citation.volumeNumber: 26 [en_US]
dc.contributor.author: Altingovde, I. S. [en_US]
dc.contributor.author: Demir, E. [en_US]
dc.contributor.author: Can, F. [en_US]
dc.contributor.author: Ulusoy, Özgür [en_US]
dc.date.accessioned: 2016-02-08T10:08:59Z
dc.date.available: 2016-02-08T10:08:59Z
dc.date.issued: 2008-06 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description.abstract: We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size. © 2008 ACM. [en_US]
dc.description.provenance: Made available in DSpace on 2016-02-08T10:08:59Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2008 [en]
dc.identifier.doi: 10.1145/1361684.1361688 [en_US]
dc.identifier.issn: 1046-8188 [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/23106 [en_US]
dc.language.iso: English [en_US]
dc.publisher: Association for Computing Machinery [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1145/1361684.1361688 [en_US]
dc.source.title: ACM Transactions on Information Systems [en_US]
dc.subject: Best match [en_US]
dc.subject: Cluster-based retrieval (CBR) [en_US]
dc.subject: Cluster-skipping inverted index structure (CS-IIS) [en_US]
dc.subject: Full search (FS) [en_US]
dc.subject: Index compression [en_US]
dc.subject: Inverted index structure (IIS) [en_US]
dc.subject: CPU time efficiency [en_US]
dc.subject: Efficiency improvements [en_US]
dc.subject: Inverted files [en_US]
dc.subject: Query evaluation [en_US]
dc.subject: Query lengths [en_US]
dc.subject: Single structure [en_US]
dc.subject: Storage overhead [en_US]
dc.subject: Bits [en_US]
dc.subject: Data storage equipment [en_US]
dc.subject: Query processing [en_US]
dc.subject: Information retrieval systems [en_US]
dc.title: Incremental cluster-based retrieval using compressed cluster-skipping inverted files [en_US]
dc.type: Article [en_US]

Files

Original bundle

Name: Incremental cluster-based retrieval using compressed cluster-skipping inverted files.pdf
Size: 504.02 KB
Format: Adobe Portable Document Format
Description: Full printable version