Inverted index compression based on term and document identifier reassignment

buir.advisor: Aykanat, Cevdet
dc.contributor.author: Baykan, İzzet Çağrı
dc.date.accessioned: 2016-01-08T18:07:54Z
dc.date.available: 2016-01-08T18:07:54Z
dc.date.issued: 2008
dc.description: Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2008. [en_US]
dc.description: Includes bibliographical references (leaves 43-46). [en_US]
dc.description.abstract: Compression of inverted indexes has received great attention in recent years. An inverted index consists of a list of document identifiers, also referred to as a posting list, for each term. Compressing an inverted index reduces its size, which also improves query performance by reducing disk access times. Recent studies have shown that reassigning document identifiers has a great effect on the compression of an inverted index. In this work, we propose a novel technique that reassigns both the term and document identifiers of an inverted index by transforming the matrix representation of the index into a block-diagonal form, which improves the compression ratio dramatically. We adapt the row-net hypergraph-partitioning model for the transformation into block-diagonal form, which improves the compression ratio by as much as 50%. To the best of our knowledge, this method performs more effectively than previous inverted index compression techniques. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:07:54Z (GMT). No. of bitstreams: 1 0003644.pdf: 462685 bytes, checksum: 7e18c1b31752682fffd8fb679539e7de (MD5) [en]
dc.description.statementofresponsibility: Baykan, İzzet Çağrı [en_US]
dc.format.extent: ix, 46 leaves, graphs [en_US]
dc.identifier.itemid: BILKUTUPB109724
dc.identifier.uri: http://hdl.handle.net/11693/14779
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Inverted index [en_US]
dc.subject: Inverted index compression [en_US]
dc.subject: Block-diagonal form [en_US]
dc.subject: Document identifier reassignment [en_US]
dc.subject: Hypergraph partitioning [en_US]
dc.subject.lcc: QA76.9.T48 B39 2008 [en_US]
dc.subject.lcsh: Text processing (Computer science) [en_US]
dc.subject.lcsh: Information storage and retrieval systems. [en_US]
dc.subject.lcsh: Information retrieval. [en_US]
dc.title: Inverted index compression based on term and document identifier reassignment [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
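
The abstract's claim that reassigning document identifiers affects compression can be illustrated with a small sketch. It assumes posting lists are stored as d-gaps (differences between consecutive identifiers) coded with variable-byte encoding; neither scheme is named in the abstract, and the toy index, the hand-picked `mapping`, and all function names are hypothetical. This is not the thesis's hypergraph-partitioning method, only a demonstration of why clustered identifiers yield smaller gaps.

```python
def gaps(postings):
    """Turn a sorted posting list into d-gaps: first ID, then differences."""
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def vbyte_len(n):
    """Bytes needed to store n with variable-byte coding (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def index_size(index):
    """Total size in bytes of all gap-encoded posting lists (assumed encoding)."""
    return sum(vbyte_len(g) for plist in index.values() for g in gaps(sorted(plist)))


def reassign(index, mapping):
    """Apply a document ID mapping (old ID -> new ID) to every posting list."""
    return {term: sorted(mapping[d] for d in plist) for term, plist in index.items()}


# Toy index (hypothetical data): each term's documents are scattered over the ID space.
index = {
    "compression": [1, 300, 600, 900],
    "hypergraph": [150, 450, 750],
}

# Hand-picked mapping that gives documents sharing a term consecutive IDs,
# loosely mimicking the effect of a block-diagonal ordering.
mapping = {1: 1, 300: 2, 600: 3, 900: 4, 150: 5, 450: 6, 750: 7}

before = index_size(index)
after = index_size(reassign(index, mapping))
print(before, after)  # 13 7
```

In this toy case every gap shrinks below 128 after reassignment, so each one fits in a single byte. Real collections would need an ordering computed from the data rather than chosen by hand.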

Files

Original bundle

Name: 0003644.pdf
Size: 451.84 KB
Format: Adobe Portable Document Format