Parallel text retrieval on temporally versioned document collections

buir.advisor: Aykanat, Cevdet
dc.contributor.author: Gür, Özlem
dc.date.accessioned: 2016-01-08T18:07:53Z
dc.date.available: 2016-01-08T18:07:53Z
dc.date.issued: 2008
dc.description: Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2008. [en_US]
dc.description: Includes bibliographical references (leaves 57-61). [en_US]
dc.description.abstract: In recent years, as access to the Internet has become easier and cheaper, the amount and the rate of change of the online data presented to Internet users have been increasing at an astonishing rate. This ever-changing nature of the Internet produces an ever-decaying and replenishing information collection in which newly presented data generally replaces old and sometimes valuable data. Many recent studies aim to preserve this valuable temporal data, and the size and number of temporal Web data collections are increasing. We believe that information retrieval systems that respond to time-range queries in a reasonable amount of time will soon emerge as a means of accessing vast temporal Web data collections. Due to the tremendous size of temporal data and the excessive number of query submissions per unit time, temporal information retrieval systems will have to utilize parallelism as much as possible. In parallel systems, indexing collections with inverted indices requires a strategy for distributing the inverted indices across processors. In this study, the feasibility of time-based partitioned versus term-based partitioned temporal web inverted indices is analyzed, and a novel parallel text retrieval system for answering temporal web queries is implemented, with the number of queries processed per unit time as the performance criterion. Moreover, we investigate the performance of skip-list based and randomized-select based ranking schemes on time-based and term-based partitioned inverted indices. Finally, we compare time-balanced and size-balanced time-based partitioning schemes. The experimental results for small to medium numbers of processors reveal that time-based partitioning works better for medium-length to long queries. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:07:53Z (GMT). No. of bitstreams: 1 0003643.pdf: 570004 bytes, checksum: 7f4ebc3428b64b67ec68576d25f560a3 (MD5) [en]
dc.description.statementofresponsibility: Gür, Özlem [en_US]
dc.format.extent: xii, 61 leaves, graphs [en_US]
dc.identifier.itemid: BILKUTUPB109729
dc.identifier.uri: http://hdl.handle.net/11693/14778
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Temporally versioned document collections [en_US]
dc.subject: Parallel text retrieval [en_US]
dc.subject: Inverted index partitioning [en_US]
dc.subject: Query processing [en_US]
dc.subject: Search engines [en_US]
dc.subject.lcc: QA76.5 .G87 2008 [en_US]
dc.subject.lcsh: Parallel processing (Electronic computers). [en_US]
dc.subject.lcsh: Text processing (Computer science). [en_US]
dc.title: Parallel text retrieval on temporally versioned document collections [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0003643.pdf
Size: 556.64 KB
Format: Adobe Portable Document Format