Parallel text retrieval on temporally versioned document collections
Abstract
In recent years, as access to the Internet has become easier and cheaper, the amount of online data presented to Internet users and its rate of change have been increasing at an astonishing pace. This ever-changing nature of the Internet produces an ever-decaying and replenishing information collection in which newly presented data generally replaces old, and sometimes valuable, data. Many recent studies aim to preserve this valuable temporal data, and the size and number of temporal Web data collections are increasing. We believe that information retrieval systems that respond to time-range queries in a reasonable amount of time will soon emerge as a means of accessing vast temporal Web data collections. Due to the tremendous size of temporal data and the large number of queries submitted per unit time, temporal information retrieval systems will have to exploit parallelism as much as possible. In parallel systems, indexing collections with inverted indices requires a strategy for distributing those indices across processors. In this study, we analyze the feasibility of time-based partitioned versus term-based partitioned temporal Web inverted indices and implement a novel parallel text retrieval system for answering temporal Web queries, using the number of queries processed per unit time as the performance criterion. Moreover, we investigate the performance of skip-list-based and randomized-select-based ranking schemes on time-based and term-based partitioned inverted indices. Finally, we compare time-balanced and size-balanced time-based partitioning schemes. The experimental results on small to medium numbers of processors reveal that time-based partitioning works better for medium-length to long queries.
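The difference between the two distribution strategies named in the abstract can be illustrated with a minimal sketch. It is not the system implemented in the thesis: the toy collection `DOCS`, the function names, the round-robin term assignment, the interval boundaries, and the disjunctive (OR) query semantics are all assumptions made for illustration, and ranking is omitted entirely.

```python
from collections import defaultdict

# Hypothetical toy collection: (doc_id, version timestamp, text).
DOCS = [
    (0, 2001, "parallel text retrieval"),
    (1, 2003, "temporal web retrieval"),
    (2, 2006, "parallel web index"),
    (3, 2008, "temporal index partitioning"),
]


def term_partition(docs, num_procs):
    """Term-based: each processor holds the full posting lists of some terms.

    Terms are assigned round-robin over the sorted vocabulary (an assumption).
    """
    vocab = sorted({t for _, _, text in docs for t in text.split()})
    owner = {t: i % num_procs for i, t in enumerate(vocab)}
    parts = [defaultdict(list) for _ in range(num_procs)]
    for doc_id, ts, text in docs:
        for t in set(text.split()):
            parts[owner[t]][t].append((doc_id, ts))
    return parts, owner


def time_partition(docs, boundaries):
    """Time-based: processor i indexes versions with
    boundaries[i] <= timestamp < boundaries[i + 1]."""
    parts = [defaultdict(list) for _ in range(len(boundaries) - 1)]
    for doc_id, ts, text in docs:
        for i in range(len(parts)):
            if boundaries[i] <= ts < boundaries[i + 1]:
                for t in set(text.split()):
                    parts[i][t].append((doc_id, ts))
    return parts


def query_term_partitioned(parts, owner, terms, start, end):
    """Contact only the processors that own a query term, then filter by time."""
    hits, contacted = set(), set()
    for t in terms:
        if t not in owner:
            continue
        p = owner[t]
        contacted.add(p)
        hits.update(d for d, ts in parts[p].get(t, []) if start <= ts <= end)
    return sorted(hits), sorted(contacted)


def query_time_partitioned(parts, boundaries, terms, start, end):
    """Contact only the processors whose interval overlaps [start, end]."""
    hits, contacted = set(), []
    for i, index in enumerate(parts):
        if boundaries[i + 1] <= start or boundaries[i] > end:
            continue  # this processor's interval lies outside the query range
        contacted.append(i)
        for t in terms:
            hits.update(d for d, ts in index.get(t, []) if start <= ts <= end)
    return sorted(hits), contacted


if __name__ == "__main__":
    bounds = [2000, 2005, 2010]  # two processors, assumed interval boundaries
    term_parts, owner = term_partition(DOCS, 2)
    time_parts = time_partition(DOCS, bounds)
    print(query_term_partitioned(term_parts, owner, ["web", "parallel"], 2005, 2010))
    print(query_time_partitioned(time_parts, bounds, ["web", "parallel"], 2005, 2010))
```

In this toy setting both layouts return the same documents, but the set of processors contacted differs: under term-based partitioning it depends on which processors own the query terms, while under time-based partitioning it depends on which time intervals overlap the query range. The sketch only shows this structural difference and says nothing about the throughput results reported in the study.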