Parallel text retrieval on PC clusters

Date

2003

Editor(s)

Advisor

Aykanat, Cevdet

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Volume

Issue

Pages

Language

English

Type

Journal Title

Journal ISSN

Volume Title

Attention Stats
Usage Stats
1
views
7
downloads

Series

Abstract

The inverted index partitioning problem is investigated for parallel text retrieval systems. The objective is to perform efficient query processing on an inverted index distributed across a PC cluster. Alternative strategies are considered and evaluated for inverted index partitioning, where index entries are distributed according to their document-ids or term-ids. The performance of both partitioning schemes depend on the total number of disk accesses and the total volume of communication in the system. In document-id partitioning, the total volume of communication is naturally minimum, whereas the total number of disk accesses may be larger compared to term-id partitioning. On the other hand, in term-id partitioning the total number of disk accesses is already equivalent to the lower bound achieved by the sequential algorithm, albeit the total communication volume may be quite large. The studies done so far perform these partitioning schemes in a round-robin fashion and compare the performance of them by simulation. In this work, a parallel text retrieval system is designed and implemented on a PC cluster. We adopted hypergraph-theoretical partitioning models and carried out performance comparison of round-robin and hypergraph-theoretical partitioning schemes on our parallel text retrieval system. We also designed and implemented a query interface and a user interface of our system.

Course

Other identifiers

Book Title

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Citation

Published Version (Please cite this version)