Browsing by Subject "inverted index"

Now showing 1 - 2 of 2

Open Access
Characteristics of Web-based textual communications
(2012) Küçükyılmaz, Tayfun
In this thesis, we analyze different aspects of Web-based textual communications and argue that all such communications share some common properties. In order to provide practical evidence for the validity of this argument, we focus on two common properties by examining these properties on various types of Web-based textual communications data. These properties are: All Web-based communications contain features attributable to their author and reciever; and all Web-based communications exhibit similar heavy tailed distributional properties. In order to provide practical proof for the validity of our claims, we provide three practical, real life research problems and exploit the proposed common properties of Web-based textual communications to find practical solutions to these problems. In this work, we first provide a feature-based result caching framework for real life search engines. To this end, we mined attributes from user queries in order to classify queries and estimate a quality metric for giving admission and eviction decisions for the query result cache. Second, we analyzed messages of an online chat server in order to predict user and mesage attributes. Our results show that several user- and message-based attributes can be predicted with significant occuracy using both chat message- and writing-style based features of the chat users. Third, we provide a parallel framework for in-memory construction of term partitioned inverted indexes. In this work, in order to minimize the total communication time between processors, we provide a bucketing scheme that is based on term-based distributional properties of Web page contents.
Open Access
Parallel text retrieval on PC clusters
(2003) Çatal, Aytül
The inverted index partitioning problem is investigated for parallel text retrieval systems. The objective is to perform efficient query processing on an inverted index distributed across a PC cluster. Alternative strategies are considered and evaluated for inverted index partitioning, where index entries are distributed according to their document-ids or term-ids. The performance of both partitioning schemes depend on the total number of disk accesses and the total volume of communication in the system. In document-id partitioning, the total volume of communication is naturally minimum, whereas the total number of disk accesses may be larger compared to term-id partitioning. On the other hand, in term-id partitioning the total number of disk accesses is already equivalent to the lower bound achieved by the sequential algorithm, albeit the total communication volume may be quite large. The studies done so far perform these partitioning schemes in a round-robin fashion and compare the performance of them by simulation. In this work, a parallel text retrieval system is designed and implemented on a PC cluster. We adopted hypergraph-theoretical partitioning models and carried out performance comparison of round-robin and hypergraph-theoretical partitioning schemes on our parallel text retrieval system. We also designed and implemented a query interface and a user interface of our system.