Browsing by Subject "Inverted files"

Open Access
Algorithms for within-cluster searches using inverted files
(Springer, 2006-11) Altıngövde, İsmail Şengör; Can, Fazlı; Ulusoy, Özgür
Information retrieval over clustered document collections has two successive stages: first identifying the best-clusters, and then identifying the best-documents in these clusters that are most similar to the user query. In this paper, we assume that an inverted file over the entire document collection is used for the latter stage. We propose and evaluate algorithms for within-cluster searches, i.e., algorithms that integrate the best-clusters with the best-documents so that the final output contains the highest-ranked documents from the best-clusters only. Our experiments on a TREC collection of 210,158 documents with several query sets show that an integration algorithm selected appropriately for the query length and system resources can significantly improve query evaluation efficiency. © Springer-Verlag Berlin Heidelberg 2006.
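The integration step described in this abstract can be illustrated with a minimal sketch. It assumes the best-clusters are already known and filters postings while accumulating scores. The function name, the data layout (`index`, `doc_cluster`), and the integer weights are hypothetical and are not taken from the paper, which evaluates several integration algorithms that this sketch does not attempt to reproduce.

```python
import heapq
from collections import defaultdict


def within_cluster_search(index, doc_cluster, best_clusters, query_terms, k):
    """Rank documents with an inverted file, keeping only best-cluster members.

    index:         term -> list of (doc_id, weight) postings (assumed layout)
    doc_cluster:   doc_id -> cluster id (assumed layout)
    best_clusters: set of cluster ids chosen in the first retrieval stage
    """
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in index.get(term, []):
            # Ignore postings of documents that lie outside the best-clusters.
            if doc_cluster[doc_id] in best_clusters:
                scores[doc_id] += weight
    # Highest score first; ties are broken by the smaller document id.
    return heapq.nsmallest(k, scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data: four documents in three clusters, integer term weights.
index = {
    "inverted": [(1, 5), (2, 4), (4, 9)],
    "file": [(1, 3), (3, 7), (4, 1)],
}
doc_cluster = {1: "A", 2: "A", 3: "B", 4: "C"}
results = within_cluster_search(index, doc_cluster, {"A", "B"}, ["inverted", "file"], k=2)
top_docs = [doc_id for doc_id, _ in results]
```

Document 4 has the highest raw score in this toy data, but it belongs to cluster C and is therefore never considered.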
Open Access
Cluster based collaborative filtering with inverted indexing
(2005) Subakan, Özlem Nurcan
Collectively, a population contains vast amounts of knowledge, and modern communication technologies increase the ease of communication. However, it is not feasible for a single person to aggregate thousands or millions of data items and extract useful information from them. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parsable and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings that users have given to items. The computational complexity of these methods grows linearly with the number of customers, which can reach several million in typical commercial applications. To address the scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (the so-called cluster-skipping inverted index structure) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable to that of collaborative filtering algorithms without clustering, whereas the efficiency is greatly improved.
Open Access
Compressed multi-framed signature files: an index structure for fast information retrieval
(ACM, 1999-02-03) Koçberber, Seyit; Can, Fazlı
A new indexing method, called Compressed Multi-Framed Signature File (C-MFSF), that uses a partial query evaluation strategy with compressed signature bit slices is presented. In C-MFSF, a signature file is divided into variable-sized compressed vertical frames with different on-bit densities to optimize the response time. Experiments with a real database of 152,850 records show that a response time of less than 150 milliseconds is possible. For multi-term queries, C-MFSF obtains the query results with fewer disk accesses than inverted files. The method requires no indexing vocabulary. These attributes have important implications; for example, web search engines process multi-term queries in very large databases with sizeable vocabularies.
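To make the idea of bit slices and partial query evaluation concrete, the following is a rough sketch of a plain bit-sliced signature file. It deliberately leaves out the compression and the multiple frames that define C-MFSF. The signature width, the number of bits per term, the hash function, and all names are assumptions made for illustration.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width
BITS_PER_TERM = 3    # assumed upper bound on bits set by one term


def term_bits(term):
    """Map a term to the signature bit positions it sets (illustrative hashing)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return {digest[i] % SIGNATURE_BITS for i in range(BITS_PER_TERM)}


def build_bit_slices(records):
    """Return one integer bitmap per signature bit; bit r of a slice is record r."""
    slices = [0] * SIGNATURE_BITS
    for record_no, terms in enumerate(records):
        for term in terms:
            for position in term_bits(term):
                slices[position] |= 1 << record_no
    return slices


def candidates(slices, query_terms, record_count, max_slices=None):
    """AND the slices of the query's on-bits.

    max_slices limits how many slices are read (partial evaluation): fewer
    slices mean fewer accesses but possibly more false drops.
    """
    positions = sorted(set().union(*(term_bits(t) for t in query_terms)))
    if max_slices is not None:
        positions = positions[:max_slices]
    result = (1 << record_count) - 1  # start with every record as a candidate
    for position in positions:
        result &= slices[position]
    return [r for r in range(record_count) if (result >> r) & 1]


records = [["inverted", "file"], ["signature", "file"], ["bit", "slice"]]
slices = build_bit_slices(records)
full = candidates(slices, ["signature", "file"], len(records))
partial = candidates(slices, ["signature", "file"], len(records), max_slices=2)
```

A signature file can return false drops but never misses a true match, so candidates still have to be checked against the actual records. Reading fewer slices can only enlarge the candidate set.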
Open Access
Incremental cluster-based retrieval using compressed cluster-skipping inverted files
(Association for Computing Machinery, 2008-06) Altingovde, I. S.; Demir, E.; Can, F.; Ulusoy, Özgür
We propose a unique cluster-based retrieval (CBR) strategy using a new cluster-skipping inverted file for improving query processing efficiency. The new inverted file incorporates cluster membership and centroid information along with the usual document information into a single structure. In our incremental-CBR strategy, during query evaluation, both best(-matching) clusters and the best(-matching) documents of such clusters are computed together with a single posting-list access per query term. As we switch from term to term, the best clusters are recomputed and can dynamically change. During query-document matching, only relevant portions of the posting lists corresponding to the best clusters are considered and the rest are skipped. The proposed approach is essentially tailored for environments where inverted files are compressed, and provides substantial efficiency improvement while yielding comparable, or sometimes better, effectiveness figures. Our experiments with various collections show that the incremental-CBR strategy using a compressed cluster-skipping inverted file significantly improves CPU time efficiency, regardless of query length. The new compressed inverted file imposes an acceptable storage overhead in comparison to a typical inverted file. We also show that our approach scales well with the collection size. © 2008 ACM.
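The skipping behaviour mentioned in this abstract can be sketched as follows. The sketch assumes each posting list is stored as a sequence of per-cluster blocks and that the set of best clusters is fixed in advance. It therefore shows only the skipping idea, not the incremental recomputation of best clusters, the centroid information, or the compression described in the paper. The layout and names are hypothetical.

```python
def score_with_cluster_skipping(index, best_clusters, query_terms):
    """Accumulate document scores while skipping blocks of non-best clusters.

    index: term -> list of (cluster_id, postings) blocks, where postings is a
           list of (doc_id, weight) pairs (assumed layout).
    Returns the score table and the number of postings actually read.
    """
    scores = {}
    postings_read = 0
    for term in query_terms:
        for cluster_id, postings in index.get(term, []):
            if cluster_id not in best_clusters:
                continue  # skip the whole block without reading its postings
            for doc_id, weight in postings:
                scores[doc_id] = scores.get(doc_id, 0) + weight
                postings_read += 1
    return scores, postings_read


# Toy index: postings are grouped by cluster inside every term's list.
index = {
    "cluster": [("A", [(1, 2), (2, 1)]), ("B", [(3, 4)]), ("C", [(4, 5), (5, 1)])],
    "skip": [("A", [(1, 1)]), ("C", [(4, 2)])],
}
scores, postings_read = score_with_cluster_skipping(index, {"A"}, ["cluster", "skip"])
```

In this toy run only three of the seven postings are read. In a compressed index, a skipped block would also not need to be decompressed, which is consistent with the abstract's remark that the approach is tailored for compressed inverted files.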
Open Access
Static index pruning in web search engines: combining term and document popularities with query views
(Association for Computing Machinery, 2012) Altingovde, I. S.; Ozcan, R.; Ulusoy, O.
Static index pruning techniques permanently remove a presumably redundant part of an inverted file, to reduce the file size and query processing time. These techniques differ in deciding which parts of an index can be removed safely; that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access this particular document, that is, retrieve this document among their top results. In this paper, we first propose using query views to improve the quality of the top results compared against the original results. We incorporate query views in a number of static pruning strategies, namely term-centric, document-centric, term popularity based and document access popularity based approaches, and show that the new strategies considerably outperform their counterparts, especially for the higher levels of pruning and for both disjunctive and conjunctive query processing. Additionally, we combine the notions of term and document access popularity to form new pruning strategies, and further extend these strategies with the query views. The new strategies improve the result quality especially for conjunctive query processing, which is the default and most common search mode of a search engine. © 2012 ACM.
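One plausible way to combine a term-centric pruning rule with query views is sketched below. It assumes that a posting is protected from pruning whenever its term belongs to the query view of its document. This rule, the fixed keep fraction, and all names are illustrative assumptions rather than the strategies evaluated in the paper.

```python
def prune_term_centric(index, query_views, keep_fraction):
    """Keep the top-scoring postings of each term plus query-view postings.

    index:         term -> list of (doc_id, score) postings (assumed layout)
    query_views:   doc_id -> set of query terms that retrieved the document
                   among their top results (assumed to come from a query log)
    keep_fraction: share of each posting list kept by the score-based rule
    """
    pruned = {}
    for term, postings in index.items():
        keep_count = max(1, int(len(postings) * keep_fraction))
        ranked = sorted(postings, key=lambda p: (-p[1], p[0]))
        kept = {doc_id for doc_id, _ in ranked[:keep_count]}
        # Postings whose term is in the document's query view are never pruned.
        protected = {
            doc_id for doc_id, _ in postings
            if term in query_views.get(doc_id, set())
        }
        pruned[term] = [
            (doc_id, score) for doc_id, score in postings
            if doc_id in kept or doc_id in protected
        ]
    return pruned


index = {"web": [(1, 9), (2, 5), (3, 1), (4, 7)]}
query_views = {3: {"web"}}  # document 3 was retrieved by the query term "web"
pruned = prune_term_centric(index, query_views, keep_fraction=0.5)
```

Document 3 has the lowest score for "web" and would be dropped by the score-based rule alone, but it survives because past queries reached it through that term.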