Browsing by Subject "Web search engines"
Now showing 1 - 11 of 11
Item Open Access: Analysis of Web search queries with very few or no results (2012)
Sarıgil, Erdem
With the rapid growth of the World Wide Web, search engines have a significant impact on people's lives. There are billions of web pages holding a huge amount of information, and search engines are indispensable tools for finding it. Despite continuous efforts to improve web search quality, a non-negligible fraction of user queries end up with very few or even no matching results in the leading commercial web search engines. In this thesis, we provide the first detailed characterization of such queries based on an analysis of a real-life query log. Our experimental setup allows us to characterize queries with few/no results and to compare the mechanisms employed by the three major search engines to handle them. Furthermore, we build machine learning models for predicting query suggestion patterns and no-answer queries.

Item Open Access: Characterizing web search queries that match very few or no results (ACM, 2012-11)
Altıngövde, İ. Ş.; Blanco, R.; Cambazoğlu, B. B.; Özcan, Rıfat; Sarıgil, Erdem; Ulusoy, Özgür
Despite continuous efforts to improve web search quality, a non-negligible fraction of user queries end up with very few or even no matching results in the leading web search engines. In this work, we provide a detailed characterization of such queries based on an analysis of a real-life query log. Our experimental setup allows us to characterize queries with few/no results and to compare the mechanisms employed by the major search engines in handling them.

Item Open Access: Cost-aware strategies for query result caching in Web search engines (Association for Computing Machinery, 2011)
Ozcan, R.; Altingovde, I. S.; Ulusoy, O.
Search engines and large-scale IR systems need to cache query results for efficiency and scalability purposes. Static and dynamic caching techniques (as well as their combinations) are employed to cache query results effectively. In this study, we propose cost-aware strategies for static and dynamic caching setups. Our research is motivated by two key observations: (i) query processing costs may vary significantly among different queries, and (ii) the processing cost of a query is not proportional to its popularity (i.e., its frequency in previous logs). The first observation implies that cache misses have different, that is, nonuniform, costs in this context. The second implies that typical caching policies based solely on query popularity cannot always minimize the total cost. Therefore, we propose to explicitly incorporate query costs into the caching policies. Simulation results using two large Web crawl datasets and a real query log reveal that the proposed approach improves overall system performance in terms of average query execution time. © 2011 ACM.
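The entry above does not spell out the proposed policies, but the underlying idea of weighting popularity by processing cost can be illustrated with a minimal sketch: a cost-aware static cache ranks queries by frequency multiplied by cost rather than by frequency alone. The function name, the ranking formula, and the example numbers below are illustrative assumptions, not the paper's actual strategies.

```python
# Hypothetical sketch of a cost-aware static cache fill: queries are admitted by
# frequency * processing_cost rather than frequency alone, so an expensive but less
# popular query can still earn a cache slot.

def fill_static_cache(query_stats, capacity):
    """query_stats maps query -> (frequency, processing_cost_ms); capacity is in entries."""
    ranked = sorted(query_stats.items(),
                    key=lambda kv: kv[1][0] * kv[1][1],  # frequency * cost
                    reverse=True)
    return {query for query, _ in ranked[:capacity]}

stats = {"cheap popular query": (100, 2),   # frequent, 2 ms to process
         "rare expensive query": (10, 40)}  # infrequent, 40 ms to process
print(fill_static_cache(stats, capacity=1))  # {'rare expensive query'}
```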
Item Open Access: Efficient result caching mechanisms in search engines (2014)
Sazoğlu, Fethi Burak
The performance of a search engine depends on its components, such as the crawler, the indexer, and the query processor. Query latency and the accuracy and recency of the results play a crucial role in determining this performance. High performance can be achieved with powerful hardware in the data center, but keeping operational costs under control is essential for the commercial viability of a search engine. This thesis focuses on techniques that boost search engine performance by reducing both the number of queries issued to the backend and the cost of processing a query stream. This can be accomplished by exploiting the temporal locality of queries: caching the result of a recently issued query removes the need to reprocess it when it is issued again by the same or a different user. Deploying a query result cache therefore decreases the load on the search engine's resources and frees up processing capacity. The main objective of this thesis is to improve search engine performance by making the result cache more productive, that is, by maximizing the cache hit rate and minimizing the processing cost using per-query statistics such as frequency, timestamp, and cost. While high hit rates and low processing costs improve performance, the freshness of the cached results must also be considered for user satisfaction. Therefore, this thesis also examines a variety of techniques that bound the staleness of cached results without overwhelming the backend with refresh queries. The proposed techniques are shown to be efficient using real query log data from a commercial search engine.

Item Open Access: Exploiting navigational queries for result presentation and caching in Web search engines (John Wiley & Sons, Inc., 2011)
Ozcan, R.; Altingovde, I. S.; Ulusoy, O.
Caching of query results is an important mechanism for the efficiency and scalability of web search engines. Query results are cached and presented in terms of pages, which typically include 10 results each. In navigational queries, users seek a particular website, which, if found, would typically be listed at the top ranks (perhaps first or second) by the search engine. For this type of query, caching and presenting results in the 10-per-page manner may waste cache space and network bandwidth. In this article, we propose nonuniform result page models with varying numbers of results for navigational queries. The experimental results show that our approach reduces the cache miss count by up to 9.17% (because of better utilization of cache space). Furthermore, bandwidth usage, measured in terms of the number of snippets sent, is reduced by 71% for navigational queries. This means a considerable reduction in the number of transmitted network packets, a crucial gain especially for mobile-search scenarios. A user study reveals that users easily adapt to the proposed result page model and that the efficiency gains observed in the experiments can be carried over to real-life situations. © 2011 ASIS&T.

Item Open Access: Exploiting query views for static index pruning in web search engines (ACM, 2009-11)
Altıngövde, İsmail Şengör; Özcan, Rıfat; Ulusoy, Özgür
We propose incorporating query views into a number of static pruning strategies, namely term-centric, document-centric, and access-based approaches. These query-view-based strategies considerably outperform their counterparts for both disjunctive and conjunctive query processing in Web search engines. Copyright 2009 ACM.

Item Open Access: A financial cost metric for result caching (ACM, 2013-07-08)
Sazoğlu, Fethi Burak; Cambazoğlu, B. B.; Özcan, R.; Altıngövde, I. S.; Ulusoy, Özgür
Web search engines cache the results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the most well known. Recent works take the processing overhead of queries into account when evaluating the performance of result caching strategies and propose cost-aware caching strategies. In this paper, we propose a financial cost metric that goes one step further and also takes hourly electricity prices into account when computing the cost. We evaluate the most well-known static, dynamic, and hybrid result caching strategies under this new metric. Moreover, we propose a financial-cost-aware version of the well-known LRU strategy and show that it outperforms the original LRU strategy in terms of the financial cost metric. Copyright © 2013 ACM.
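The abstract above does not describe the financial-cost-aware LRU in detail; the sketch below is one plausible reading under stated assumptions, in which an entry's financial cost is its CPU time multiplied by an assumed hourly electricity price, and eviction picks the cheapest-to-recompute entry among the least recently used ones. The class, the price table, and the candidate window are all illustrative, not the paper's method.

```python
import time
from collections import OrderedDict

# Assumed two-tier electricity price profile (arbitrary units), indexed by hour of day.
HOURLY_PRICE = {hour: 0.10 if hour < 8 else 0.25 for hour in range(24)}

class FinancialCostAwareLRU:
    """LRU variant that, when full, evicts the cheapest-to-recompute entry
    among the `window` least recently used ones (hypothetical policy)."""

    def __init__(self, capacity, window=4):
        self.capacity, self.window = capacity, window
        self.cache = OrderedDict()  # query -> (results, financial_cost)

    def get(self, query):
        if query not in self.cache:
            return None
        self.cache.move_to_end(query)  # mark as most recently used
        return self.cache[query][0]

    def put(self, query, results, cpu_seconds, hour=None):
        hour = time.localtime().tm_hour if hour is None else hour
        cost = cpu_seconds * HOURLY_PRICE[hour]  # financial cost of (re)computing this result
        if query not in self.cache and len(self.cache) >= self.capacity:
            candidates = list(self.cache.items())[: self.window]  # least recently used first
            victim = min(candidates, key=lambda kv: kv[1][1])[0]  # cheapest to recompute
            del self.cache[victim]
        self.cache[query] = (results, cost)
        self.cache.move_to_end(query)
```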
Item Open Access: A five-level static cache architecture for web search engines (Elsevier Ltd, 2012)
Ozcan, R.; Altingovde, I. S.; Cambazoglu, B. B.; Junqueira, F. P.; Ulusoy, Özgür
Caching is a crucial performance component of large-scale web search engines, as it greatly helps to reduce average query response times and query processing workloads on backend search clusters. In this paper, we describe a multi-level static cache architecture that stores five different item types: query results, precomputed scores, posting lists, precomputed intersections of posting lists, and documents. Moreover, we propose a greedy heuristic to prioritize items for caching, based on gains computed using the items' past access frequencies, estimated computational costs, and storage overheads. This heuristic takes into account the interdependency between individual items when making its caching decisions, i.e., after a particular item is cached, the gains of all items affected by this decision are updated. Our simulations under realistic assumptions reveal that the proposed heuristic performs better than dividing the entire cache space among the particular item types at fixed proportions. © 2010 Elsevier Ltd. All rights reserved.
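As a rough illustration of the kind of greedy, gain-based fill described in the entry above, the sketch below ranks items by an assumed gain of frequency * cost / size and, after caching an item, lowers the cost (and hence the gain) of items that depend on it. The gain formula, the dependency encoding, and the lazy heap update are assumptions for illustration; the paper defines its own gains and item types.

```python
import heapq

def greedy_fill(items, dependents, budget):
    """items: id -> {'freq': .., 'cost': .., 'size': ..}
    dependents: id -> list of (dependent_id, cost_saving) pairs applied once id is cached.
    Returns the set of cached item ids, subject to the total size budget."""
    gain = lambda it: it["freq"] * it["cost"] / it["size"]
    heap = [(-gain(item), item_id) for item_id, item in items.items()]
    heapq.heapify(heap)
    cached, used = set(), 0
    while heap and used < budget:
        neg_gain, item_id = heapq.heappop(heap)
        item = items[item_id]
        if item_id in cached or used + item["size"] > budget:
            continue
        current = gain(item)
        if -neg_gain > current + 1e-9:  # stale heap entry: gain dropped after a dependency update
            heapq.heappush(heap, (-current, item_id))
            continue
        cached.add(item_id)
        used += item["size"]
        for dep_id, saving in dependents.get(item_id, []):
            items[dep_id]["cost"] = max(0.0, items[dep_id]["cost"] - saving)  # dependent gets cheaper
    return cached
```

The heap is updated lazily: instead of rebuilding it after every dependency change, stale entries are detected when popped and re-pushed with their current gain.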
Item Open Access: Longitudinal analysis of search engine query logs - temporal coverage (2012)
Yılmaz, Oğuz
The Internet is growing day by day, and the usage of web search engines is continuously increasing. The start page of a user's browser is typically the home page of a search engine, and to navigate to a particular website, most people prefer typing its name into the search engine rather than using the browser's address bar. Considering this important role of search engines as the main entry point to the Web, we need to understand the web searching trends that emerge over time. We believe that a temporal analysis of the query results returned by search engines reveals important insights into the current situation and future directions of web searching. In this thesis, we provide a large-scale analysis of the evolution of query results obtained from a real search engine at two distant points in time, namely 2007 and 2010, for a set of 630,000 real queries. Our analyses attempt to answer several critical questions regarding the evolution of Web search results, and we believe that this large-scale longitudinal analysis sheds some light on those questions.

Item Open Access: Site-based dynamic pruning for query processing in search engines (ACM, 2008-07)
Altıngövde, İsmail Şengör; Demir, Engin; Can, Fazlı; Ulusoy, Özgür
Web search engines typically index and retrieve at the page level. In this study, we investigate a dynamic pruning strategy that allows the query processor to first determine the most promising websites and then carry out the similarity computations only for the pages within those sites.

Item Open Access: Strategies for setting time-to-live values in result caches (ACM, 2013-10-11)
Sazoğlu, Fethi Burak; Cambazoğlu, B. B.; Özcan, R.; Altıngövde, İsmail Şengör; Ulusoy, Özgür
In web query result caching, the staleness of cached results is often bounded via a time-to-live (TTL) mechanism, which expires the validity of cached query results at some point in time. In this work, we evaluate the performance of three alternative TTL mechanisms: time-based TTL, frequency-based TTL, and click-based TTL. Moreover, we propose hybrid approaches obtained by pairwise combination of these mechanisms. Our results indicate that combining time-based TTL with frequency-based TTL yields better performance (i.e., lower stale query traffic and less redundant computation) than using either mechanism in isolation. Copyright is held by the owner/author(s).
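The abstract above does not define the three TTL mechanisms precisely. The sketch below assumes one plausible reading: a time-based TTL expires an entry after a fixed age, a frequency-based TTL expires it after a fixed number of cache hits, and the hybrid expires it when either condition holds. All names and thresholds are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field

@dataclass
class CachedResult:
    results: list
    cached_at: float = field(default_factory=time.time)
    hits: int = 0  # incremented by the cache on every hit

def is_stale(entry, max_age=3600.0, max_hits=50, now=None):
    """Hybrid TTL check: stale if the entry is too old (time-based TTL)
    or has been served too many times since caching (frequency-based TTL)."""
    now = time.time() if now is None else now
    time_expired = (now - entry.cached_at) > max_age
    freq_expired = entry.hits >= max_hits
    return time_expired or freq_expired

# Usage: on a cache hit, increment entry.hits; if is_stale(entry), refresh the
# result from the backend before serving it.
```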