Efficient result caching mechanisms in search engines
Author
Sazoğlu, Fethi Burak
Advisor
Ulusoy, Özgür
Date
2014
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
The performance of a search engine depends on its components, such as the
crawler, indexer and processor. The query latency and the accuracy and recency of the
results play a crucial role in determining this performance. High performance can
be provided with powerful hardware in the data center, but keeping operational
costs under control is essential for the commercial viability of a search engine.
This thesis focuses on techniques to boost the performance of search engines by
means of reducing both the number of queries issued to the backend and the cost
to process a query stream. This can be accomplished by taking advantage of the
temporal locality of the queries. Caching the result for a recently issued query
removes the need to reprocess this query when it is issued again by the same or
a different user. Therefore, deploying a query result cache decreases the load on the
resources of the search engine, which frees up processing capacity. The main
objective of this thesis is to improve search engine performance by making the
result cache more effective. This is done by maximizing the cache hit
rate and minimizing the processing cost, using per-query statistics such as
frequency, timestamp and cost. While providing high hit rates and low processing
costs improves performance, the freshness of the results in the cache must also
be considered for user satisfaction. Therefore, a variety of techniques are
examined in this thesis to bound the staleness of cached results without flooding
the backend with refresh queries. The proposed techniques are shown to be
efficient using real query log data from a commercial search engine.
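
The ideas summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's own algorithm: the class name ResultCache, the eviction score (frequency times cost, with ties broken by the oldest access timestamp), the fixed time-to-live used to bound staleness, and the backend callable that returns a result list together with a processing cost are all assumptions made for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    results: List[str]
    cost: float         # backend processing cost reported for this query
    frequency: int      # how many times the query has been requested
    last_access: float  # timestamp of the most recent request
    created: float      # timestamp at which the results were computed


class ResultCache:
    """Hypothetical query result cache: cost-aware eviction plus a TTL bound."""

    def __init__(
        self,
        capacity: int,
        ttl: float,
        backend: Callable[[str], Tuple[List[str], float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.ttl = ttl            # maximum age of a cached result, in seconds
        self.backend = backend    # assumed to return (results, processing_cost)
        self.clock = clock
        self.entries: Dict[str, Entry] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> List[str]:
        now = self.clock()
        entry = self.entries.get(query)
        if entry is not None and now - entry.created <= self.ttl:
            # Fresh hit: update the per-query statistics, skip the backend.
            entry.frequency += 1
            entry.last_access = now
            self.hits += 1
            return entry.results
        # Miss, or the cached result is older than the TTL: reprocess.
        self.misses += 1
        results, cost = self.backend(query)
        frequency = entry.frequency + 1 if entry is not None else 1
        if entry is None and len(self.entries) >= self.capacity:
            self._evict()
        self.entries[query] = Entry(results, cost, frequency, now, now)
        return results

    def _evict(self) -> None:
        # Evict the entry whose loss is expected to cost the least:
        # lowest frequency * cost, then least recently accessed.
        victim = min(
            self.entries,
            key=lambda q: (
                self.entries[q].frequency * self.entries[q].cost,
                self.entries[q].last_access,
            ),
        )
        del self.entries[victim]
```

In this sketch a stale entry is refreshed lazily, only when the query is requested again, so no refresh traffic is sent to the backend for queries that nobody repeats. Whether that trade-off is acceptable depends on the staleness bound required, and the thesis examines a variety of techniques for this.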