Data sensitive approximate query approaches in metric spaces
Author
Dilek, Merve
Advisor
Körpeoğlu, İbrahim
Date
2011
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Similarity searching is the task of retrieving relevant information from datasets.
We are particularly interested in datasets that contain complex and unstructured
data such as images, videos, audio recordings, and protein and DNA sequences. The
relevant information is typically defined using one of two common query types: a
range query retrieves all objects within a specified distance of the
query object, whereas a k-nearest neighbor query retrieves the k database
objects closest to the query object. A variety of index structures based on the
notion of metric spaces have been proposed to process these two query types.
The query performance of these index structures has not been satisfactory,
particularly for high-dimensional datasets. As a solution, various approximate
similarity search methods offering the users a quality/time trade-off
have been proposed. The rationale is that users might be willing to tolerate
reduced query precision in order to retrieve results faster. The proposed approximate
searching schemes are usually tightly coupled to the underlying data
structures, which makes it difficult to compare the merits of their core ideas.
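The two query types defined above can be illustrated with a minimal linear-scan sketch. This is not code from the thesis: the function names, the Euclidean metric, and the sample points are assumptions chosen only to make the definitions concrete. Any function that satisfies the metric axioms could be passed as `dist`.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def range_query(data: Sequence[Point], q: Point, radius: float,
                dist: Metric) -> List[Point]:
    """Return every object whose distance to q is at most radius."""
    return [o for o in data if dist(q, o) <= radius]


def knn_query(data: Sequence[Point], q: Point, k: int,
              dist: Metric) -> List[Point]:
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, data, key=lambda o: dist(q, o))


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    print(range_query(points, (0.0, 0.0), 1.0, euclidean))
    print(knn_query(points, (0.0, 0.0), 3, euclidean))
```

Both functions compute the distance from the query to every object, which is the cost that metric index structures and approximate methods try to reduce.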
In this thesis, we investigate various approximation approaches to decrease the
response time of similarity queries. These approaches use a variety of statistics
about the dataset to obtain dynamic (query-time) guidance on the
approximation that is specific to each query object. The experiments
are performed on top of a simple underlying pivot-based index structure
to minimize the effects of the index on our approximation schemes. The results
show that it is possible to improve the performance/precision of the approximation
by using guidance that is sensitive to both the data and the query object.
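As background for the pivot-based index mentioned above, the following is a minimal sketch of a generic pivot table with exact triangle-inequality filtering. It is an assumption-laden illustration, not the index or the approximation schemes used in the thesis: the class name `PivotTable`, the choice of pivots, and the Euclidean metric are all hypothetical. The idea is that for any pivot p, |d(q, p) - d(o, p)| is a lower bound on d(q, o), so an object can be discarded without computing its distance to the query whenever that bound exceeds the query radius.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here only as an example metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical pivot table: stores object-to-pivot distances."""

    def __init__(self, data: Sequence[Point], pivots: Sequence[Point],
                 dist: Metric) -> None:
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute d(o, p) for every object o and pivot p.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.data]

    def range_query(self, q: Point, radius: float) -> Tuple[List[Point], int]:
        """Return (hits, number of distance computations at query time)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        computed = len(self.pivots)
        hits = []
        for o, row in zip(self.data, self.table):
            # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for each pivot.
            lower = max((abs(dq - do) for dq, do in zip(q_to_pivots, row)),
                        default=0.0)
            if lower > radius:
                continue  # o cannot be within radius; skip the real distance
            computed += 1
            if self.dist(q, o) <= radius:
                hits.append(o)
        return hits, computed


if __name__ == "__main__":
    # Hypothetical sample data and pivot for illustration only.
    points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    index = PivotTable(points, pivots=[(0.0, 0.0)], dist=euclidean)
    print(index.range_query((0.0, 0.0), 1.0))
```

This filtering is exact: it returns the same result as a linear scan. An approximate scheme would trade some of that precision for fewer distance computations, and how that trade-off is guided is the subject of the thesis rather than of this sketch.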