Selective replicated declustering for arbitrary queries
Author
Oktay, K. Yasin
Türk, Ata
Aykanat, Cevdet
Date
2009-08
Source Title
European Conference on Parallel Processing. Euro-Par 2009: Euro-Par 2009 Parallel Processing
Publisher
Springer
Pages
375 - 386
Language
English
Type
Conference Paper
Abstract
Data declustering is used to minimize query response times in data-intensive applications. In this technique, the query retrieval process is parallelized by distributing the data among several disks, which is useful in applications such as geographic information systems that access huge amounts of data. Declustering with replication is an extension of declustering that allows data replicas in the system. Many replicated declustering schemes have been proposed, and most of them generate two or more copies of all data items. However, some applications have very large data sizes, and even having two copies of all data items may not be feasible. In such systems, selective replication is a necessity. Furthermore, existing replication schemes are not designed to utilize query distribution information when such information is available. In this study, we propose a replicated declustering scheme that decides both which data items to replicate and how to assign all data items to disks when replication capacity is limited. We use the available query information to decide the replication and partitioning of the data, with the aim of optimizing aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering, and we use this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially for low replication constraints, the proposed scheme yields better performance than existing replicated declustering schemes. © 2009 Springer.
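To make the objective in the abstract concrete, the following is a minimal Python sketch of how an aggregate parallel response time could be evaluated for a two-way replicated declustering. It is not the authors' implementation and does not include the Fiduccia-Mattheyses-like improvement step. The function names, the unit cost per retrieved item, the equal weighting of queries, and the assumption that every queried item is stored on at least one of the two disks are all illustrative assumptions. With two disks, a query's replicated items can be read from either disk, so its best achievable response time is the largest of three values: the number of items stored only on the first disk, the number stored only on the second disk, and half the query size rounded up.

```python
from math import ceil


def query_response_time(query, disk_a, disk_b):
    """Best-case parallel response time of one query on two disks.

    Assumes unit retrieval cost per item and that every item in `query`
    is stored on at least one of the two disks (illustrative assumptions).
    """
    only_a = sum(1 for item in query if item in disk_a and item not in disk_b)
    only_b = sum(1 for item in query if item in disk_b and item not in disk_a)
    both = sum(1 for item in query if item in disk_a and item in disk_b)
    # Replicated items may be served by either disk, so the load can be
    # balanced down to half the query size unless one disk is forced to
    # serve more than that through its non-replicated items.
    return max(only_a, only_b, ceil((only_a + only_b + both) / 2))


def aggregate_response_time(queries, disk_a, disk_b):
    """Sum of per-query response times (each query weighted equally)."""
    return sum(query_response_time(q, disk_a, disk_b) for q in queries)


# Hypothetical example: six items, two queries.
queries = [{1, 2, 3, 4}, {3, 4, 5}]

# No replication: items 1-4 on disk A, items 5-6 on disk B.
cost_plain = aggregate_response_time(queries, {1, 2, 3, 4}, {5, 6})

# Selective replication: only item 4 is copied to disk B.
cost_replicated = aggregate_response_time(queries, {1, 2, 3, 4}, {4, 5, 6})
```

In this toy instance, replicating the single item 4 lowers the first query's response time, which illustrates why choosing which items to replicate based on the query set can matter when replication capacity is limited.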
Keywords
Data declustering
Data items
Data replica
Data-intensive application
Declustering
Declustering scheme
Iterative improvements
Query distributions
Query information
Query response
Query retrieval
Real data sets
Response time
Selective replication
Very large datum
Artificial intelligence
Bioinformatics
Disks (machine components)
Disks (structural components)
Distributed computer systems
Geographic information systems
Response time (computer systems)
Permalink
http://hdl.handle.net/11693/28697
Published Version (Please cite this version)
http://dx.doi.org/10.1007/978-3-642-03869-3_37
Related items
Showing items related by title, author, creator and subject.
-
A histogram-based approach for object-based query-by-shape-and-color in image and video databases
Şaykol, E.; Güdükbay, U.; Ulusoy, Ö. (Elsevier, 2005)
Considering the fact that querying by low-level object features is essential in image and video data, an efficient approach for querying and retrieval by shape and color is proposed. The approach employs three specialized ...
-
Static index pruning in web search engines: combining term and document popularities with query views
Altingovde, I. S.; Ozcan, R.; Ulusoy, O. (Association for Computing Machinery, 2012)
Static index pruning techniques permanently remove a presumably redundant part of an inverted file, to reduce the file size and query processing time. These techniques differ in deciding which parts of an index can be ...
-
Cost-aware strategies for query result caching in Web search engines
Ozcan, R.; Altingovde, I. S.; Ulusoy, O. (Association for Computing Machinery, 2011)
Search engines and large-scale IR systems need to cache query results for efficiency and scalability purposes. Static and dynamic caching techniques (as well as their combinations) are employed to effectively cache query ...