Utilizing query logs for data replication and placement in big data applications

buir.advisorAykanat, Cevdet
dc.contributor.authorTürk, Ata
dc.date.accessioned2016-01-08T18:22:56Z
dc.date.available2016-01-08T18:22:56Z
dc.date.issued2012
dc.descriptionAnkara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012.en_US
dc.descriptionThesis (Ph. D.) -- Bilkent University, 2012.en_US
dc.descriptionIncludes bibliographical references.en_US
dc.description.abstractThe growth in the amount of data in today's computing problems and the level of parallelism dictated by large-scale computing economics necessitate high-level parallelism for many applications. This parallelism is generally achieved via data-parallel solutions that require effective data clustering (partitioning) or declustering schemes, depending on the application requirements. In addition to data partitioning/declustering, data replication, which is used for data availability and increased performance, has also become an inherent feature of many applications. The data partitioning/declustering and data replication problems are generally addressed separately. This thesis is centered around the idea of performing data replication and data partitioning/declustering simultaneously to obtain replicated data distributions that yield better parallelism. To this end, we utilize query logs to propose replicated data distribution solutions and extend the well-known Fiduccia-Mattheyses (FM) iterative improvement algorithm so that it can be used to generate replicated partitioning/declustering of data. For the replicated declustering problem, we propose a novel replicated declustering scheme that utilizes query logs to improve the performance of a parallel database system. We also extend our replicated declustering scheme and propose a novel replicated re-declustering scheme such that, in the face of drastic query pattern changes or server additions to or removals from the parallel database system, new declustering solutions that require low migration overheads can be computed. For the replicated partitioning problem, we show how to utilize an effective single-phase replicated partitioning solution in two well-known applications (keyword-based search and Twitter).
For these applications, we provide the algorithmic solutions we had to devise for solving the problems that replication brings, the engineering decisions we made so as to obtain the greatest benefits from the proposed data distribution, and the implementation details for realistic systems. The results obtained indicate that utilizing query logs and performing replication and partitioning/declustering in a single phase improves parallel performance.en_US
dc.description.provenanceMade available in DSpace on 2016-01-08T18:22:56Z (GMT). No. of bitstreams: 1 0006398.pdf: 930978 bytes, checksum: 95966ffa3249776af1a240b44e93d617 (MD5)en
dc.description.statementofresponsibilityTürk, Ataen_US
dc.format.extentxviii, 128 leavesen_US
dc.identifier.urihttp://hdl.handle.net/11693/15682
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectreplicationen_US
dc.subjectiterative improvementen_US
dc.subjectquery-log-awareen_US
dc.subjectpartitioningen_US
dc.subjectdeclusteringen_US
dc.subject.lccQA76.9.B32 T87 2012en_US
dc.subject.lcshElectronic data processing--Backup processing alternatives.en_US
dc.subject.lcshData recovery (Computer science)en_US
dc.subject.lcshPartition (Mathematics)en_US
dc.subject.lcshIterative methods (Mathematics)en_US
dc.titleUtilizing query logs for data replication and placement in big data applicationsen_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelDoctoral
thesis.degree.namePh.D. (Doctor of Philosophy)

Files

Original bundle
Name:
0006398.pdf
Size:
909.16 KB
Format:
Adobe Portable Document Format