A bipartite graph model for placement, scheduling and replication in data grids

Dal, Burcu

buir.advisor	Aykanat, Cevdet
dc.contributor.author	Dal, Burcu
dc.date.accessioned	2016-01-08T18:19:22Z
dc.date.available	2016-01-08T18:19:22Z
dc.date.issued	2012
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 63-68).	en_US
dc.description.abstract	Data grids provide geographically distributed resources for applications that generate and utilize large data sets. However, several issues make it difficult to ensure fast access to data and low turnaround time for jobs in data grids. To address these issues, several data replication and job scheduling strategies have been introduced to offer high data availability, low bandwidth consumption, and reduced turnaround time for grid systems. Multiple copies of existing data are maintained at different locations via data replication. Data replication strategies are broadly categorized as static and dynamic. In static replication strategies, replication is performed during system design, and replica decisions are generally based on a cost model that includes data access costs, bandwidth characteristics and storage constraints of the grid system. In dynamic replication strategies, the replication operation is managed at runtime so that the system adapts dynamically to changes in user request patterns. Job scheduling strategies fall under two main categories: online mode and batch mode. The online mode scheduler assigns tasks to sites as soon as they arrive. In batch mode, the complete set of jobs is taken into account and scheduled at the same time using all of the grid information. In this thesis, we propose a bipartite graph model for tasks and files in the grid system, and then we partition this graph to obtain a data placement and job scheduling strategy. The obtained parts are further refined in order to be assigned to grid sites by using a KL-based heuristic that takes the bandwidth and hop information between sites into account. Replication is achieved by replicating a certain number of the most accessed files, chosen prior to the partitioning process. Experimental results indicate that the increase in the partitioning quality reflects positively on the mapping quality.
Moreover, it is observed that the communication cost decreases notably when data replication is applied. Hence, our results show that by replicating a small number of data files and placing files onto sites using the bipartite graph model, we can obtain a performance improvement in job scheduling compared to no replication.	en_US
dc.description.statementofresponsibility	Dal, Burcu	en_US
dc.format.extent	xiii, 68 leaves, illustrations	en_US
dc.identifier.itemid	B133861
dc.identifier.uri	http://hdl.handle.net/11693/15493
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Data Grids	en_US
dc.subject	Bipartite Graph	en_US
dc.subject	Data Placement	en_US
dc.subject	Job Scheduling	en_US
dc.subject	Data Replication	en_US
dc.subject.lcc	QA76.9.C58 D35 2012	en_US
dc.subject.lcsh	Computational grids (Computer systems)	en_US
dc.subject.lcsh	Graph theory.	en_US
dc.subject.lcsh	Electronic data processing--Backup processing alternatives.	en_US
dc.subject.lcsh	Data recovery (Computer science)	en_US
dc.title	A bipartite graph model for placement, scheduling and replication in data grids	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
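
The abstract above describes a bipartite graph over tasks and files, with the most accessed files chosen for replication before partitioning. The following is a minimal illustrative sketch of that idea, not the thesis's implementation. The function names, the toy workload, the hand-written two-part assignment, and the unit edge weights are all assumptions made for illustration; the sketch does not perform graph partitioning or the KL-based mapping to sites.

```python
from collections import Counter


def build_bipartite_graph(task_files):
    """Return the (task, file) edges of the task-file bipartite graph."""
    return [(t, f) for t, files in task_files.items() for f in sorted(files)]


def most_accessed_files(task_files, k):
    """Pick the k files requested by the most tasks (replication candidates)."""
    counts = Counter(f for files in task_files.values() for f in files)
    # Sort by descending access count, then by name, so ties are deterministic.
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:k]


def remote_accesses(edges, task_part, file_part, replicated=()):
    """Count edges whose task and file lie in different parts.

    Assumption: a replicated file is available in every part, so its
    edges never count as remote accesses.
    """
    replicated = set(replicated)
    return sum(
        1
        for t, f in edges
        if f not in replicated and task_part[t] != file_part[f]
    )


# Hypothetical workload: each task lists the files it reads.
task_files = {
    "t1": {"f1", "f2"},
    "t2": {"f2", "f3"},
    "t3": {"f2", "f4"},
    "t4": {"f4"},
}
edges = build_bipartite_graph(task_files)

# Hand-written two-way assignment standing in for a partitioner's output.
task_part = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
file_part = {"f1": 0, "f2": 0, "f3": 0, "f4": 1}

hot = most_accessed_files(task_files, 1)
print(remote_accesses(edges, task_part, file_part))       # without replication
print(remote_accesses(edges, task_part, file_part, hot))  # with replication
```

In this toy case the only cross-part edge involves the most accessed file, so replicating that single file removes the remote access. A real setup would weight edges, for example by file size, and obtain the parts from a graph partitioner rather than by hand.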

Files

Original bundle

Name: 0006228.pdf
Size: 885.5 KB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science