Browsing by Subject "Replication"

Open Access
Balance preserving min-cut replication set for a K-way hypergraph partitioning
(2010) Yazıcı, Volkan
Replication is a widely used technique in information retrieval and database systems for providing fault tolerance and reducing parallelization and processing costs. Combinatorial models based on hypergraph partitioning have been proposed for various problems arising in information retrieval and database systems. We consider the possibility of using vertex replication to improve the quality of hypergraph partitioning. In this study, we focus on the Balance Preserving Min-Cut Replication Set (BPMCRS) problem, where we are initially given a maximum replication capacity and a K-way hypergraph partition with an initial imbalance ratio. The objective in the BPMCRS problem is to find optimal vertex replication sets for each part of the given partition such that the initial cutsize of the partition is improved as much as possible and the initial imbalance is either preserved or reduced under the given replication capacity constraint. To address the BPMCRS problem, we propose a model that combines coarsening and integer linear programming (ILP) schemes. The coarsening algorithm is based on the Dulmage-Mendelsohn decomposition. Experiments show that the ILP formulation coupled with the Dulmage-Mendelsohn decomposition-based coarsening provides high-quality results in feasible execution times for reducing the cost of a given K-way hypergraph partition.
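The effect of vertex replication on cutsize and balance can be illustrated with a toy example. The sketch below is not the ILP or coarsening method of the thesis; it only evaluates a given replication by hand. The function names, the unit vertex weights, and the imbalance definition (largest part size over average part size, minus one) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def cut_nets(nets: Dict[str, List[str]], parts: List[Set[str]]) -> Set[str]:
    """Return the nets that are not fully contained in any single part.

    With replication a vertex may appear in several parts, so a net is
    uncut as soon as one part holds a copy of every one of its pins.
    """
    return {
        name
        for name, pins in nets.items()
        if not any(set(pins) <= part for part in parts)
    }


def imbalance(parts: List[Set[str]]) -> float:
    """Assumed definition: max part size / average part size - 1 (unit weights)."""
    sizes = [len(p) for p in parts]
    avg = sum(sizes) / len(sizes)
    return max(sizes) / avg - 1.0


# Hypothetical 2-way partition of four vertices connected by three nets.
nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["c", "d"]}
before = [{"a", "b"}, {"c", "d"}]
# Replicate c into part 0 and b into part 1 (two replicas in total).
after = [{"a", "b", "c"}, {"c", "d", "b"}]

print(cut_nets(nets, before), imbalance(before))
print(cut_nets(nets, after), imbalance(after))
```

In this toy instance the replication removes the only cut net while both parts grow by one vertex, so the imbalance ratio is unchanged.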
Open Access
Investigation of individual factors impacting the effectiveness of requirements inspections: a replicated experiment
(Springer, 2014-02) Albayrak, Ö.; Carver, J. C.
This paper presents a replication of an empirical study regarding the impact of individual factors on the effectiveness of requirements inspections. Experimental replications are important for verifying results and investigating the generality of empirical studies. We utilized the lab package and procedures from the original study, with some changes and additions, to conduct the replication with 69 professional developers in three different companies in Turkey. In general, the results of the replication were consistent with those of the original study. The main result from the original study, which is supported in the replication, was that inspectors whose degrees are in a field related to software engineering are less effective during a requirements inspection than inspectors whose degrees are in other fields. In addition, we found that Company, Experience, and English Proficiency impacted inspection effectiveness.
Open Access
Minimizing communication through computational redundancy in parallel iterative solvers
(2011) Torun, Fahreddin Şükrü
Sparse matrix-vector multiplication (SpMxV) of the form y = Ax is a kernel operation in iterative linear solvers used in scientific applications. In these solvers, the SpMxV operation is performed repeatedly with the same sparse matrix through iterations until convergence. Depending on the matrix and its decomposition, the parallel SpMxV operation necessitates communication among processors in the parallel environment. The communication can be reduced by intelligent decomposition. However, we can further decrease the communication through data replication and redundant computation. In row-parallel SpMxV computation, the communication occurs due to the transfer of x-vector entries. The input vector x of the next iteration is computed from the output vector of the current iteration through linear vector operations. Hence, a processor may redundantly compute a y-vector entry, which leads to an x-vector entry in the following iteration, instead of receiving that x-vector entry from another processor. Thus, redundant computation of that y-vector entry may lead to a reduction in communication. In this thesis, we devise a directed-graph-based model that correctly captures the computation and communication pattern for the above-mentioned iterative solvers. Moreover, we formulate communication minimization through redundant computation of y-vector entries as a combinatorial problem on this directed graph model. We propose two heuristics to solve this combinatorial problem. Experimental results indicate that the communication-reducing strategy based on redundant computation is promising.
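To make the source of communication concrete, the following sketch counts the x-vector entries that must be transferred in one row-parallel y = Ax step. It covers only the baseline without redundant computation and is not the directed-graph model of the thesis. The function name, the dictionary-based matrix layout, and the assumption that the owner of row i also owns x[i] and y[i] are all illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def comm_volume(rows: Dict[int, List[int]], owner: Dict[int, int]) -> int:
    """Count x-vector entries sent in one row-parallel SpMxV.

    rows[i]  : column indices of the nonzeros in row i
    owner[i] : processor owning row i, y[i], and x[i] (assumed conformal)
    """
    needers = defaultdict(set)  # column j -> remote processors needing x[j]
    for i, cols in rows.items():
        for j in cols:
            if owner[j] != owner[i]:
                needers[j].add(owner[i])
    # Each x[j] is sent once to every remote processor that needs it.
    return sum(len(procs) for procs in needers.values())


# Hypothetical 4x4 sparsity pattern split across two processors.
rows = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 0]}
owner = {0: 0, 1: 0, 2: 1, 3: 1}
print(comm_volume(rows, owner))  # x[2] goes to processor 0, x[0] to processor 1
```

Each counted entry is a candidate for the strategy described in the abstract: the receiving processor could instead compute the corresponding y-vector entry itself, at the price of extra computation and of any further x-vector entries that computation requires.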
Open Access
Query-log aware replicated declustering
(Institute of Electrical and Electronics Engineers, 2013) Turk, A.; Oktay, K. Y.; Aykanat, Cevdet
Data declustering and replication can be used to reduce the I/O times associated with processing data-intensive queries. Declustering parallelizes the query retrieval process by distributing the data items requested by queries among several disks. Replication enables alternative disk choices for individual disk items and thus provides better query parallelism options. In general, existing replicated declustering schemes do not consider query log information and try to optimize all possible queries for a specific query type, such as range or spatial queries. In such schemes, it is assumed that two or more copies of all data items are to be generated, and the scheduling of these copies to disks is discussed. However, in some applications, generating even two copies of all of the data items is not feasible, since data items tend to have very large sizes. In this work, we assume that there is a given limit on disk capacities and thus on replication amounts. We utilize existing query-log information to propose a selective replicated declustering scheme, in which we select the data items to be replicated and decide on their scheduling onto disks while respecting disk capacities. We propose and implement an iterative improvement algorithm to obtain a two-way replicated declustering and use this algorithm in a recursive framework to generate a multiway replicated declustering. Then we improve the obtained multiway replicated declustering with efficient refinement heuristics. Experiments conducted on realistic data sets show that the proposed scheme yields better performance than existing replicated declustering schemes.
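The benefit of selective replication for a single logged query can be sketched as follows. This is not the iterative improvement algorithm of the paper; it is a brute-force evaluation for tiny inputs. The cost model (unit retrieval time per item, query cost equal to the largest number of items read from any one disk) and all names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence


def query_cost(query: Sequence[str], placement: Dict[str, List[int]]) -> int:
    """Best achievable max-per-disk load for a non-empty query.

    placement maps each item to the disks holding a copy of it.
    Every combination of replica choices is tried, so this is
    exponential in the query size and meant only for small examples.
    """
    options = [placement[item] for item in query]
    return min(max(Counter(choice).values()) for choice in product(*options))


query = ["a", "b", "d"]
# Without replication all three requested items sit on disk 0.
single = {"a": [0], "b": [0], "c": [1], "d": [0]}
# Replicating only item "b" onto disk 1 lets the query use both disks.
replicated = {"a": [0], "b": [0, 1], "c": [1], "d": [0]}

print(query_cost(query, single))
print(query_cost(query, replicated))
```

A query log would supply many such queries, and a capacity-limited scheme has to decide which few items to copy so that the total cost over the log drops the most.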
Open Access
Replicated partitioning for undirected hypergraphs
(Academic Press, 2012) Selvitopi, R. O.; Turk, A.; Aykanat, Cevdet
Hypergraph partitioning (HP) and replication are diverse but powerful tools that are traditionally applied separately to minimize the costs of parallel and sequential systems that access related data or process related tasks. When combined, these two techniques have the potential to achieve significant performance improvements in many applications. In this study, we present a tool that simultaneously replicates and partitions the vertices of an undirected hypergraph whose vertices represent data and whose nets represent task dependencies among these data. In this approach, we propose an iterative-improvement-based replicated bipartitioning heuristic that can move, replicate, and unreplicate vertices. In order to utilize our replicated bipartitioning heuristic in a recursive bipartitioning framework, we also propose appropriate cut-net removal, cut-net splitting, and pin selection algorithms to correctly encapsulate the two most commonly used cutsize metrics. We embed our replicated bipartitioning scheme into the state-of-the-art multilevel HP tool PaToH to provide an effective and efficient replicated HP tool, rpPaToH. The performance of the proposed techniques and the developed tools is tested on undirected hypergraphs that model the communication costs of parallel query processing in information retrieval systems. Our experimental analysis indicates that the proposed technique provides significant improvements in the quality of the partitions, especially under low replication ratios.
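The abstract does not name the two cutsize metrics. In the hypergraph partitioning literature the usual pair is the cut-net metric and the connectivity-1 metric, and the sketch below assumes those two. It evaluates a fixed K-way partition without replication and with unit net costs; the function name and the example hypergraph are hypothetical.

```python
from typing import Dict, List, Tuple


def cutsize_metrics(nets: Dict[str, List[str]], part_of: Dict[str, int]) -> Tuple[int, int]:
    """Return (cut-net, connectivity-1) cutsize with unit net costs.

    cut-net        : number of nets whose pins span more than one part
    connectivity-1 : sum over nets of (number of parts spanned - 1)
    """
    cut_net = 0
    connectivity = 0
    for pins in nets.values():
        spanned = len({part_of[v] for v in pins})
        if spanned > 1:
            cut_net += 1
        connectivity += spanned - 1
    return cut_net, connectivity


# Hypothetical 3-way partition of five vertices.
nets = {"n1": ["a", "b", "c"], "n2": ["c", "d"], "n3": ["a", "d", "e"]}
part_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
print(cutsize_metrics(nets, part_of))
```

The two metrics coincide for a bipartition and differ only when a net spans three or more parts, which is why recursive bipartitioning needs different handling of cut nets depending on the metric being optimized.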