Browsing by Subject "Computer Architecture"
Now showing 1 - 8 of 8
Item Open Access
Algorithms for efficient vectorization of repeated sparse power system network computations (IEEE, 1995)
Aykanat, Cevdet; Özgü, Ö.; Güven, N.
Standard sparsity-based algorithms used in power system applications need to be restructured for efficient vectorization because of the extremely short vectors processed. Further, intrinsic architectural features of vector computers, such as chaining and sectioning, should also be exploited for utmost performance. This paper presents novel data storage schemes and vectorization algorithms that resolve the recurrence problem, exploit chaining, and minimize the number of indirect element selections in the repeated solution of the sparse linear systems of equations widely encountered in various power system problems. The proposed schemes are also applied to the vectorization of power mismatch calculations arising in the solution phase of FDLF, which involves typical repeated sparse power network computations. The relative performances of the proposed and existing vectorization schemes are evaluated, both theoretically and experimentally, on an IBM 3090/VF.

Item Open Access
A general purpose VLSI median filter and its applications for image processing (IEEE, 1989)
Karaman, Mustafa; Onural, Levent; Atalar, Abdullah
A general-purpose median filter configuration consisting of two single-chip median filters is proposed. One of the chips is designed for applications requiring variable word length and variable window size, whereas the other is for real-time applications. The architectures of the chips are based on odd/even transposition sorting. The chips are implemented in 3-μm M2CMOS using full-custom VLSI design techniques. Together with a reasonable amount of external hardware, the chips can be used to realize many median filtering techniques.
The VLSI design procedure of the chips and their applications to different median filtering techniques for image processing are presented.

Item Open Access
Image-space decomposition algorithms for sort-first parallel volume rendering of unstructured grids (Springer, 2000)
Kutluca, H.; Kurç, T. M.; Aykanat, Cevdet
Twelve adaptive image-space decomposition algorithms are presented for sort-first parallel direct volume rendering (DVR) of unstructured grids on distributed-memory architectures. The algorithms are presented under a novel taxonomy based on the dimension of the screen decomposition, the dimension of the workload arrays used in the decomposition, and the scheme used for workload-array creation and for querying the workload of a region. For the 2D decomposition schemes using 2D workload arrays, a novel scheme is proposed to query the exact number of screen-space bounding boxes of the primitives in a screen region in constant time. A probe-based chains-on-chains partitioning algorithm is exploited for load balancing in optimal 1D decomposition and iterative 2D rectilinear decomposition (RD). A new probe-based optimal 2D jagged decomposition (OJD) is proposed, which is much faster than the dynamic-programming-based OJD scheme proposed in the literature. The summed-area table is successfully exploited to query the workload of a rectangular region in constant time in both the OJD and RD schemes for the subdivision of general 2D workload arrays. Two orthogonal recursive bisection (ORB) variants are adapted to relax the straight-line division restriction in conventional ORB by using the medians-of-medians approach on a regular mesh and a quadtree superimposed on the screen. Two approaches based on the Hilbert space-filling curve and graph partitioning are also proposed. An efficient primitive classification scheme is proposed for redistribution in 1D, and 2D rectilinear and jagged decompositions.
The performance comparison of the decomposition algorithms is modeled by establishing appropriate quality measures for load balancing, amount of primitive replication, and parallel execution time. The experimental results on a Parsytec CC system using a set of benchmark volumetric datasets verify the validity of the proposed performance models. The performance evaluation of the decomposition algorithms is also carried out through the sort-first parallelization of an efficient DVR algorithm.

Item Open Access
A parallel progressive radiosity algorithm based on patch data circulation (Pergamon Press, 1996)
Aykanat, Cevdet; Çapin, T. K.; Özgüç, B.
Current research on radiosity has concentrated on increasing the accuracy and the speed of the solution. Although algorithmic and meshing techniques decrease the execution time, excessive computational power is still required for complex scenes. Hence, parallelism can be exploited to speed up the method further. This paper aims at providing a thorough examination of parallelism in the basic progressive refinement radiosity, and investigates its parallelization on distributed-memory parallel architectures. A synchronous scheme, based on static task assignment, is proposed to achieve better coherence for shooting patch selections. An efficient global circulation scheme is proposed for the parallel light distribution computations, which reduces the total volume of concurrent communication by an asymptotic factor. The proposed parallel algorithm is implemented on an Intel iPSC/2 hypercube multicomputer. Load balance qualities of the proposed static assignment schemes are evaluated experimentally. The effect of coherence in the parallel light distribution computations on the shooting patch selection sequence is also investigated.
Theoretical and experimental evaluation is also presented to verify that the proposed parallelization scheme yields equally good performance on multicomputers implementing the simplest (e.g. ring) as well as the richest (e.g. hypercube) interconnection topologies. This paper also proposes and presents a parallel load re-balancing scheme which enhances our basic parallel radiosity algorithm to be usable in the parallelization of radiosity methods adopting adaptive subdivision and meshing techniques. Copyright © 1996 Elsevier Science Ltd.

Item Open Access
A parallel scaled conjugate-gradient algorithm for the solution phase of gathering radiosity on hypercubes (Springer, 1997)
Kurç, T. M.; Aykanat, Cevdet; Özgüç, B.
Gathering radiosity is a popular method for investigating lighting effects in a closed environment. In lighting simulations, with fixed locations of objects and light sources, the intensity and color and/or reflectivity vary. After the form-factor values are computed, the linear system of equations is solved repeatedly to visualize these changes. The scaled conjugate-gradient method is a powerful technique for solving large sparse linear systems of equations with symmetric positive definite matrices. We investigate this method for the solution phase. The nonsymmetric form-factor matrix is transformed into a symmetric matrix. We propose an efficient data redistribution scheme to achieve almost perfect load balance. We also present several parallel algorithms for form-factor computation.

Item Open Access
A rule-based video database system architecture (Elsevier, 2002)
Dönderler, M. E.; Ulusoy, Özgür; Güdükbay, Uğur
We propose a novel architecture for a video database system incorporating both spatio-temporal and semantic (keyword, event/activity and category-based) query facilities.
The originality of our approach stems from the fact that we intend to provide full support for spatio-temporal, relative object-motion and similarity-based object-trajectory queries by a rule-based system utilizing a knowledge-base, while using an object-relational database to answer semantic-based queries. Our method of extracting and modeling spatio-temporal relations is also unique in that we segment video clips into shots using spatial relationships between objects in video frames rather than applying a traditional scene detection algorithm. The technique we use is simple, yet novel and powerful in terms of effectiveness and user query satisfaction: video clips are segmented into shots whenever the current set of relations between objects changes, and the video frames where these changes occur are chosen as keyframes. The directional, topological and third-dimension relations used for shots are those of the keyframes selected to represent the shots, and this information is kept, along with the frame numbers of the keyframes, in a knowledge-base as Prolog facts. The system has a comprehensive set of inference rules to reduce the number of facts stored in the knowledge-base, because a considerable number of facts, which otherwise would have to be stored explicitly, can be derived by rules with some extra effort. © 2002 Elsevier Science Inc. All rights reserved.

Item Open Access
A study of two transaction-processing architectures for distributed real-time data base systems (Elsevier, 1995)
Ulusoy, Özgür
A real-time data base system (RTDBS) is designed to provide timely response to the transactions of data-intensive applications. Processing a transaction in a distributed RTDBS environment presents the design choice of how to provide access to remote data referenced by the transaction. Satisfaction of the timing constraints of transactions should be the primary factor to be considered in scheduling accesses to remote data.
In this article, we describe and analyze two alternative approaches to this fundamental design decision. With the first alternative, transaction operations are executed at the sites where the required data pages reside. The other alternative is based on transmitting data pages wherever they are needed. Although the latter approach is characterized by large message volumes carrying data pages, it is shown in our experiments to perform better than the other approach under most of the workloads and system configurations tested. The performance metric used in the evaluations is the fraction of transactions that satisfy their timing constraints. © 1995.

Item Open Access
Transaction processing in distributed active real-time database systems (Elsevier, 1998)
Ulusoy, Özgür
An active real-time database system (ARTDBS) is designed to provide timely response to the critical situations that are defined on database states. Although a number of studies have already addressed various issues in ARTDBSs, little attention has been paid to scheduling transactions in a distributed ARTDBS environment. In this paper, we describe a detailed performance model of a distributed ARTDBS and investigate various performance issues in time-cognizant transaction processing in ARTDBSs. The experiments conducted evaluate the performance under various types of active workload and different distributed transaction-processing architectures. The performance metric used in the evaluations is the fraction of transactions that violate their timing constraints. We also describe and evaluate a nested transaction execution scheme that improves the real-time performance under high levels of active workload. © 1998 Elsevier Science Inc. All rights reserved.
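
Illustrative note on the image-space decomposition entry above: its abstract mentions using a summed-area table to query the workload of a rectangular screen region in constant time. The Python sketch below shows only the generic summed-area-table idea on a small 2D workload array. It is not the authors' implementation; the function names `build_summed_area_table` and `region_workload` and the half-open region convention are assumptions made for illustration.

```python
def build_summed_area_table(workload):
    """Return a (rows+1) x (cols+1) table where sat[r][c] is the sum of
    workload[0:r][0:c]. The extra zero row and column avoid edge checks."""
    rows = len(workload)
    cols = len(workload[0]) if rows else 0
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row
        for c in range(cols):
            row_sum += workload[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def region_workload(sat, top, left, bottom, right):
    """Workload of rows top..bottom-1 and columns left..right-1,
    computed with four table lookups (constant time)."""
    return (sat[bottom][right] - sat[top][right]
            - sat[bottom][left] + sat[top][left])


# Hypothetical 3x3 workload array (e.g. primitive counts per screen cell).
workload = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sat = build_summed_area_table(workload)
whole_screen = region_workload(sat, 0, 0, 3, 3)
lower_right = region_workload(sat, 1, 1, 3, 3)
```

Building the table costs one pass over the workload array; afterwards, any candidate rectangle can be evaluated with four lookups, which is what makes repeated probing of region workloads cheap during decomposition.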