Browsing by Subject "Systems analysis"


Open Access
Adaptive prefetching for shared cache based chip multiprocessors
(IEEE, 2009-04) Kandemir, M.; Zhang, Y.; Öztürk, Özcan
Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle tradeoffs between memory bandwidth and performance. In a shared L2 based CMP, multiple cores compete for the shared on-chip cache space and limited off-chip pin bandwidth. Purely software based prefetching techniques tend to increase this contention, leading to degradation in performance. In some cases, prefetches can become harmful by evicting useful data from the shared cache whose next usage is earlier than that of the prefetched data, and the fraction of such harmful prefetches usually increases when we increase the number of cores used for executing a multi-threaded application code. In this paper, we propose two complementary techniques to address the problem of harmful prefetches in the context of shared L2 based CMPs. These techniques, namely, suppressing select data prefetches (if they are found to be harmful) and pinning select data in the L2 cache (if they are found to be frequent victims of harmful prefetches), are evaluated in this paper using two embedded application codes. Our experiments demonstrate that these two techniques are very effective in mitigating the impact of harmful prefetches, and as a result, we extract significant benefits from software prefetching even with large core counts. © 2009 EDAA.
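The two techniques can be pictured as simple bookkeeping. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes that each prefetch can be attributed to a "site" (e.g. a prefetch instruction) and that next-use times of the evicted and prefetched blocks are known (e.g. from profiling). The class name and the thresholds are invented for illustration; only the harmfulness test (the evicted block is reused before the prefetched one) comes from the abstract.

```python
from collections import defaultdict


class HarmfulPrefetchFilter:
    """Hypothetical counters for suppressing harmful prefetch sites and
    pinning blocks that are frequent victims of harmful prefetches."""

    def __init__(self, suppress_ratio=0.5, min_samples=4, pin_threshold=3):
        self.suppress_ratio = suppress_ratio  # illustrative threshold
        self.min_samples = min_samples        # observations before deciding
        self.pin_threshold = pin_threshold    # harmful evictions before pinning
        self.issued = defaultdict(int)        # site -> prefetches observed
        self.harmful = defaultdict(int)       # site -> harmful prefetches
        self.victims = defaultdict(int)       # block -> harmful evictions suffered

    def record_prefetch(self, site, victim_block, victim_next_use, prefetched_next_use):
        """A prefetch is harmful if the block it evicts is needed again
        before the prefetched block is first used."""
        self.issued[site] += 1
        if victim_block is not None and victim_next_use < prefetched_next_use:
            self.harmful[site] += 1
            self.victims[victim_block] += 1

    def should_suppress(self, site):
        n = self.issued[site]
        return n >= self.min_samples and self.harmful[site] / n > self.suppress_ratio

    def should_pin(self, block):
        return self.victims[block] >= self.pin_threshold


f = HarmfulPrefetchFilter()
for t in range(4):
    # Block "A" is always evicted, yet needed again sooner than the prefetched data.
    f.record_prefetch("loop1_pf", victim_block="A",
                      victim_next_use=t + 1, prefetched_next_use=t + 10)
print(f.should_suppress("loop1_pf"), f.should_pin("A"))
```

Keeping the two decisions separate mirrors the abstract's framing of them as complementary: one acts on the prefetching code, the other on the cached data.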
Open Access
Analysis of assembly systems for interdeparture time variability and throughput
(Taylor & Francis, 2002) Sabuncuoğlu, İ.; Erel, E.; Kok, A. G.
This paper studies the effect of the number of component stations (parallelism), work transfer, processing time distributions, buffers, and buffer allocation schemes on the throughput and interdeparture time variability of assembly systems. As an alternative to work transfer, variability transfer is introduced and its effectiveness is assessed. Previous research has indicated that the optimal throughput displays an anomaly at certain processing time distributions; this phenomenon is now thoroughly analyzed and its underlying details are uncovered. This study also yields several new findings with important practical implications.
Open Access
Analysis of design parameters in SIL-4 safety-critical computer
(IEEE, 2017-01) Ahangari, Hamzeh; Özkök, Y. I.; Yıldırım, A.; Say, F.; Atik, Funda; Öztürk, Özcan
Nowadays, safety-critical computers are extensively used in many civil domains, such as transportation, including railways, avionics, and automotive. We noticed that in the design of some previous works, critical safety design parameters such as failure diagnostic coverage (DC) or the common cause failure (CCF) ratio have not been seriously taken into account. Moreover, in some cases safety has not been compared with standard safety levels (IEC 61508 SIL1-SIL4), or has not even met them. Most often, it is not clear which part of the system is the Achilles' heel and how the design can be improved to reach standard safety levels. Motivated by such design ambiguities, we study the effect of various design parameters on safety in two prevalent safety configurations, 1oo2 and 2oo3, with 1oo1 used as a reference. Using Markov modeling, we analyze the sensitivity of safety to each of the following critical design parameters: failure rate of the processing element, failure diagnostic coverage, common cause failures, and repair rates. This study gives deeper insight into how variations in design parameters influence safety. Consequently, to meet the appropriate safety integrity level, it becomes possible to make an informed decision about the most relevant parameters instead of blindly improving some system parts. © 2017 IEEE.
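To make "Markov modeling" and "sensitivity" concrete, here is a deliberately simplified sketch, not the paper's model: a three-state continuous-time Markov chain for a 1oo2 pair that assumes perfect diagnostics (so diagnostic coverage is ignored), a beta-factor common cause model, and mean time to dangerous failure (MTTF) as the safety measure rather than a SIL metric such as PFD. The function names and numbers are illustrative assumptions.

```python
def mttf_1oo1(lam):
    """Single channel: the first dangerous failure ends operation."""
    return 1.0 / lam


def mttf_1oo2(lam, beta, mu):
    """Mean time to dangerous failure of a simplified 1oo2 pair.

    Three-state CTMC (hypothetical simplification, perfect diagnostics):
      state 0: both channels healthy
      state 1: one channel failed, under repair
      state F: both channels failed (absorbing)
    Rates, with a = 2(1 - beta)lam and b = beta*lam (beta-factor CCF model):
      0 -> 1 : a      0 -> F : b
      1 -> 0 : mu     1 -> F : lam
    Mean times to absorption satisfy
      T0 = (1 + a*T1) / (a + b),  T1 = (1 + mu*T0) / (mu + lam),
    whose solution for T0 is returned below.
    """
    a = 2.0 * (1.0 - beta) * lam
    b = beta * lam
    return (mu + lam + a) / (a * lam + b * (mu + lam))


# Sensitivity of MTTF to the common cause factor (illustrative numbers only).
lam, mu = 1e-5, 1.0 / 8.0  # failures per hour, repairs per hour
for beta in (0.0, 0.01, 0.05, 0.1):
    print(f"beta={beta:<5} 1oo2 MTTF={mttf_1oo2(lam, beta, mu):.3e} h "
          f"(1oo1: {mttf_1oo1(lam):.3e} h)")
```

Setting `beta = 0` and `mu = 0` recovers the textbook 3/(2·lam) for two independent units without repair, which is a quick sanity check; sweeping `beta` shows how strongly common cause failures erode the benefit of redundancy.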
Open Access
Auction based scheduling for distributed systems
(International Institute of Informatics and Systemics, 2006) Zarifoğlu, Emrah; Sabuncuoğlu, İhsan
Businesses deal with huge databases over geographically distributed supply networks. When this is combined with scheduling and planning needs, the problem becomes too difficult to handle. Recently, the Fast Consumer Goods sector has tended to consolidate its manufacturing facilities into a single supplier serving a distributed customer network. This decentralized structure causes imperfect information sharing between the customers and the supplier. We model this problem as a single machine distributed scheduling problem, with job agents representing the customers and a machine agent representing the supplier. We developed an Auction Based Algorithm that exploits a game-theoretic approach to solve the problem in the decentralized utility case. The results of our extensive computational experiments indicate that the Auction Based Algorithm converges to the upper bound found for the total utility measure.
Open Access
Cellular manufacturing system design using a holonistic approach
(Taylor & Francis, 2000) Aktürk, M. S.; Türkcan, A.
We propose an integrated algorithm that solves the part-family and machine-cell formation problem while simultaneously considering the within-cell layout problem. To the best of our knowledge, this is the first study that considers the efficiency of both individual cells and the overall system in monetary terms. Each cell should make at least a certain amount of profit to attain self-sufficiency, while we maximize the total profit of the system using a holonistic approach. The proposed algorithm provides two alternative solutions: one with independent cells and the other with inter-cell movement. Our computational experiments on a set of randomly generated problems indicate that the results are very encouraging.
Open Access
Code scheduling for optimizing parallelism and data locality
(Springer, 2010-08-09) Yemliha, T.; Kandemir, M.; Öztürk, Özcan; Kultursay, E.; Muralidhara, S. P.
As chip multiprocessors proliferate, programming support for these devices is likely to receive a lot of attention in the near future. Parallelism and data locality are two critical issues in a chip multiprocessor environment. Unfortunately, most of the published work in the literature focuses only on one of these problems, and this can prevent one from achieving the best possible performance. The main goal of this paper is to propose and evaluate a compiler-directed code parallelization scheme, which considers both parallelism and data locality at the same time. Our compiler captures the inherent parallelism and data reuse in the application code being analyzed using a novel representation called the locality-parallelism graph (LPG). Our partitioning/scheduling algorithm assigns the nodes of this graph to the processors in the architecture and schedules them for execution. We implemented this algorithm and evaluated its effectiveness using a set of benchmark codes. The results collected so far indicate that our approach improves overall execution latency significantly. In this paper, we also introduce an ILP (Integer Linear Programming) based formulation of the problem, and implement the schedule obtained by the ILP solver. The results indicate that our approach gets within 4% of the ILP solution. © 2010 Springer-Verlag.
Open Access
Distributed interactive video system design and analysis
(Institute of Electrical and Electronics Engineers, 1997) Wu, Tsong-Ho; Korpeoglu, I.; Cheng, Bo-Chao
The interactive video (IV) market has been expected to capture a significant share of the huge potential revenues to be generated by the business and residential markets. The level of revenues generated depends on the call completion rate the service provider can support, regardless of IV system or network conditions. Thus, a cost-effective, scalable, fault-tolerant IV system is needed to maximize the video call completion rate at an affordable cost. This article describes design methodologies for a scalable, fault-tolerant IV system and an IV system design and analysis research prototype called IVSDNA (IV System Designer and Analyzer). The IVSDNA prototype is designed to help network planners and engineers evaluate quantitative trade-offs (in terms of network communications costs, video storage costs, and degree of system fault tolerance) between two major IV system architectures (centralized and distributed) with a variety of video distribution methods, replication strategies, and fault-tolerant access protocols.
Open Access
The fractional Fourier domain decomposition
(Elsevier, 1999) Kutay, M. A.; Özaktaş, H.; Özaktaş, Haldun M.; Arıkan, Orhan
We introduce the fractional Fourier domain decomposition. A procedure called pruning, analogous to truncation of the singular-value decomposition, underlies a number of potential applications, among which we discuss fast implementation of space-variant linear systems.
Open Access
Hybrid stacked memory architecture for energy efficient embedded chip-multiprocessors based on compiler directed approach
(IEEE, 2015-12) Onsori, Salman; Asad, A.; Öztürk, Özcan; Fathy, M.
Energy consumption has become the most critical limitation on the performance of today's embedded system designs. Because on-chip memories contribute a major share of overall system energy consumption, they are always a significant issue for embedded systems. Using conventional memory technologies in future designs in the nanoscale era causes a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies are a promising replacement for conventional memory structures in embedded systems due to their attractive characteristics, such as near-zero leakage power, high density, and non-volatility. Recent advances in NVM technologies can significantly mitigate the issue of memory leakage power. However, they introduce new challenges, such as limited write endurance and high write energy consumption, which restrict their adoption in modern memory systems. In this article, we propose a stacked hybrid memory system to minimize energy consumption for 3D embedded chip-multiprocessors (eCMP). To reach this target, we present a convex optimization-based model to distribute data blocks between SRAM and NVM banks based on the data access pattern derived by the compiler. Our compiler-assisted hybrid memory architecture can achieve up to 51.28 times improvement in lifetime. In addition, experimental results show that our proposed method reduces energy consumption by 56% on average compared to the traditional memory design, where a single technology is used. © 2015 IEEE.
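The placement question (which data blocks go to SRAM and which to NVM) can be illustrated with a much simpler stand-in than the paper's convex model. The sketch below assumes equal-sized blocks, per-block access counts from a compiler or profiler, and made-up per-access and leakage energies; `Block`, `SRAM`, `NVM`, and `place_blocks` are hypothetical names. Under those assumptions the linear-programming relaxation of the placement problem is a unit-weight fractional knapsack, so ranking blocks by their energy saving in SRAM solves it exactly with an integral answer.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    reads: int
    writes: int


# (read energy, write energy, leakage energy per block): illustrative units only
SRAM = (1.0, 1.0, 5.0)
NVM = (1.0, 10.0, 0.0)


def energy(block, tech):
    read_e, write_e, leak_e = tech
    return block.reads * read_e + block.writes * write_e + leak_e


def place_blocks(blocks, sram_capacity, sram=SRAM, nvm=NVM):
    """Put the equal-sized blocks with the largest positive SRAM-over-NVM
    saving into SRAM, up to sram_capacity blocks; the rest go to NVM."""
    gain = {b.name: energy(b, nvm) - energy(b, sram) for b in blocks}
    ranked = sorted(blocks, key=lambda b: gain[b.name], reverse=True)
    placement = {}
    for rank, b in enumerate(ranked):
        in_sram = rank < sram_capacity and gain[b.name] > 0
        placement[b.name] = "SRAM" if in_sram else "NVM"
    return placement


blocks = [Block("write_hot", reads=10, writes=100),
          Block("read_mostly", reads=100, writes=0)]
print(place_blocks(blocks, sram_capacity=1))
```

Write-intensive blocks end up in SRAM, which also relieves NVM of the writes that limit its endurance; the paper's model additionally handles concerns this toy ignores, such as 3D stacking and lifetime.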
Open Access
Interaction of design and operational parameters in periodic review kanban systems
(Taylor & Francis, 2003) Erhun, F.; Aktürk, M. S.; Türkcan, A.
In this study, we propose an analytical model to determine the withdrawal cycle length, kanban sizes and number of kanbans simultaneously in a multi-item, multi-stage, multi-period, capacitated periodic review kanban system. Traditionally, research in kanban systems has been separated into 'design level' research, and 'shop floor level' research. In both veins of research, especially for the design level, various algorithms have been developed. However, it is important to state that there is a significant relationship between the design parameters and the operational issues, such as kanban schedules and actual lead times. Therefore, we analytically consider the interdependencies between the design and operational decisions, and evaluate their impact on each other.
Open Access
Market-driven approach based on Markov decision theory for optimal use of resources in software development
(Institution of Engineering and Technology, 2004) Noppen, J.; Aksit, M.; Nicola, V.; Tekinerdogan, B.
Changes in requirements may have a severe impact on development processes. For example, if requirements change during the course of a software development activity, it may be necessary to reschedule development activities so that the new requirements can be addressed in a timely manner. Unfortunately, current software development methods do not provide explicit means to adapt development processes with respect to changes in requirements. The paper proposes a method based on Markov decision theory, which determines the estimated optimal development schedule with respect to probabilistic product demands and resource constraints. This method is supported by a tool and applied to an industrial case.
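For readers unfamiliar with Markov decision theory, the core computation is usually some form of value iteration. The sketch below is a generic textbook version, not the paper's method or tool; the two demand states, two candidate features, rewards, and transition probabilities are entirely hypothetical and only show how a schedule (policy) falls out of probabilistic demand.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-9):
    """Discounted value iteration.

    P[s][a] is a list of (next_state, probability); R[s][a] is the expected
    immediate reward of taking action a in state s.
    """
    def q(s, a, V):
        return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy


# Hypothetical toy model: market demand drifts between two states, and each
# period the team spends its capacity on one of two features.
states = ["low_demand", "high_demand"]
actions = ["feature_A", "feature_B"]
drift = {"low_demand": [("low_demand", 0.7), ("high_demand", 0.3)],
         "high_demand": [("low_demand", 0.4), ("high_demand", 0.6)]}
P = {s: {a: drift[s] for a in actions} for s in states}
R = {"low_demand": {"feature_A": 5.0, "feature_B": 2.0},
     "high_demand": {"feature_A": 1.0, "feature_B": 6.0}}

V, policy = value_iteration(states, actions, P, R)
print(policy)
```

In this toy the actions do not influence demand, so the policy simply picks the most rewarding feature per state; a realistic model would let development choices and resource constraints shape the transitions.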
Open Access
NS-SRAM: neighborhood solidarity SRAM for reliability enhancement of SRAM memories
(IEEE, 2016-08-09) Alouani, I.; Ahangari, Hamzeh; Öztürk, Özcan; Niar, S.
Technology shift and voltage scaling have dramatically increased the susceptibility of Static Random Access Memories (SRAMs) to errors. In this paper, we present NS-SRAM (Neighborhood Solidarity SRAM), a new technique that enhances the error resilience of SRAMs by exploiting the data of adjacent memory bits. Bit cells of a memory line are paired together at the circuit level to mutually increase the static noise margin and critical charge of a cell. Unlike existing techniques, NS-SRAM aims to improve both the Bit Error Rate (BER) and the Soft Error Rate (SER) at the same time. Through auto-adaptive joiners, each of the adjacent cells' nodes is connected to its counterpart in the neighboring bit. NS-SRAM enhances read stability by increasing the critical Read Static Noise Margin (RSNM), thereby decreasing faults when the circuit operates under voltage scaling. It also increases hold stability and critical charge to mitigate soft errors. With the proposed technique, the reliability of SRAM-based structures such as cache memories and register files can be drastically improved with area overhead comparable to existing hardening techniques. Moreover, it does not require any extra memory, does not reduce the effective memory size, and has no negative impact on performance. © 2016 IEEE.
Open Access
On the design of dynamic associative neural memories
(IEEE, 1994) Savran, M. E.; Morgül, Ö.
We consider the design problem for a class of discrete-time and continuous-time neural networks. We obtain a characterization of all connection weights that store a given set of vectors into the network; that is, each given vector becomes an equilibrium point of the network. We also give sufficient conditions that guarantee the asymptotic stability of these equilibrium points.
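As a small illustration of "each given vector becomes an equilibrium point", the sketch below uses the classic Hebbian (outer-product) rule for a discrete-time network with update x ← sign(Wx). This is one well-known weight choice, not the paper's characterization of all admissible weights, and it is only guaranteed to work here because the stored bipolar patterns are assumed mutually orthogonal (rows of a Hadamard matrix) and the diagonal of W is kept.

```python
def outer_product_weights(patterns):
    """Hebbian (outer-product) weights W = (1/n) * sum_k x_k x_k^T."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += x[i] * x[j] / n
    return W


def update(W, x):
    """One synchronous discrete-time step x <- sign(W x), with sign(0) := +1."""
    return [1 if sum(w * xj for w, xj in zip(row, x)) >= 0 else -1 for row in W]


# Mutually orthogonal bipolar patterns (rows of a 4x4 Hadamard matrix).
patterns = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
W = outer_product_weights(patterns)
for x in patterns:
    print(x, "is an equilibrium:", update(W, x) == x)
```

For orthogonal patterns W x_k = x_k exactly, so each stored vector is a fixed point; stability of those equilibria, which the abstract also addresses, needs separate conditions.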
Open Access
On-chip memory space partitioning for chip multiprocessors using polyhedral algebra
(The Institution of Engineering and Technology, 2010) Ozturk, O.; Kandemir, M.; Irwin, M. J.
One of the most important issues in designing a chip multiprocessor is deciding its on-chip memory organisation. While it is possible to design an application-specific memory architecture, this may not necessarily be the best option, in particular when the storage demands of individual processors and/or their data sharing patterns can change from one point in execution to another for the same application. Here, two problems are formulated. First, we show how a polyhedral method can be used to design, for array-based data-intensive embedded applications, an application-specific hybrid memory architecture that has both shared and private components. We evaluate the resulting memory configurations using a set of benchmarks and compare them to pure private and pure shared memory on-chip multiprocessor architectures. The second approach proposed considers dynamic configuration of software-managed on-chip memory space to adapt to runtime variations in data storage demand and interprocessor sharing patterns. The proposed framework is fully implemented using an optimising compiler, a polyhedral tool, and a memory partitioner (based on integer linear programming), and is tested using a suite of eight data-intensive embedded applications. © 2010 The Institution of Engineering and Technology.
Open Access
Online solutions for scalable file server systems
(ACM, 2006) Tse, Savio S. H.
We propose three online algorithms for scalable file server systems. A scalable file server is expected to provide fairly stable service while the numbers of users and tasks and the data volumes keep increasing. One of the purposes of parallel and distributed approaches is to achieve scalability. Sufficient hardware resources are essential for good service; however, coordinating them well is also indispensable, since parallel and distributed resources need to compensate for each other's shortages, and this responsibility falls on the algorithmic and architectural design. In this paper, we address the load balancing problem in scalable file servers. The three proposed online approximation algorithms place and delete documents in a system of M distributed file servers located in a cluster so as to balance the loads and required storage spaces among all servers. In [7], we proposed algorithms that do not allow re-allocation. In this paper, by paying the re-allocation cost, we improve on several existing results. © 2006 ACM.
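To illustrate the setting, the sketch below implements only the classic greedy online rule (send each arriving document to the currently least-loaded of the M servers), not any of the three algorithms proposed in the paper. `GreedyPlacer` is a hypothetical name, and document size stands in for both load and storage. With insertions only, Graham's list-scheduling argument bounds the maximum load by (2 − 1/M) times the optimum; once deletions occur without re-allocation, loads can drift apart, as the usage lines show when server 0 empties.

```python
class GreedyPlacer:
    """Online placement: each arriving document goes to the least-loaded server.

    Deletions simply release load; documents are never re-allocated.
    """

    def __init__(self, num_servers):
        self.load = [0.0] * num_servers
        self.location = {}  # document id -> (server index, size)

    def place(self, doc, size):
        server = min(range(len(self.load)), key=lambda i: self.load[i])
        self.load[server] += size
        self.location[doc] = (server, size)
        return server

    def delete(self, doc):
        server, size = self.location.pop(doc)
        self.load[server] -= size


placer = GreedyPlacer(num_servers=2)
for doc, size in [("a", 5), ("b", 3), ("c", 1)]:
    placer.place(doc, size)
placer.delete("a")   # server 0 is now empty while server 1 still holds 4
print(placer.load)
print(placer.place("d", 2))
```

A scheme that pays for re-allocation could, for instance, move documents off the heavier server after such a deletion; how the paper does this is not reproduced here.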
Open Access
Optimization-based power and thermal management for dark silicon aware 3D chip multiprocessors using heterogeneous cache hierarchy
(Elsevier BV, 2017) Asad, A.; Ozturk, O.; Fathy, M.; Jahed-Motlagh, M. R.
Managing the problem recently known as “dark silicon” is a new challenge in multicore design. Prior innovative studies have addressed the dark silicon problem in the field of power-efficient core design. However, addressing dark silicon challenges in the design of uncore components, such as the cache hierarchy and on-chip interconnect, which consume a significant portion of on-chip power, is largely unexplored. In this paper, for the first time, we propose an integrated approach that considers the power consumption of core and uncore components simultaneously to improve multi/many-core performance in the dark silicon era. The proposed approach dynamically (1) predicts the changing program behavior on each core and (2) re-determines the frequency/voltage, cache capacity, and technology at each level of the cache hierarchy based on the program's scalability, in order to satisfy the power and temperature constraints. In the proposed architecture for future chip-multiprocessors (CMPs), we exploit emerging technologies such as non-volatile memories (NVMs) and 3D techniques to combat dark silicon. Also, for the first time, we propose a detailed power model that is useful for power modeling of future dark silicon CMPs. Experimental results on SPEC 2000/2006 benchmarks show that, on average, the proposed method improves throughput by about 54.3% and energy-delay product by about 61% compared with the conventional CMP architecture with a homogeneous cache system. (A preliminary short version of this work was presented at the 18th Euromicro Conference on Digital System Design (DSD), 2015.) © 2017 Elsevier B.V.
Open Access
Process variation aware thread mapping for chip multiprocessors
(IEEE, 2009-04) Hong, S.; Narayanan, S. H. K.; Kandemir, M.; Özturk, Özcan
With the increasing scaling of manufacturing technology, process variation has become more prevalent. As a result, in the context of Chip Multiprocessors (CMPs), for example, identically designed processor cores on the chip may have non-identical peak frequencies and power consumptions. To cope with such a design, each processor can be assumed to run at the frequency of the slowest processor, resulting in wasted computational capability. This paper considers an alternate approach and proposes an algorithm that intelligently maps (and remaps) computations onto available processors so that each processor runs at its peak frequency. In other words, by dynamically changing the thread-to-processor mapping at runtime, our approach allows each processor to maximize its performance, rather than simply using the chip-wide lowest frequency among all cores and the highest cache latency. Experimental evidence shows that, compared to a process-variation-agnostic thread mapping strategy, our proposed scheme achieves as much as 29% improvement in overall execution latency, with an average improvement of 13% over the benchmarks tested. We also demonstrate in this paper that our savings are consistent across different processor counts, latency maps, and latency distributions. © 2009 EDAA.
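The contrast between running every core at the slowest frequency and exploiting per-core peak frequencies can be shown with a toy static mapping. This is a simple heuristic sketch, not the paper's (dynamic, remapping) algorithm: it assumes one thread per core, known per-thread work, and that execution time is work divided by core frequency, ignoring cache latency differences. The function names are hypothetical. Pairing the heaviest threads with the fastest cores minimizes the largest work/frequency ratio under these assumptions (a simple exchange argument).

```python
def map_threads(thread_work, core_freq):
    """Assign the heaviest threads to the fastest cores (one thread per core)."""
    threads = sorted(range(len(thread_work)), key=lambda t: thread_work[t], reverse=True)
    cores = sorted(range(len(core_freq)), key=lambda c: core_freq[c], reverse=True)
    return dict(zip(threads, cores))


def makespan(thread_work, core_freq, mapping):
    """Finish time when each core runs at its own peak frequency."""
    return max(thread_work[t] / core_freq[c] for t, c in mapping.items())


def makespan_at_slowest(thread_work, core_freq):
    """Variation-agnostic baseline: every core runs at the slowest frequency."""
    slowest = min(core_freq)
    return max(w / slowest for w in thread_work)


work = [4, 2, 1]          # hypothetical per-thread work units
freq = [1.0, 2.0, 1.5]    # hypothetical per-core peak frequencies
mapping = map_threads(work, freq)
print(mapping, makespan(work, freq, mapping), makespan_at_slowest(work, freq))
```

Because thread behavior changes over time, the paper remaps at runtime; this static pairing only captures the intuition behind using each core's own peak frequency.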
Open Access
Revisitation of the simulation methodologies and applications in manufacturing
(2011) Ramanan, R.; Sabuncuoğlu, İhsan
Manufacturing is one of the largest application areas of simulation. To understand where, how, and why simulation is used in manufacturing, this survey classifies manufacturing systems into two broad areas, namely manufacturing system design and manufacturing system operations. The two broad areas are further subdivided for this study. The survey discusses the evolution of the subdivisions before detailing the need for simulation in each subdivision of manufacturing systems. Finally, it discusses where research is heading and identifies future directions.
Open Access
A scratch-pad memory aware dynamic loop scheduling algorithm
(IEEE, 2008-03) Öztürk, Özcan; Kandemir, M.; Narayanan, S. H. K.
Executing array based applications on a chip multiprocessor requires effective loop parallelization techniques. One of the critical issues that need to be tackled by an optimizing compiler in this context is loop scheduling, which distributes the iterations of a loop to be executed in parallel across the available processors. Most of the existing work in this area targets cache based execution platforms. In comparison, this paper proposes the first dynamic loop scheduler, to our knowledge, that targets scratch-pad memory (SPM) based chip multiprocessors, and presents an experimental evaluation of it. The main idea behind our approach is to identify the set of loop iterations that access the SPM and those that do not. This information is exploited at runtime to balance the loads of the processors involved in executing the loop nest at hand. Therefore, the proposed dynamic scheduler takes advantage of the SPM in performing the loop iteration-to-processor mapping. Our experimental evaluation with eight array/loop intensive applications reveals that the proposed scheduler is very effective in practice and brings between 13.7% and 41.7% performance savings over a static loop scheduling scheme, which is also tested in our experiments. © 2008 IEEE.
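The load-balancing idea (classify iterations as SPM-resident or not, then use that information at runtime) can be sketched as a greedy dynamic scheduler driven by per-iteration cost estimates. This is an illustrative assumption, not the paper's scheduler: the costs `spm_cost` and `offchip_cost`, the chunking, and the function names are hypothetical, and the static baseline is a plain contiguous block split.

```python
import heapq


def dynamic_schedule(iterations, n_procs, spm_cost=1, offchip_cost=5, chunk=1):
    """iterations: list of booleans, True if the iteration's data is in the SPM.

    Each chunk goes to the processor with the smallest estimated finish time,
    where estimates come from the SPM / off-chip classification.
    """
    heap = [(0, p) for p in range(n_procs)]  # (estimated finish time, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    for start in range(0, len(iterations), chunk):
        t, p = heapq.heappop(heap)
        block = range(start, min(start + chunk, len(iterations)))
        cost = sum(spm_cost if iterations[i] else offchip_cost for i in block)
        assignment[p].extend(block)
        heapq.heappush(heap, (t + cost, p))
    return max(t for t, _ in heap), assignment


def static_schedule(iterations, n_procs, spm_cost=1, offchip_cost=5):
    """Baseline: contiguous, equal-sized blocks of iterations per processor."""
    size = -(-len(iterations) // n_procs)  # ceiling division
    finish = []
    for p in range(n_procs):
        block = iterations[p * size:(p + 1) * size]
        finish.append(sum(spm_cost if hit else offchip_cost for hit in block))
    return max(finish)


iters = [True] * 8 + [False] * 8  # first half of the iterations hit in the SPM
print("static :", static_schedule(iters, n_procs=2))
print("dynamic:", dynamic_schedule(iters, n_procs=2)[0])
```

When SPM-resident iterations are clustered, the static split leaves one processor with all the expensive off-chip work, while the cost-aware dynamic assignment spreads it out; the measured gains reported in the abstract come from the authors' own scheduler, not from this toy.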
Open Access
Shared scratch pad memory space management across applications
(Inderscience Publishers, 2009) Ozturk, Ozcan; Kandemir, M.; Son, S. W.; Kolcu, I.
Scratch Pad Memories (SPMs) have received considerable attention lately as on-chip memory building blocks. The main characteristic that distinguishes an SPM from a conventional cache memory is that the data flow is controlled by software. The main focus of this paper is the management of an SPM space shared by multiple applications that can potentially share data. The proposed approach has three major components: a compiler analysis phase, a runtime space partitioner, and a local partitioning phase. Our experimental results show that the proposed approach leads to the minimum completion time among all alternative memory partitioning schemes tested.