Browsing by Subject "Embedded systems"
Now showing 1 - 18 of 18
Results Per Page
Sort Options
Item Open Access Access pattern-based code compression for memory-constrained systems(Association for Computing Machinery, 2008-09) Ozturk, O.; Kandemir, M.; Chen, G.As compared to a large spectrum of performance optimizations, relatively less effort has been dedicated to optimize other aspects of embedded applications such as memory space requirements, power, real-time predictability, and reliability. In particular, many modern embedded systems operate under tight memory space constraints. One way of addressing this constraint is to compress executable code and data as much as possible. While researchers on code compression have studied efficient hardware and software based code compression strategies, many of these techniques do not take application behavior into account; that is, the same compression/decompression strategy is used irrespective of the application being optimized. This article presents an application-sensitive code compression strategy based on control flow graph (CFG) representation of the embedded program. The idea is to start with a memory image wherein all basic blocks of the application are compressed, and decompress only the blocks that are predicted to be needed in the near future. When the current access to a basic block is over, our approach also decides the point at which the block could be compressed. We propose and evaluate several compression and decompression strategies that try to reduce memory requirements without excessively increasing the original instruction cycle counts. Some of our strategies make use of profile data, whereas others are fully automatic. Our experimental evaluation using seven applications from the MediaBench suite and three large embedded applications reveals that the proposed code compression strategy is very successful in practice. Our results also indicate that working at a basic block granularity, as opposed to a procedure granularity, is important for maximizing memory space savings. © 2008 ACM.Item Open Access Adaptive compute-phase prediction and thread prioritization to mitigate memory access latency(ACM, 2014-06) Aktürk, İsmail; Öztürk, ÖzcanThe full potential of chip multiprocessors remains unex- ploited due to the thread oblivious memory access sched- ulers used in off-chip main memory controllers. This is especially pronounced in embedded systems due to limita- Tions in memory. We propose an adaptive compute-phase prediction and thread prioritization algorithm for memory access scheduling for embedded chip multiprocessors. The proposed algorithm eficiently categorize threads based on execution characteristics and provides fine-grained priori- Tization that allows to differentiate threads and prioritize their memory access requests accordingly. The threads in compute phase are prioritized among the threads in mem- ory phase. Furthermore, the threads in compute phase are prioritized among themselves based on the potential of mak- ing more progress in their execution. Compared to the prior works First-Ready First-Come First-Serve (FR-FCFS) and Compute-phase Prediction with Writeback-Refresh Overlap (CP-WO), the proposed algorithm reduces the execution time of the generated workloads up to 23.6% and 12.9%, respectively. Copyright 2014 ACM.Item Open Access An aspect-oriented tool framework for developing process-sensitive embedded user assistance systems(2011) Tekinerdoǧan, B.; Bozbey, S.; Mester, Y.; Turançiftci, E.; Alkişlar L.Process-sensitive embedded user assistance aims to provide the end-user the necessary guidance based on the state of the process that is being followed. Unfortunately, the development of these systems is not trivial and has to meet several challenges. The main difficulties appear to be related to integration of process-sensitive guidance in the application and the crosscutting behavior of help concerns. To address these issues we developed an aspect-oriented tool framework Assistant-Pro that can be used to develop process-sensitive embedded user assistance for multiple applications. The framework provides tools for defining the process model, defining guidance related to process steps, and modularizing and weaving help concerns in the target application for which user guidance needs to be provided. The framework has been developed and validated in the context of Aselsan, a large Turkish defense electronics company. © 2011 Springer-Verlag Berlin Heidelberg.Item Open Access Big-data streaming applications scheduling based on staged multi-armed bandits(Institute of Electrical and Electronics Engineers, 2016) Kanoun, K.; Tekin, C.; Atienza, D.; Van Der Schaar, M.Several techniques have been recently proposed to adapt Big-Data streaming applications to existing many core platforms. Among these techniques, online reinforcement learning methods have been proposed that learn how to adapt at run-time the throughput and resources allocated to the various streaming tasks depending on dynamically changing data stream characteristics and the desired applications performance (e.g., accuracy). However, most of state-of-the-art techniques consider only one single stream input in its application model input and assume that the system knows the amount of resources to allocate to each task to achieve a desired performance. To address these limitations, in this paper we propose a new systematic and efficient methodology and associated algorithms for online learning and energy-efficient scheduling of Big-Data streaming applications with multiple streams on many core systems with resource constraints. We formalize the problem of multi-stream scheduling as a staged decision problem in which the performance obtained for various resource allocations is unknown. The proposed scheduling methodology uses a novel class of online adaptive learning techniques which we refer to as staged multi-armed bandits (S-MAB). Our scheduler is able to learn online which processing method to assign to each stream and how to allocate its resources over time in order to maximize the performance on the fly, at run-time, without having access to any offline information. The proposed scheduler, applied on a face detection streaming application and without using any offline information, is able to achieve similar performance compared to an optimal semi-online solution that has full knowledge of the input stream where the differences in throughput, observed quality, resource usage and energy efficiency are less than 1, 0.3, 0.2 and 4 percent respectively.Item Open Access CDs have fingerprints too(Springer, Berlin, Heidelberg, 2009) Hammouri G.; Dana, Aykutlu; Sunar, B.We introduce a new technique for extracting unique fingerprints from identical CDs. The proposed technique takes advantage of manufacturing variability found in the length of the CD lands and pits. Although the variability measured is on the order of 20 nm, the technique does not require the use of microscopes or any advanced equipment. Instead, we show that the electrical signal produced by the photodetector inside the CD reader is sufficient to measure the desired variability. We investigate the new technique by analyzing data collected from 100 identical CDs and show how to extract a unique fingerprint for each CD. Furthermore, we introduce a technique for utilizing fuzzy extractors over the Lee metric without much change to the standard code offset construction. Finally, we identify specific parameters and a code construction to realize the proposed fuzzy extractor and convert the derived fingerprints into 128-bit cryptographic keys. © 2009 Springer.Item Open Access Effective kernel mapping for OpenCL applications in heterogeneous platforms(Institute of Electrical and Electronics Engineers, 2012-09) Albayrak, Ömer Erdil; Aktürk, İsmail; Öztürk, ÖzcanMany core accelerators are being deployed in many systems to improve the processing capabilities. In such systems, application mapping need to be enhanced to maximize the utilization of the underlying architecture. Especially in GPUs mapping becomes critical for multi-kernel applications as kernels may exhibit different characteristics. While some of the kernels run faster on GPU, others may refer to stay in CPU due to the high data transfer overhead. Thus, heterogeneous execution may yield to improved performance compared to executing the application only on CPU or only on GPU. In this paper, we propose a novel profiling-based kernel mapping algorithm to assign each kernel of an application to the proper device to improve the overall performance of an application. We use profiling information of kernels on different devices and generate a map that identifies which kernel should run on where to improve the overall performance of an application. Initial experiments show that our approach can effectively map kernels on CPU and GPU, and outperforms to a CPU-only and GPU-only approach. © 2012 IEEE.Item Open Access A heterogeneous memory organization with minimum energy consumption in 3D chip-multiprocessors(IEEE, 2016-05) Asad, Arghavan; Onsori, Salman; Fathy, M.; Jahed-Motlagh, M. R.; Raahemifar, K.Main memories play an important role in overall energy consumption of embedded systems. Using conventional memory technologies in future designs in nanoscale era cause a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies offer many desirable characteristics such as near-zero leakage power, high density and non-volatility. They can significantly mitigate the issue of memory leakage power in future embedded chip-multiprocessor (eCMP) systems. However, they suffer from challenges such as limited write endurance and high write energy consumption which restrict them for adoption in modern memory systems. In this article, we propose a stacked hybrid memory system for 3D chip-multiprocessors to take advantages of both traditional and non-volatile memory technologies. For reaching this target, we present a convex optimization-based model that minimizes the system energy consumption while satisfy endurance constraint in order to design a reliable memory system. Experimental results show that the proposed method improves energy-delay product (EDP) and performance by about 44.8% and 13.8% on average respectively compared with the traditional memory design where single technology is used. © 2016 IEEE.Item Open Access Hybrid stacked memory architecture for energy efficient embedded chip-multiprocessors based on compiler directed approach(IEEE, 2015-12) Onsori, Salman; Asad, A.; Öztürk, Özcan; Fathy, M.Energy consumption becomes the most critical limitation on the performance of nowadays embedded system designs. On-chip memories due to major contribution in overall system energy consumption are always significant issue for embedded systems. Using conventional memory technologies in future designs in nano-scale era causes a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies are promising replacement for conventional memory structure in embedded systems due to its attractive characteristics such as near-zero leakage power, high density and non-volatility. Recent advantages of NVM technologies can significantly mitigate the issue of memory leakage power. However, they introduce new challenges such as limited write endurance and high write energy consumption which restrict them for adoption in modern memory systems. In this article, we propose a stacked hybrid memory system to minimize energy consumption for 3D embedded chip-multiprocessors (eCMP). For reaching this target, we present a convex optimization-based model to distribute data blocks between SRAM and NVM banks based on data access pattern derived by compiler. Our compiler-assisted hybrid memory architecture can achieve up to 51.28 times improvement in lifetime. In addition, experimental results show that our proposed method reduce energy consumption by 56% on average compared to the traditional memory design where single technology is used. © 2015 IEEE.Item Open Access Implications of non-volatile memory as primary storage for database management systems(IEEE, 2017) Mustafa, Naveed Ul; Armejach, A.; Öztürk, Özcan; Cristal, A.; Unsal, O. S.Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks is extremely high. To hide this latency, DRAM is used as an intermediate storage. DRAM is significantly faster than disk, but deployed in smaller capacities due to cost and power constraints, and without the necessary persistency feature that disks have. Non-Volatile Memory (NVM) is an emerging storage class technology which promises the best of both worlds. It can offer large storage capacities, due to better scaling and cost metrics than DRAM, and is non-volatile (persistent) like hard disks. At the same time, its data retrieval time is much lower than that of hard disks and it is also byte-addressable like DRAM. In this paper, we explore the implications of employing NVM as primary storage for DBMS. In other words, we investigate the modifications necessary to be applied on a traditional relational DBMS to take advantage of NVM features. As a case study, we have modified the storage engine (SE) of PostgreSQL enabling efficient use of NVM hardware. We detail the necessary changes and challenges such modifications entail and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 40% and 14.4% when compared to disk and NVM storage, with average reductions of 20.5% and 4.5%, respectively. © 2016 IEEE.Item Open Access Improving application behavior on heterogeneous manycore systems through kernel mapping(2013) Albayrak, O. E.; Akturk, I.; Ozturk, O.Many-core accelerators are being more frequently deployed to improve the system processing capabilities. In such systems, application mapping must be enhanced to maximize utilization of the underlying architecture. Especially, in graphics processing units (GPUs), mapping kernels that are part of multi-kernel applications has a great impact on overall performance, since kernels may exhibit different characteristics on different CPUs and GPUs. While some kernels run faster on GPUs, others may perform better in CPUs. Thus, heterogeneous execution may yield better performance than executing the application only on a CPU or only on a GPU. In this paper, we investigate on two approaches: a novel profiling-based adaptive kernel mapping algorithm to assign each kernel of an application to the proper device, and a Mixed-Integer Programming (MIP) implementation to determine optimal mapping. We utilize profiling information for kernels on different devices and generate a map that identifies which kernel should run where in order to improve the overall performance of an application. Initial experiments show that our approach can efficiently map kernels on CPUs and GPUs, and outperforms CPU-only and GPU-only approaches. © 2013 Elsevier B.V. All rights reserved.Item Open Access Instruction-level reliability improvement for embedded systems(IEEE, 2020-09) Tekgül, Hakan; Öztürk, ÖzcanWith the increasing number of applications in embedded computing systems, it became indispensable for the system designers to consider multiple objectives including power, performance, and reliability. Among these, reliability is a bigger constraint for safety critical applications. For example, fault tolerance of transportation systems has become very critical with the use of many embedded on-board devices. There are many techniques proposed in the past decade to increase the fault tolerance of such systems. However, many of these techniques come with a significant overhead, which make them infeasible in most of the embedded execution scenarios. Motivated by this observation, our main contribution in this paper is to propose and evaluate an instruction criticality based reliable source code generation algorithm. Specifically, we propose an instruction ranking formula based on our detailed fault injection experiments. We use instruction rankings along with the overhead tolerance limits and generate a source code with increased fault tolerance. The primary goal behind this work is to improve reliability of an application while keeping the performance effects minimal. We apply state-of-the-art reliability techniques to evaluate our approach on a set of benchmarks. Our experimental results show that, the proposed approach achieves up to 8% decrease in error rates with only 10% performance overhead. The error rates further decrease with higher overhead tolerances.Item Open Access A modular real-time fieldbus architecture for mobile robotic platforms(Institute of Electrical and Electronics Engineers, 2011-03) Saranlı U.; Avcı, A.; Öztürk, M. C.The design and construction of complex and reconfigurable embedded systems such as small autonomous mobile robots is a challenging task that involves the selection, interfacing, and programming of a large number of sensors and actuators. Facilitating this tedious process requires modularity and extensibility both in hardware and software components. In this paper, we introduce the universal robot bus (URB), a real-time fieldbus architecture that facilitates rapid integration of heterogeneous sensor and actuator nodes to a central processing unit (CPU) while providing a software abstraction that eliminates complications arising from the lack of hardware homogeneity. Motivated by our primary application area of mobile robotics, URB is designed to be very lightweight and efficient, with real-time support for Recommended Standard (RS) 232 or universal serial bus connections to a central computer and inter-integrated circuit (I(2)C), controller area network, or RS485 bus connections to embedded nodes. It supports automatic synchronization of data acquisition across multiple nodes, provides high data bandwidth at low deterministic latencies, and includes flexible libraries for modular software development both for local nodes and the CPU. This paper describes the design of the URB architecture, provides a careful experimental characterization of its performance, and demonstrates its utility in the context of its deployment in a legged robot platform.Item Open Access Photonic devices and systems embedded with nanocyrstals(SPIE, 2006) Demir, Hilmi Volkan; Soğancı, Ibrahim Murat; Mutlugun, Evren; Tek, Sümeyra; Huyal, Ilkem OzgeWe review our research work on the development of photonic devices and systems embedded with nanocyrstals for new functionality within EU Phoremost Network of Excellence on nanophotonics. Here we report on CdSe/ZnS nanocrystalbased hybrid optoelectronic devices and systems used for scintillation to enhance optical detection and imaging in the ultraviolet range and for optical modulation via electric field dependent optical absorption and photoluminescence in the visible. In our collaboration with DYO, we also present photocatalytic TiO2 nanoparticles incorporated in solgel matrix that are optically activated in the ultraviolet for the purpose of self-cleaning.Item Open Access Runtime verification of component-based embedded software(Springer, 2011-09) Sözer, Hasan; Hofmann, C.; Tekinerdoğan, Bedir; Akşit, M.To deal with increasing size and complexity, component-based software development has been employed in embedded systems. Due to several faults, components can make wrong assumptions about the working mode of the system and the working modes of the other components. To detect mode inconsistencies at runtime, we propose a "lightweight" error detection mechanism, which can be integrated with component-based embedded systems. We define links among three levels of abstractions: the runtime behavior of components, the working mode specifications of components and the specification of the working modes of the system. This allows us to detect the user observable runtime errors. The effectiveness of the approach is demonstrated by implementing a software monitor integrated into a TV system. © 2012 Springer-Verlag London Limited.Item Open Access Slicing based code parallelization for minimizing inter-processor communication(ACM, 2009-10) Kandemir, M.; Zhang, Y.; Muralidhara, S. P.; Öztürk, Özcan; Narayanan, S. H. K.One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested. Copyright 2009 ACM.Item Open Access SPM management using markov chain based data access prediction(IEEE, 2008-11) Yemliha, T.; Srikantaiah, S.; Kandemir, M.; Öztürk, ÖzcanLeveraging the power of scratchpad memories (SPMs) available in most embedded systems today is crucial to extract maximum performance from application programs. While regular accesses like scalar values and array expressions with affine subscript functions have been tractable for compiler analysis (to be prefetched into SPM), irregular accesses like pointer accesses and indexed array accesses have not been easily amenable for compiler analysis. This paper presents an SPM management technique using Markov chain based data access prediction for such irregular accesses. Our approach takes advantage of inherent, but hidden reuse in data accesses made by irregular references. We have implemented our proposed approach using an optimizing compiler. In this paper, we also present a thorough comparison of our different dynamic prediction schemes with other SPM management schemes. SPM management using our approaches produces 12.7% to 28.5% improvements in performance across a range of applications with both regular and irregular access patterns, with an average improvement of 20.8%.Item Open Access Vibrasyon ve PIR algılayıcılar kullanılarak çevre destekli akıllı ev tasarımı(IEEE, 2013-04) Yazar, Ahmet; Çetin, A. EnisIntelligent ambient assisted living systems for elderly and handicapped people become affordable with the recent advances in computer and sensor technologies. In this paper, fall detection algorithm using multiple passive infrared sensors is developed. As a novel method for detecting a falling person, two passive infrared sensors are used concurrently in a room and developed a determination algorithm depending on the height at which the falling event is happened. Motionles detection system is integrated with the falling person detection system to provide a complete solution. Detection algorithms are implemented using embedded microprocessors. © 2013 IEEE.Item Open Access Voltage island based heterogeneous NoC design through constraint programming(Pergamon Press, 2014) Demiriz, A.; Bagherzadeh, N.; Ozturk, O.This paper discusses heterogeneous Network-on-Chip (NoC) design from a Constraint Programming (CP) perspective and extends the formulation to solving Voltage-Frequency Island (VFI) problem. In general, VFI is a superior design alternative in terms of thermal constraints, power consumption as well as performance considerations. Given a Communication Task Graph (CTG) and subsequent task assignments for cores, cores are allocated to the best possible places on the chip in the first stage to minimize the overall communication cost among cores. We then solve the application scheduling problem to determine the optimum core types from a list of technological alternatives and to minimize the makespan. Moreover, an elegant CP model is proposed to solve VFI problem by mapping and grouping cores at the same time with scheduling the computation tasks as a limited capacity resource allocation model. The paper reports results based on real benchmark datasets from the literature.