Browsing by Author "Ahangari, Hamzeh"
Now showing 1 - 10 of 10
Item Open Access
Analysis of design parameters in SIL-4 safety-critical computer (IEEE, 2017-01)
Ahangari, Hamzeh; Özkök, Y. I.; Yıldırım, A.; Say, F.; Atik, Funda; Öztürk, Özcan
Nowadays, safety-critical computers are extensively used in many civil domains, such as transportation, including railways, avionics, and automotive. We noticed that, in the design of some previous works, critical safety design parameters such as failure diagnostic coverage (DC) or the common cause failure (CCF) ratio have not been seriously taken into account. Moreover, in some cases safety has not been compared against standard safety levels (IEC 61508 SIL1–SIL4), or has not met them at all. Most often, it is not clear which part of the system is the Achilles' heel and how the design can be improved to reach standard safety levels. Motivated by such design ambiguities, we aim to study the effect of various design parameters on safety in some prevalent safety configurations: 1oo2 and 2oo3, with 1oo1 also used as a reference. By employing Markov modeling, the sensitivity of safety to each of the following critical design parameters is analyzed: failure rate of the processing element, failure diagnostic coverage, common cause failures, and repair rates. This study gives a deeper sense of how variations in design parameters influence safety. Consequently, to meet the appropriate safety integrity level, instead of improving some system parts blindly, it becomes possible to make an informed decision on the more relevant parameters. © 2017 IEEE.

Item Embargo
Architecture for safety-critical transportation systems (Elsevier B.V., 2023-03-15)
Ahangari, Hamzeh; Özkök, Yusuf İbrahim; Yıldırım, Asil; Say, Fatih; Atık, Funda; Ozturk, Ozcan
In many industrial systems, including transportation, fault tolerance is a key requirement. Usually, fault tolerance is achieved by redundancy, where replication of critical components is used. In the case of transportation computing systems, this redundancy starts with the processing element. In this paper, we use Markov models to assess the level of safety with different redundancy techniques used in the literature. More specifically, we give implementation details for various architecture options and evaluate one-out-of-two (1oo2) and two-out-of-three (2oo3) implementations. We observe that both 1oo2 and 2oo3 can reduce the average probability of failure per hour (PFH) down to 10⁻⁷, which provides Level-3 (SIL3) safety according to the standards.
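As a rough illustration of the Markov-based safety analysis described in the two items above (not code from either paper), the Python sketch below builds a toy continuous-time Markov chain for a simplified 1oo2 channel pair and derives a PFH-style figure; the state set, the diagnostic-coverage handling, and all rates are assumed placeholders.

```python
# Illustrative sketch only, not the model from the papers above: a toy
# continuous-time Markov chain for a simplified 1oo2 channel pair.
import numpy as np
from scipy.linalg import expm

lam_d = 2e-6     # dangerous failure rate per channel [1/h] (assumed)
dc    = 0.9      # diagnostic coverage (assumed); detected failures are
                 # optimistically treated as forcing the safe state
beta  = 0.02     # common cause failure (CCF) fraction (assumed)
mu    = 1e-2     # repair / proof-test restoration rate [1/h] (assumed)
T     = 1e4      # mission time [h] (assumed)
lam_du = (1 - dc) * lam_d          # dangerous-undetected rate per channel

# States: 0 = both channels healthy, 1 = one channel failed dangerous-undetected,
# 2 = hazardous (second channel lost, or CCF). State 2 is absorbing here.
Q = np.array([
    [-(2 * (1 - beta) * lam_du + beta * lam_du), 2 * (1 - beta) * lam_du, beta * lam_du],
    [mu, -(mu + lam_du), lam_du],
    [0.0, 0.0, 0.0],
])

p0 = np.array([1.0, 0.0, 0.0])     # start with both channels healthy
pT = p0 @ expm(Q * T)              # state probabilities after T hours
print(f"P(hazardous by T) = {pT[2]:.2e}, PFH ~ {pT[2] / T:.2e}")
```

Sweeping the assumed failure rate, diagnostic coverage, CCF fraction, or repair rate in such a model mirrors the sensitivity study the first abstract describes, and comparing the resulting PFH against the 10⁻⁷ threshold cited in the second abstract is how a SIL claim would be checked.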
Item Open Access
Custom hardware optimizations for reliable and high performance computer architectures (Bilkent University, 2020-09)
Ahangari, Hamzeh
In recent years, we have witnessed a huge wave of innovations, such as in Artificial Intelligence (AI) and the Internet of Things (IoT). In this trend, software tools are constantly and increasingly demanding more processing power, which traditional processors can no longer provide. In response to this need, a diverse range of hardware, including GPUs, FPGAs, and AI accelerators, is coming to the market every day. On the other hand, while hardware platforms are becoming more power-hungry due to higher performance demands, the concurrent shrinking of transistors and the strong emphasis on reducing supply voltage have always been sources of reliability concerns in circuits. This is particularly applicable to error-sensitive applications, such as the transportation and aviation industries, where an error can be catastrophic. Reliability issues may also have other causes, such as harsh environmental conditions.
These two problems of modern electronic circuits, namely the need for higher performance and higher reliability at the same time, require appropriate solutions. To satisfy both the performance and the reliability constraints, either designs based on reconfigurable circuits, such as FPGAs, or designs based on Commercial-Off-The-Shelf (COTS) components, such as general-purpose processors, can be an appropriate approach, because such platforms can be used in a wide variety of applications. In this regard, three solutions have been proposed in this thesis. These solutions target 1) safety and reliability at the system level using redundant processors, 2) performance at the architecture level using multiple accelerators, and 3) reliability at the circuit level through the use of redundant transistors. Specifically, in the first work, the contribution of some prevalent parameters to the design of safety-critical computers, using COTS processors, is discussed. Redundant architectures are modeled with Markov chains, and the sensitivity of system safety to these parameters is analyzed. Most importantly, the significant role of Common Cause Failures (CCFs) has been investigated. In the second work, the design and implementation of an HLS-based, FPGA-accelerated, high-throughput, work-efficient, synthesizable, template-based graph processing framework is presented. The template framework is simplified for easy mapping to the FPGA, even for software programmers. The framework is evaluated in particular on Intel's state-of-the-art Xeon+FPGA platform to implement iterative graph algorithms. Besides the high-throughput pipeline, the work-efficient mode significantly reduces total graph processing run-time with a novel active-list design. In the third work, the Joint SRAM (JSRAM) cell, a novel circuit-level technique that exploits the trade-off between reliability and memory size, is introduced. This idea is applicable to any SRAM structure, such as cache memories, register files, FPGA block RAMs, or FPGA look-up tables (LUTs), and even to latches and flip-flops. In fault-prone conditions, the structure can be configured in such a way that four cells are combined together at the circuit level to form one large and robust memory bit. Unlike prevalent hardware redundancy techniques, such as Triple Modular Redundancy (TMR), there is no explicit majority voter at the output. The proposed solution mainly focuses on transient faults, where the reliable mode can provide auto-correction and full immunity against single faults.

Item Open Access
HLS-based high-throughput and work-efficient synthesizable graph processing template pipeline (Association for Computing Machinery, 2024-01-24)
Ahangari, Hamzeh; Özdal, Muhammet Mustafa; Öztürk, Özcan; Mitra, Tulika
Hardware systems composed of diverse execution resources are being deployed to cope with the complexity and performance requirements of Artificial Intelligence (AI) and Machine Learning (ML) applications. With the emergence of new hardware platforms, system-wide programming support has become much more important. While this is true for various devices ranging from CPUs to GPUs, it is especially critical for specific neural network accelerators implemented on FPGAs. For example, Intel's recent HARP platform encompasses a Xeon CPU and an FPGA, and it requires an extensive software stack to be used effectively. Programming such a hybrid system will be a challenge for most non-expert users. High-level language solutions such as Intel OpenCL for FPGA try to address the problem. However, as the abstraction level increases, the efficiency of implementation decreases, revealing two opposing requirements. In this work, we propose a framework to generate an HLS-based, FPGA-accelerated, high-throughput, work-efficient, synthesizable, template-based graph-processing pipeline. While a fixed, clock-cycle-precise deep-pipeline architecture, written in SystemC, is responsible for processing graph vertices, the user implements the intended iterative graph algorithm by writing or modifying only a single module in C/C++. This way, efficiency and high performance can be achieved together with better programmability and productivity. With similar programming effort, the proposed template is shown to outperform a high-throughput OpenCL baseline by up to 50% in terms of edge throughput. Furthermore, the novel work-efficient design significantly improves execution time and power consumption by up to 100×.
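As a language-agnostic way to picture the programming model the item above describes (the actual template is C/C++ over a SystemC pipeline; the code below is a plain-Python stand-in with invented names), here is a worklist-driven, vertex-centric iteration in which only the algorithm-specific update rule would be user-written:

```python
# Conceptual model only: a worklist-driven ("work-efficient") iterative graph
# kernel in plain Python, not the C/C++/SystemC interface of the template.
from collections import deque

def sssp_worklist(num_vertices, edges, source):
    """Single-source shortest paths with an active list: only vertices whose
    value changed in the previous step are reprocessed."""
    adj = [[] for _ in range(num_vertices)]      # adjacency list: adj[u] = [(v, weight)]
    for u, v, w in edges:
        adj[u].append((v, w))

    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0

    active = deque([source])                     # the "active list" of changed vertices
    in_active = [False] * num_vertices
    in_active[source] = True

    while active:
        u = active.popleft()
        in_active[u] = False
        # algorithm-specific part: in the template this update rule is the single
        # user-provided module, while the surrounding iteration is fixed pipeline logic
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if not in_active[v]:
                    active.append(v)
                    in_active[v] = True
    return dist

# tiny usage example with a hypothetical 4-vertex graph
print(sssp_worklist(4, [(0, 1, 2), (0, 2, 5), (1, 2, 1), (2, 3, 3)], source=0))
```

The active list is what makes the iteration work-efficient: converged vertices are skipped instead of being re-scanned on every pass.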
Item Open Access
A novel heterogeneous approximate multiplier for low power and high performance (Institute of Electrical and Electronics Engineers, 2018)
Alouani, I.; Ahangari, Hamzeh; Öztürk, Özcan; Niar, S.
Approximate computing is a design paradigm considered for a range of applications that can tolerate some loss of accuracy. In fact, the bottleneck in conventional digital design techniques can be eliminated to achieve higher performance and energy efficiency by compromising accuracy. In this letter, a new architecture that treats accuracy as a design parameter is presented, where an approximate parallel multiplier using heterogeneous blocks is implemented. Based on design space exploration, we demonstrate that introducing diverse building blocks to implement the multiplier, rather than cloning a single building block, achieves higher-precision results. We report experimental results using precision, delay, and power dissipation as metrics and compare with three previous approximate designs. Our results show that the proposed heterogeneous multiplier achieves more precise outputs than the tested circuits while improving performance and power tradeoffs.
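To make the block-based idea concrete, the sketch below composes an 8x8 product from 4x4 sub-blocks and swaps in an approximate block for the least significant partial product; it is a generic example with made-up blocks, not the specific heterogeneous design evaluated in the letter above.

```python
# Illustrative sketch only: an 8x8 multiplier assembled from 4x4 sub-blocks,
# where the least significant sub-block uses a crude approximation.

def exact4x4(a, b):
    return a * b                        # stands in for an accurate 4x4 block

def approx4x4(a, b):
    # toy approximation: drop the two least significant bits of the product
    return (a * b) & ~0b11

def mult8x8(a, b, low_block):
    """Compose an 8x8 product from four 4x4 partial products; `low_block`
    selects which circuit implements the low-low partial product."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((exact4x4(a_hi, b_hi) << 8)
            + (exact4x4(a_hi, b_lo) << 4)
            + (exact4x4(a_lo, b_hi) << 4)
            + low_block(a_lo, b_lo))

# quick error check over all 8-bit operand pairs
worst = max(abs(mult8x8(a, b, approx4x4) - a * b)
            for a in range(256) for b in range(256))
print("worst-case absolute error:", worst)   # 3 with this toy low block
```

Choosing a different mix of accurate and approximate sub-blocks moves the design along the precision/power trade-off, which is the design-space knob the letter explores.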
Item Open Access
NS-SRAM: neighborhood solidarity SRAM for reliability enhancement of SRAM memories (IEEE, 2016-08-09)
Alouani, I.; Ahangari, Hamzeh; Öztürk, Özcan; Niar, S.
Technology scaling and voltage scaling have dramatically increased the susceptibility of Static Random Access Memories (SRAMs) to errors. In this paper, we present NS-SRAM, for Neighborhood Solidarity SRAM, a new technique to enhance the error resilience of SRAMs by exploiting adjacent memory bit data. Bit cells of a memory line are paired together at the circuit level to mutually increase the static noise margin and critical charge of a cell. Unlike existing techniques, NS-SRAM aims to enhance both the Bit Error Rate (BER) and the Soft Error Rate (SER) at the same time. Thanks to auto-adaptive joiners, each of a cell's nodes is connected to its counterpart in the neighboring bit. NS-SRAM enhances read stability by increasing the critical Read Static Noise Margin (RSNM), thereby decreasing faults when the circuit operates under voltage scaling. It also increases hold stability and critical charge to mitigate soft errors. With the proposed technique, the reliability of SRAM-based structures such as cache memories and register files can be drastically improved with an area overhead comparable to existing hardening techniques. Moreover, it does not require any extra memory, does not impact the effective memory size, and has no negative impact on performance. © 2016 IEEE.

Item Open Access
Power-efficient reliable register file for aggressive-environment applications (Institution of Engineering and Technology, 2020)
Alouani, I.; Ahangari, Hamzeh; Öztürk, Özcan; Niar, S.
In a context of increasing demand for on-board data processing, ensuring reliability under a reduced power budget is a serious design challenge for embedded system manufacturers. In particular, embedded processors in aggressive environments need to be designed with error hardening as a primary goal, not an afterthought. As the Register File (RF) is a critical element within the processor pipeline, enhancing RF reliability is mandatory for designing fault-immune computing systems. This study proposes integer and floating-point RF reliability enhancement techniques. Specifically, the authors propose the Adjacent Register Hardened RF, a new RF architecture that exploits adjacent byte-level narrow-width values to harden integer registers at runtime. Registers are paired together by special switches referred to as joiners, and the non-utilised bits of each register are exploited to enhance the reliability of its counterpart register. Moreover, they suggest sacrificing the least significant bits of the mantissa to enhance the reliability of the critical floating-point bits, namely the exponent and sign bits. The authors' results show that, with a low power budget compared to state-of-the-art techniques, they achieve better results under both normal and highly aggressive operating conditions.
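The behavioural sketch below is only a conceptual model of the narrow-width pairing idea in the item above, not the published circuit: when a 32-bit value is byte-level narrow, its partner register's unused half can conceptually carry a redundant copy that lets a read detect and undo a bit flip. Names and the repair policy are invented for illustration.

```python
# Conceptual Python model only (not the circuit from the article above): two
# paired 32-bit registers, where a byte-level narrow value leaves room for a
# redundant copy of its live bits in the partner register.

def is_narrow_16(value):
    """True if a 32-bit two's-complement value fits in its lower 16 bits."""
    upper = (value >> 16) & 0xFFFF
    low_sign = (value >> 15) & 1
    return upper == (0xFFFF if low_sign else 0x0000)

def protected_read(stored, backup):
    """If the register held a narrow value, a second copy of its live half can
    live in the partner's unused half; a mismatch flags (and here repairs) a flip."""
    live = stored & 0xFFFF
    if backup is not None and live != backup:
        live = backup                    # trust the redundant copy (simplified policy)
    sign = 0xFFFF0000 if (live >> 15) & 1 else 0
    return sign | live

# usage: a narrow value whose stored copy suffered a single-bit upset
original = 0x00001234
assert is_narrow_16(original)            # eligible for byte-level protection
flipped = original ^ (1 << 3)            # bit flip in the live half
print(hex(protected_read(flipped, original & 0xFFFF)))   # recovers 0x1234
```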
Item Open Access
Reconfigurable hardened latch and flip-flop for FPGAs (IEEE, 2017-07)
Ahangari, Hamzeh; Alouani, I.; Öztürk, Özcan; Niar, S.
In this paper, we propose the Joint Latch (JLatch) and the Joint Flip-Flop (JFF), two novel reconfigurable structures which bring reconfigurability of reliability to user latches and flip-flops (FFs) in reconfigurable devices such as FPGAs. Specifically, we implement two reconfigurable storage elements that exploit a trade-off between reliability and the amount of available resources. In fault-prone conditions, a JLatch (or JFF) is configured in such a way that four pre-selected normal static latches (or FFs) are combined together at the circuit level to form one hardened storage cell. The solution focuses on transient faults such as soft errors, where we show that the critical charge is increased by at least three orders of magnitude (1000×), practically bringing immunity against any Single Event Upset (SEU). If the four latches inside an FPGA logic block are far enough apart, the technique can effectively cope with Multiple Bit Upsets (MBUs) as well. Additionally, provided that special transistor sizing is applied (only necessary for some latch structures), JLatch and JFF take advantage of a novel self-correcting technique to correct any single fault immediately. Our solution provides reconfigurability of reliability with negligible performance and area overhead, with only one (two) extra transistor(s) per latch (FF). The delay of this technique is less than that of the conventional TMR (Triple Modular Redundancy) technique with a majority voter at the output. © 2017 IEEE.

Item Open Access
Register file reliability enhancement through adjacent narrow-width exploitation (IEEE, 2016-04)
Ahangari, Hamzeh; Alouani, I.; Öztürk, Özcan; Niar, S.; Rivenq, A.
Due to the increasing vulnerability of CMOS circuits, new generations of microprocessors require an inevitable focus on reliability issues. As the Register File (RF) constitutes a critical element within the processor pipeline, it is mandatory to enhance RF reliability to develop fault-tolerant architectures. This paper proposes the Adjacent Register Hardened RF (ARH), a new RF architecture that exploits adjacent byte-level narrow-width values to harden registers at runtime. Registers are paired together by special switches referred to as joiners. The dummy sign bits of each register are used to keep redundant data for its counterpart register. We use the 7T/14T SRAM cell [6] to combine redundant bits into a single bit cell that is, by far, more resilient against faults. Our simulations show that, with a 3% to 12% power overhead and a 10% to 20% increase in area compared to a baseline RF, we can obtain up to an 80% reduction in soft error rate (SER). © 2016 IEEE.

Item Open Access
Temperature-aware core mapping for heterogeneous 3D NoC design through constraint programming (Institute of Electrical and Electronics Engineers, 2020)
Demiriz, A.; Ahangari, Hamzeh; Öztürk, Özcan
In the context of Network-on-Chip (NoC) based Chip Multiprocessor (CMP) design, core mapping for application-specific systems is a challenging problem. In such designs, various decisions have to be made that affect performance and power consumption. Moreover, in emerging 3D NoC systems, as cooling issues intensify, temperature constraints on hot-spots are added and the problem becomes more complicated. In this paper, an earlier Constraint Programming (CP) methodology for heterogeneous 2D NoC design is extended to a 3D model, while critical temperature constraints are accounted for. In a single stage, our approach can choose core types from a set of low-, medium-, and high-power options and assign them to appropriate places on the mesh, minimizing the overall computation time and communication cost while satisfying the temperature constraints. To achieve our objective, in addition to the core placement problem, tasks must also be scheduled on cores with matching performance levels to minimize the overall completion time (makespan). Experimental results show that task completion times are more dependent on the mesh structure for our benchmark data. 3D mesh structures may yield shorter task completion times without compromising thermal constraints. On the other hand, restricting the peak temperature naturally requires the use of low-performance computing elements, which may inherently delay processing.
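To make the objective in the item above concrete, the toy sketch below brute-forces core-type selection and task placement on a 2x2 mesh, minimizing makespan plus communication distance under a power cap used as a crude hot-spot proxy. The actual work formulates this with Constraint Programming on 3D meshes with real thermal constraints; every number and name here is a made-up placeholder.

```python
# Toy illustration only: exhaustive search over core types and task placement
# on a 2x2 mesh, not the CP formulation used in the paper above.
from itertools import product, permutations

CORE_TYPES = {"low": (1.0, 1.0), "medium": (1.5, 2.0), "high": (2.0, 4.0)}  # (speed, power)
TASK_WORK  = [4.0, 4.0, 2.0, 2.0]               # work per task (assumed)
TASK_COMM  = [(0, 1, 3.0), (2, 3, 1.0)]         # (task_a, task_b, traffic) (assumed)
POSITIONS  = [(0, 0), (0, 1), (1, 0), (1, 1)]   # mesh coordinates of the four slots
POWER_CAP  = 10.0                               # crude stand-in for a hot-spot limit

def evaluate(types, placement):
    """types[slot] = chosen core type, placement[slot] = task mapped to that slot."""
    if sum(CORE_TYPES[t][1] for t in types) > POWER_CAP:
        return None                             # violates the temperature proxy
    makespan = max(TASK_WORK[placement[s]] / CORE_TYPES[types[s]][0] for s in range(4))
    slot_of = {placement[s]: s for s in range(4)}
    comm = sum(traffic * (abs(POSITIONS[slot_of[a]][0] - POSITIONS[slot_of[b]][0]) +
                          abs(POSITIONS[slot_of[a]][1] - POSITIONS[slot_of[b]][1]))
               for a, b, traffic in TASK_COMM)
    return makespan + comm                      # joint objective: speed plus locality

best = min(filter(None, (evaluate(t, p)
                         for t in product(CORE_TYPES, repeat=4)
                         for p in permutations(range(4)))))
print("best objective:", best)
```

A CP solver replaces this enumeration with constraint propagation and search, which is what makes the single-stage formulation scale beyond such tiny meshes.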