Browsing by Subject "Random access storage"

Open Access
High performance 3D CMP design with stacked hybrid memory architecture in the dark silicon era using a convex optimization model
(IEEE, 2016-05) Onsori, Salman; Asad, Arghavan; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a stacked hybrid memory system that improves performance and reduces energy consumption of the chip-multiprocessor (CMP). Our convex model optimizes the numbers and placement of SRAM and STT-RAM memories on the memory layer, and efficiently maps applications/threads to cores in the core layer. Power consumption, which is the main challenge in the dark silicon era, is represented as a power constraint in this work and is satisfied by the detailed optimization model in order to design a dark-silicon-aware 3D CMP. Experimental results show that the proposed architecture considerably improves the energy-delay product (EDP) and performance of the 3D CMP compared to the Baseline memory design. © 2016 IEEE.
Open Access
A high-performance hybrid memory architecture for embedded CMPs using a convex optimization model
(IEEE, 2015-11) Onsori, Salman; Asad, Arghavan; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a stacked hybrid memory system for 3D embedded chip-multiprocessors (eCMP). Our convex model optimizes the numbers and placement of SRAM and STT-RAM memories on the memory layer, and effectively maps applications/threads to cores in the core layer. The detailed proposed model satisfies the power constraint, which is the main challenge of the dark-silicon era. Experimental results show that the proposed architecture considerably improves the energy-delay product (EDP) and performance of the 3D eCMP compared to the Baseline memory design. © 2015 IEEE.
Open Access
Memory resident parallel inverted index construction
(Springer, London, 2012) Küçükyılmaz, Tayfun; Türk, Ata; Aykanat, Cevdet
Advances in cloud computing, 64-bit architectures and huge RAMs enable performing many search-related tasks in memory. We argue that term-based partitioned parallel inverted index construction is among such tasks, and provide an efficient parallel framework that achieves this task. We show that by utilizing an efficient bucketing scheme we can eliminate the need for the generation of a global index and reduce the communication overhead without disturbing the balancing constraint. We also propose and investigate assignment schemes that can further reduce communication overheads without disturbing balancing constraints. The conducted experiments indicate promising results. © 2012 Springer-Verlag London Limited.
Open Access
NS-SRAM: neighborhood solidarity SRAM for reliability enhancement of SRAM memories
(IEEE, 2016-08-09) Alouani, I.; Ahangari, Hamzeh; Öztürk, Özcan; Niar, S.
Technology shift and voltage scaling have dramatically increased the susceptibility of Static Random Access Memories (SRAMs) to errors. In this paper, we present NS-SRAM, for Neighborhood Solidarity SRAM, a new technique to enhance error resilience of SRAMs by exploiting the adjacent memory bit data. Bit cells of a memory line are paired together at the circuit level to mutually increase the static noise margin and critical charge of a cell. Unlike existing techniques, NS-SRAM aims to enhance both Bit Error Rate (BER) and Soft Error Rate (SER) at the same time. Due to auto-adaptive joiners, each of the adjacent cells' nodes is connected to its counterpart in the neighbor bit. NS-SRAM enhances read stability by increasing the critical Read Static Noise Margin (RSNM), thereby decreasing faults when the circuit operates under voltage scaling. It also increases hold stability and critical charge to mitigate soft errors. With the proposed technique, the reliability of SRAM-based structures such as cache memories and register files can be drastically improved, with area overhead comparable to existing hardening techniques. Moreover, it does not require any extra memory, does not impact the effective memory size, and has no negative impact on performance. © 2016 IEEE.
Open Access
OptMem: dark-silicon aware low latency hybrid memory design
(IEEE, 2016-01) Onsori, Salman; Asad, Arghavan A; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a three-dimensional (3D) stacked hybrid memory system to improve performance in the dark silicon era. Our convex model optimizes the numbers and placement of static random access memory (SRAM) and spin-transfer torque magnetic random-access memory (STT-RAM) memories on the memory layer to exploit the advantages of both technologies. Power consumption, which is the main challenge in the dark silicon era, is represented as a main constraint in this work and is satisfied by the detailed optimization model in order to design a dark-silicon-aware 3D Chip-Multiprocessor (CMP). Experimental results show that the proposed architecture improves the energy consumption and performance of the 3D CMP by about 25.8% and 12.9% on average compared to the Baseline memory design. © 2016 IEEE.
Open Access
A two phase successive cancellation decoder architecture for polar codes
(IEEE, 2013) Pamuk, Alptekin; Arıkan, Erdal
We propose a two-phase successive cancellation (TPSC) decoder architecture for polar codes that exploits the array-code property of polar codes by breaking the decoding of a length-N polar code into a series of length-√N decoding cycles. Each decoding cycle consists of two phases: a first phase for decoding along the columns and a second phase for decoding along the rows of the code array. The reduced decoder size makes it more affordable to implement the core decoder logic using distributed memory elements consisting of flip-flops (FFs), as opposed to slower random access memory (RAM), leading to a speed-up in clock frequency. To minimize the circuit complexity, a single decoder unit is used in both phases with minor modifications. The re-use of the same decoder module makes it necessary to recall certain internal decoder state variables between decoding cycles. Instead of storing the decoder state variables in RAM, the decoder discards them and calculates them again when needed. Overall, the decoder has O(√N) circuit complexity excluding RAM, and a latency of approximately 2.5N. A RAM of size O(N) is needed for storing the channel log-likelihood variables and the decoder decision variables. As an example of the proposed method, a length N = 2^14 bit polar code is implemented in an FPGA and the synthesis results are compared with a previously reported FPGA implementation. The results show that the proposed architecture has lower complexity, lower memory utilization with higher throughput, and a clock frequency that is less sensitive to code length. © 2013 IEEE.