Browsing by Subject "Memory architecture"

Open Access
Comparison of two image-space subdivision algorithms for direct volume rendering on distributed-memory multicomputers
(Springer, 1995-08) Tanin, Egemen; Kurç, Tahsin M.; Aykanat, Cevdet; Özgüç, Bülent
Direct Volume Rendering (DVR) is a powerful technique for visualizing volumetric data sets. However, it involves intensive computations. In addition, most volumetric data sets consist of a large number of 3D sampling points, so visualizing them also requires a large amount of memory. Hence, DVR is a good candidate for parallelization on distributed-memory multicomputers. In this work, image-space parallelization of raycasting-based DVR for unstructured grids on distributed-memory multicomputers is presented and discussed. In order to visualize unstructured volumetric datasets, where the grid points are irregularly distributed over the 3D space, the underlying algorithms should resolve the point location and view sort problems of the 3D grid points. In this paper, these problems are solved using a scanline Z-buffer based algorithm. Two image-space subdivision heuristics, namely horizontal and recursive rectangular subdivision, are utilized to distribute the computations evenly among the processors in the rendering phase. The horizontal subdivision algorithm divides the image space into horizontal bands composed of consecutive scanlines. In the recursive subdivision algorithm, the image space is divided into rectangular subregions recursively. The experimental performance evaluation of the horizontal and recursive subdivision algorithms on an IBM SP2 system is presented and discussed. © Springer-Verlag Berlin Heidelberg 1996.
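The two image-space subdivision schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `horizontal_bands` and `recursive_rectangles` are hypothetical, and the split points assume every pixel costs the same, whereas the paper's heuristics aim to balance the actual rendering load.

```python
def horizontal_bands(height, num_procs):
    """Split scanlines 0..height-1 into num_procs bands of consecutive scanlines.

    Returns a list of (start, end) pairs with end exclusive.
    Assumption: equal cost per scanline, so bands differ by at most one line.
    """
    base, extra = divmod(height, num_procs)
    bands = []
    start = 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands


def recursive_rectangles(x0, y0, x1, y1, num_procs):
    """Recursively split the rectangle [x0, x1) x [y0, y1) into num_procs subregions.

    Assumption: each cut goes across the longer side and divides the area in
    proportion to the number of processors assigned to each half.
    """
    if num_procs == 1:
        return [(x0, y0, x1, y1)]
    left = num_procs // 2
    right = num_procs - left
    if (x1 - x0) >= (y1 - y0):
        # Vertical cut: the region is at least as wide as it is tall.
        cut = x0 + (x1 - x0) * left // num_procs
        return (recursive_rectangles(x0, y0, cut, y1, left)
                + recursive_rectangles(cut, y0, x1, y1, right))
    # Horizontal cut: the region is taller than it is wide.
    cut = y0 + (y1 - y0) * left // num_procs
    return (recursive_rectangles(x0, y0, x1, cut, left)
            + recursive_rectangles(x0, cut, x1, y1, right))
```

For example, `horizontal_bands(10, 3)` assigns scanlines 0-3, 4-6, and 7-9 to three processors, while `recursive_rectangles(0, 0, 8, 4, 4)` yields four rectangles of 2 x 4 pixels each.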
Open Access
Graph analytics accelerators for cognitive systems
(Institute of Electrical and Electronics Engineers, 2017) Ozdal, M. M.; Yesil, S.; Kim, T.; Ayupov, A.; Greth, J.; Burns, S.; Ozturk, O.
Hardware accelerators are known to be performance and power efficient. This article focuses on accelerator design for graph analytics applications, which are commonly used kernels for cognitive systems. The authors propose a templatized architecture that is specifically optimized for vertex-centric graph applications with irregular memory access patterns, asynchronous execution, and asymmetric convergence. The proposed architecture addresses the limitations of existing CPU and GPU systems while providing a customizable template. The authors' experiments show that the generated accelerators can outperform a high-end CPU system with up to 3 times better performance and 65 times better power efficiency. © 1981-2012 IEEE.
Open Access
Graphene Nanoplatelets Embedded in HfO2 for MOS Memory
(Electrochemical Society Inc., 2015) El-Atab, N.; Turgut, Berk Berkan; Okyay, Ali Kemal; Nayfeh, A.
In this work, a MOS memory with graphene nanoplatelets charge trapping layer and a double layer high-κ Al2O3/HfO2 tunnel oxide is demonstrated. Using C-Vgate measurements, the memory showed a large memory window at low program/erase voltages. The analysis of the C-V characteristics shows that electrons are being stored in the graphene-nanoplatelets during the program operation. In addition, the retention characteristic of the memory is studied by plotting the hysteresis measurement vs. time. The measured excellent retention characteristic (28.8% charge loss in 10 years) is due to the large electron affinity of the graphene. The analysis of the plot of the energy band diagram of the MOS structure further proves its good retention characteristic. Finally, the results show that such graphene nanoplatelets are promising in future low-power non-volatile memory devices.
Open Access
High performance 3D CMP design with stacked hybrid memory architecture in the dark silicon era using a convex optimization model
(IEEE, 2016-05) Onsori, Salman; Asad, Arghavan; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a stacked hybrid memory system to improve performance and reduce energy consumption of the chip-multiprocessor (CMP). Our convex model optimizes numbers and placement of SRAM and STT-RAM memories on the memory layer, and efficiently maps applications/threads to cores in the core layer. Power consumption, which is the main challenge in the dark silicon era, is represented as a power constraint in this work, and it is satisfied by the detailed optimization model in order to design a dark-silicon-aware 3D CMP. Experimental results show that the proposed architecture considerably improves the energy-delay product (EDP) and performance of the 3D CMP compared to the Baseline memory design. © 2016 IEEE.
Open Access
A high-performance hybrid memory architecture for embedded CMPs using a convex optimization model
(IEEE, 2015-11) Onsori, Salman; Asad, Arghavan; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a stacked hybrid memory system for 3D embedded chip-multiprocessors (eCMP). Our convex model optimizes numbers and placement of SRAM and STT-RAM memories on the memory layer, and effectively maps applications/threads to cores in the core layer. The detailed proposed model satisfies the power constraint, which is the main challenge of the dark-silicon era. Experimental results show that the proposed architecture considerably improves the energy-delay product (EDP) and performance of the 3D eCMP compared to the Baseline memory design. © 2015 IEEE.
Open Access
Hybrid stacked memory architecture for energy efficient embedded chip-multiprocessors based on compiler directed approach
(IEEE, 2015-12) Onsori, Salman; Asad, A.; Öztürk, Özcan; Fathy, M.
Energy consumption has become the most critical limitation on the performance of today's embedded system designs. On-chip memories, because of their major contribution to overall system energy consumption, are always a significant issue for embedded systems. Using conventional memory technologies in future nano-scale designs causes a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies are promising replacements for conventional memory structures in embedded systems due to attractive characteristics such as near-zero leakage power, high density, and non-volatility. Recent advances in NVM technologies can significantly mitigate the issue of memory leakage power. However, they introduce new challenges, such as limited write endurance and high write energy consumption, which restrict their adoption in modern memory systems. In this article, we propose a stacked hybrid memory system to minimize energy consumption for 3D embedded chip-multiprocessors (eCMP). To reach this target, we present a convex optimization-based model to distribute data blocks between SRAM and NVM banks based on data access patterns derived by the compiler. Our compiler-assisted hybrid memory architecture can achieve up to a 51.28 times improvement in lifetime. In addition, experimental results show that our proposed method reduces energy consumption by 56% on average compared to the traditional memory design where a single technology is used. © 2015 IEEE.
Open Access
Implications of non-volatile memory as primary storage for database management systems
(IEEE, 2017) Mustafa, Naveed Ul; Armejach, A.; Öztürk, Özcan; Cristal, A.; Unsal, O. S.
Traditional Database Management System (DBMS) software relies on hard disks for storing relational data. Hard disks are cheap, persistent, and offer huge storage capacities. However, data retrieval latency for hard disks is extremely high. To hide this latency, DRAM is used as an intermediate storage. DRAM is significantly faster than disk, but deployed in smaller capacities due to cost and power constraints, and without the necessary persistency feature that disks have. Non-Volatile Memory (NVM) is an emerging storage class technology which promises the best of both worlds. It can offer large storage capacities, due to better scaling and cost metrics than DRAM, and is non-volatile (persistent) like hard disks. At the same time, its data retrieval time is much lower than that of hard disks and it is also byte-addressable like DRAM. In this paper, we explore the implications of employing NVM as primary storage for DBMS. In other words, we investigate the modifications necessary to be applied on a traditional relational DBMS to take advantage of NVM features. As a case study, we have modified the storage engine (SE) of PostgreSQL enabling efficient use of NVM hardware. We detail the necessary changes and challenges such modifications entail and evaluate them using a comprehensive emulation platform. Results indicate that our modified SE reduces query execution time by up to 40% and 14.4% when compared to disk and NVM storage, with average reductions of 20.5% and 4.5%, respectively. © 2016 IEEE.
Open Access
Memristive behavior in a junctionless flash memory cell
(American Institute of Physics Inc., 2015) Orak, I.; Ürel, M.; Bakan, G.; Dana, A.
We report charge storage based memristive operation of a junctionless thin film flash memory cell when it is operated as a two terminal device by grounding the gate. Unlike memristors based on nanoionics, the presented device mode, which we refer to as the flashristor mode, potentially allows greater control over the memristive properties, allowing rational design. The mode is demonstrated using a depletion type n-channel ZnO transistor grown by atomic layer deposition (ALD), with HfO2 as the tunnel dielectric, Al2O3 as the control dielectric, and non-stoichiometric silicon nitride as the charge storage layer. The device exhibits the pinched hysteresis of a memristor and in the unoptimized device, Roff/Ron ratios of about 3 are presented with low operating voltages below 5 V. A simplified model predicts Roff/Ron ratios can be improved significantly by adjusting the native threshold voltage of the devices. The repeatability of the resistive switching is excellent and devices exhibit 10^6 s retention time, which can, in principle, be improved by engineering the gate stack and storage layer properties. The flashristor mode can find use in analog information processing applications, such as neuromorphic computing, where well-behaving and highly repeatable memristive properties are desirable.
Open Access
Notice of violation of IEEE publication principles: An energy-efficient heterogeneous memory architecture for future dark silicon embedded chip-multiprocessors
(IEEE Computer Society, 2018) Onsori, S.; Asad, A.; Raahemifar, K.; Fathy, M.
Main memories play an important role in the overall energy consumption of embedded systems. Using conventional memory technologies in future nanoscale designs causes a drastic increase in leakage power consumption and temperature-related problems. Emerging non-volatile memory (NVM) technologies offer many desirable characteristics such as near-zero leakage power, high density, and non-volatility. They can significantly mitigate the issue of memory leakage power in future embedded chip-multiprocessor (eCMP) systems. However, they suffer from challenges such as limited write endurance and high write energy consumption, which restrict their adoption in modern memory systems. In this article, we present a convex optimization model to design a 3D stacked hybrid memory architecture in order to minimize the energy consumption of future embedded systems in the dark silicon era. The proposed approach satisfies an endurance constraint in order to design a reliable memory system. Our convex model optimizes numbers and placement of eDRAM and STT-RAM memory banks on the memory layer to exploit the advantages of both technologies in future eCMPs. Energy consumption, the main challenge in the dark silicon era, is represented as a major target in this work, and it is minimized by the detailed optimization model in order to design a dark-silicon-aware 3D chip-multiprocessor. Experimental results show that, in comparison with the Baseline memory design, the proposed architecture improves the energy consumption and performance of the 3D CMP by about 61.33% and 9% on average, respectively. IEEE
Open Access
On-chip memory space partitioning for chip multiprocessors using polyhedral algebra
(The Institution of Engineering and Technology, 2010) Ozturk, O.; Kandemir, M.; Irwin, M. J.
One of the most important issues in designing a chip multiprocessor is to decide its on-chip memory organisation. While it is possible to design an application-specific memory architecture, this may not necessarily be the best option, in particular when storage demands of individual processors and/or their data sharing patterns can change from one point in execution to another for the same application. Here, two problems are formulated. First, we show how a polyhedral method can be used to design, for array-based data-intensive embedded applications, an application-specific hybrid memory architecture that has both shared and private components. We evaluate the resulting memory configurations using a set of benchmarks and compare them to pure private and pure shared memory on-chip multiprocessor architectures. The second proposed approach considers dynamic configuration of software-managed on-chip memory space to adapt to the runtime variations in data storage demand and interprocessor sharing patterns. The proposed framework is fully implemented using an optimising compiler, a polyhedral tool, and a memory partitioner (based on integer linear programming), and is tested using a suite of eight data-intensive embedded applications. © 2010 The Institution of Engineering and Technology.
Open Access
OptMem: dark-silicon aware low latency hybrid memory design
(IEEE, 2016-01) Onsori, Salman; Asad, Arghavan A; Raahemifar, K.; Fathy, M.
In this article, we present a convex optimization model to design a three-dimensional (3D) stacked hybrid memory system to improve performance in the dark silicon era. Our convex model optimizes numbers and placement of static random-access memory (SRAM) and spin-transfer torque magnetic random-access memory (STT-RAM) memories on the memory layer to exploit advantages of both technologies. Power consumption, which is the main challenge in the dark silicon era, is represented as a main constraint in this work, and it is satisfied by the detailed optimization model in order to design a dark-silicon-aware 3D chip-multiprocessor (CMP). Experimental results show that the proposed architecture improves the energy consumption and performance of the 3D CMP by about 25.8% and 12.9% on average compared to the Baseline memory design. © 2016 IEEE.
Open Access
Parallel minimum norm solution of sparse block diagonal column overlapped underdetermined systems
(Association for Computing Machinery, 2017) Torun, F. S.; Manguoglu, M.; Aykanat, Cevdet
Underdetermined systems of equations in which the minimum norm solution needs to be computed arise in many applications, such as geophysics, signal processing, and biomedical engineering. In this article, we introduce a new parallel algorithm for obtaining the minimum 2-norm solution of an underdetermined system of equations. The proposed algorithm is based on the Balance scheme, which was originally developed for the parallel solution of banded linear systems. The proposed scheme assumes a generalized banded form where the coefficient matrix has column overlapped block structure in which the blocks could be dense or sparse. In this article, we implement the more general sparse case. The blocks can be handled independently by any existing sequential or parallel QR factorization library. A smaller reduced system is formed and solved before obtaining the minimum norm solution of the original system in parallel. We experimentally compare and confirm the error bound of the proposed method against the QR factorization based techniques by using true single-precision arithmetic. We implement the proposed algorithm by using the message passing paradigm. We demonstrate numerical effectiveness as well as parallel scalability of the proposed algorithm on both shared and distributed memory architectures for solving various types of problems. © 2017 ACM.
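For orientation, the quantity this abstract refers to can be written out explicitly. The following is the standard textbook formulation of the minimum 2-norm solution via QR factorization, which is the kind of baseline the abstract compares against; it is not the Balance-scheme algorithm itself. The symbols $A$, $b$, $Q$, and $R$ are notation introduced here, and $A$ is assumed to have full row rank.

```latex
% Underdetermined system: A is m x n with m < n and rank(A) = m (assumption).
\[
  x^{\ast} \;=\; \arg\min_{x \in \mathbb{R}^{n}} \|x\|_{2}
  \quad \text{subject to} \quad A x = b,
  \qquad
  x^{\ast} \;=\; A^{T} \left( A A^{T} \right)^{-1} b .
\]
% With the thin QR factorization A^T = Q R
% (Q is n x m with orthonormal columns, R is m x m upper triangular),
% we have A = R^T Q^T, so the same solution is
\[
  x^{\ast} \;=\; Q \, R^{-T} b ,
  \qquad \text{since} \qquad
  A x^{\ast} \;=\; R^{T} Q^{T} Q \, R^{-T} b \;=\; b .
\]
```

In practice the product $R^{-T} b$ is obtained by a triangular solve with $R^{T}$ rather than by forming an inverse.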
Open Access
Parallel pruning for k-means clustering on shared memory architectures
(Springer Verlag, 2001) Gürsoy, Attila; Cengiz, Ilker
We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that spatial decomposition of patterns outperforms random decomposition even though random decomposition has almost no load imbalance problem. The other scheme is the parallel traversal of the search tree. This approach solves the load imbalance problem and performs slightly better than the spatial decomposition, but the efficiency is reduced due to thread synchronizations. In both cases, parallel tree-based k-means clustering is significantly faster than the direct parallel k-means. © Springer-Verlag Berlin Heidelberg 2001.
Open Access
Rigorous solutions of large-scale dielectric problems with the parallel multilevel fast multipole algorithm
(IEEE, 2011) Ergül, Özgür; Gürel, Levent
We present fast and accurate solutions of large-scale electromagnetics problems involving three-dimensional homogeneous dielectric objects. Problems are formulated rigorously with the electric and magnetic current combined-field integral equation (JMCFIE) and solved iteratively with the multilevel fast multipole algorithm (MLFMA). In order to solve large-scale problems, MLFMA is parallelized efficiently on distributed-memory architectures using the hierarchical partitioning strategy. Efficiency and accuracy of the developed implementation are demonstrated on very large scattering problems discretized with tens of millions of unknowns. © 2011 IEEE.