Browsing by Subject "Hardware accelerators"


Open Access
Graph analytics accelerators for cognitive systems
(Institute of Electrical and Electronics Engineers, 2017) Ozdal, M. M.; Yesil, S.; Kim, T.; Ayupov, A.; Greth, J.; Burns, S.; Ozturk, O.
Hardware accelerators are known for their performance and power efficiency. This article focuses on accelerator design for graph analytics applications, which are common kernels in cognitive systems. The authors propose a templatized architecture specifically optimized for vertex-centric graph applications with irregular memory access patterns, asynchronous execution, and asymmetric convergence. The proposed architecture addresses the limitations of existing CPU and GPU systems while providing a customizable template. The authors' experiments show that the generated accelerators can outperform a high-end CPU system, with up to 3× higher performance and 65× better power efficiency.
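To make "asynchronous execution" and "asymmetric convergence" concrete, here is a minimal software sketch of a vertex-centric PageRank. It is not the authors' accelerator template; the function name `async_pagerank`, the tolerance `tol`, and the worklist scheduling are illustrative assumptions. Ranks are updated in place, so later vertex updates see new values immediately (asynchronous execution), and only successors of vertices whose rank is still changing get rescheduled, so vertices that converge early stop consuming work (asymmetric convergence).

```python
from collections import deque


def async_pagerank(out_edges, damping=0.85, tol=1e-10):
    """Asynchronous vertex-centric PageRank with a worklist (illustrative sketch).

    out_edges maps every vertex to its list of successors. Dangling vertices
    (no out-edges) are assumed absent to keep the update rule short.
    """
    n = len(out_edges)
    # Build predecessor lists so each vertex can "pull" from its in-neighbors.
    in_edges = {v: [] for v in out_edges}
    for u, succs in out_edges.items():
        for v in succs:
            in_edges[v].append(u)

    rank = {v: 1.0 / n for v in out_edges}
    worklist = deque(out_edges)   # initially every vertex is active
    queued = set(out_edges)       # avoids duplicate worklist entries

    while worklist:
        v = worklist.popleft()
        queued.discard(v)
        new = (1 - damping) / n + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v]
        )
        delta = abs(new - rank[v])
        rank[v] = new  # in-place write: visible to the very next update
        if delta > tol:
            # Only successors of a vertex that still changes need recomputing;
            # converged regions of the graph drop out of the worklist.
            for w in out_edges[v]:
                if w not in queued:
                    queued.add(w)
                    worklist.append(w)
    return rank
```

On the three-vertex graph `{0: [1, 2], 1: [0], 2: [0]}`, the ranks settle near 18/37 for vertex 0 and 19/74 for vertices 1 and 2. The data-dependent reads of `rank[u]` inside the sum are the kind of irregular memory access the article targets.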
Open Access
Hardware accelerator design for data centers
(IEEE, 2016-11) Yeşil, Şerif; Özdal, Muhammet Mustafa; Kim, T.; Ayupov, A.; Burns, S.; Öztürk, Özcan.
As the volume of available data grows, scaling the computational power of traditional systems is becoming inefficient. To overcome this problem, customized application-specific accelerators are becoming integral parts of modern system-on-chip (SoC) architectures. In this paper, we summarize existing hardware accelerators for data centers and discuss techniques for implementing them and embedding them alongside existing SoCs.
Open Access
JPEG hardware accelerator design for FPGA
(IEEE, 2007) Duman, Kaan; Çoğun, Fuat; Öktem, L.
A fully pipelined JPEG hardware accelerator running on an FPGA is presented. The accelerator is designed interactively in a simulation environment using a DSP hardware design automation tool chain. The encoder part of the accelerator accepts 8×8 image blocks in a streaming fashion and outputs the zigzag-scanned, quantized 2-D DCT coefficients of each block. The decoder part accepts zigzag-scanned, quantized DCT coefficients and outputs the reconstructed 8×8 image block. Each part has a throughput of one pixel per channel per system clock cycle. The encoder employs a fast pipelined implementation of the 2-D DCT [1]. For the decoder, a new pipelined 2-D IDCT structure is developed. Our IDCT structure is based on an IDCT factorization for software implementation [2] and is inspired by the pipelined DCT structure employed in the encoder. The resource utilization and maximum frequency figures for a particular FPGA target suggest that our accelerator has competitive performance.
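The encoder data path described above (2-D DCT, quantization, zigzag scan) can be written as a short software reference model, the kind of golden model a hardware pipeline might be checked against. This sketch assumes the JPEG DCT normalization and level shift from ITU-T T.81 and a caller-supplied quantization table. It computes the DCT directly from its definition in O(N⁴) time and is neither the fast pipelined structure of [1] nor the authors' IDCT; the names `dct_8x8`, `zigzag_order`, and `encode_block` are hypothetical.

```python
import math


def zigzag_order(n=8):
    """(row, col) visiting order of the JPEG zigzag scan for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col (anti-diagonal)
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                         # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def dct_8x8(block):
    """Direct 2-D DCT-II of an 8x8 block with JPEG scaling (reference, not fast)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * acc
    return out


def encode_block(pixels, quant):
    """Level-shift, transform, quantize, and zigzag-scan one 8x8 block of 8-bit samples."""
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct_8x8(shifted)
    q = [[round(coeffs[u][v] / quant[u][v]) for v in range(8)] for u in range(8)]
    return [q[r][col] for r, col in zigzag_order()]
```

For a uniform gray block of value 200 and a flat quantization step of 16, the level-shifted block is 72 everywhere, so only the DC coefficient is nonzero: 0.25 × ½ × 72 × 64 = 576, which quantizes to 36. At one pixel per system clock, each 8×8 block occupies 64 cycles per channel.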
Open Access
Source-to-source transformation based methodology for graph-parallel FPGA accelerators
(2019-08) Akyol, Cemil Kaan
Graph applications are becoming increasingly important owing to their widespread use and the volume of data they process. Biological and social web graphs are well-known examples that show the importance of processing graph analytics applications efficiently. Addressing these problems efficiently is not straightforward. Distributing and parallelizing the computation, and integrating hardware accelerators, are the main approaches that have been tried during the last decade. However, these approaches mainly focus on specific legacy algorithms and may not completely solve the problems. Therefore, when a non-legacy algorithm is needed for a specific problem, the developer has to cope with the difficulties of distribution, parallelization techniques, and hardware specifications to parallelize and accelerate the application. The proposed source-to-source methodology frees the developer from the low-level details of parallelization and distribution by translating any vertex-centric C++ graph application into a pipelined SystemC model. To support different types of graph applications, we have implemented several features, such as non-standard application support, active-set functionality, and multi-pipeline support. The generated SystemC model can be synthesized with High-Level Synthesis (HLS) tools to obtain the FPGA programming image, i.e., the bitstream. Our accelerator development flow can generate two execution models: high-throughput (HT) and work-efficient (WE). Compared with OpenCL counterparts of the algorithms, the HT and WE models perform slightly better in terms of execution time and throughput. The WE model performed approximately 40% better than OpenCL in terms of work done and execution time. Therefore, the proposed source-to-source methodology can provide more efficient hardware designs while requiring only a simple high-level language description from the user.
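The abstract does not define the HT and WE models in detail. One plausible reading, which this sketch assumes, is that an HT model sweeps every vertex on every round, while a WE model uses the active set and processes only vertices whose value changed. The Python below illustrates that trade-off on single-source shortest paths with Bellman-Ford-style relaxation. The function names and the `work` counter (edge relaxations performed) are hypothetical; they stand in for neither the generated SystemC nor the thesis's measurements.

```python
import math


def sssp_high_throughput(edges, src):
    """Sweep all vertices every round until no distance changes.

    edges[u] is a list of (v, weight) pairs; weights are assumed non-negative.
    Returns (distances, number of edge relaxations performed).
    """
    dist = [math.inf] * len(edges)
    dist[src] = 0
    work, changed = 0, True
    while changed:
        changed = False
        for u in range(len(edges)):          # every vertex, active or not
            for v, w in edges[u]:
                work += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
    return dist, work


def sssp_work_efficient(edges, src):
    """Relax only the out-edges of vertices in the active set."""
    dist = [math.inf] * len(edges)
    dist[src] = 0
    active, work = {src}, 0
    while active:
        next_active = set()
        for u in active:
            for v, w in edges[u]:
                work += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_active.add(v)   # only improved vertices stay active
        active = next_active
    return dist, work
```

On a small graph with a unit-weight chain 0→1→2→3 plus a shortcut 0→3 of weight 10, both versions return distances [0, 1, 2, 3], but the full sweep performs 8 relaxations against 4 for the active-set version. A full sweep has a simple, regular schedule that is easy to pipeline, whereas the active set saves work at the cost of maintaining a worklist.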