Reducing processor-memory performance gap and improving network-on-chip throughput

Mustafa, Naveed U. l.

buir.advisor	Öztürk, Özcan
dc.contributor.author	Mustafa, Naveed U. l.
dc.date.accessioned	2019-02-21T12:53:31Z
dc.date.available	2019-02-21T12:53:31Z
dc.date.copyright	2019-02
dc.date.issued	2019-02
dc.date.submitted	2019-02-22
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 102-121).	en_US
dc.description.abstract	The performance of computing systems has improved tremendously over the last few decades, primarily due to decreasing transistor size and increasing clock rate. Billions of transistors placed on a single chip and switching at a high clock rate result in overheating of the chip. The demand for performance improvement without increased heat dissipation led to the inception of multi/many-core designs, in which multiple cores and/or memories communicate through a network-on-chip (NoC). Unfortunately, the performance of memory devices has not improved at the same rate as that of processors and has hence become a performance bottleneck. On the other hand, varying traffic patterns in real applications limit the network throughput delivered by a routing algorithm. In this thesis, we address the issue of reducing the processor-memory performance gap in two ways: first, by integrating improved and newly developed memory technologies into the memory hierarchy of a computing system; second, by equipping the execution platform with the necessary architectural features and enabling its compiler to parallelize memory access instructions. We also address the issue of improving network throughput by proposing a selection scheme that switches the routing algorithm of an NoC as the traffic pattern of an application changes. We present the integration of emerging non-volatile memory (NVM) devices into the memory hierarchy of a computing system in the context of database management systems (DBMS). To this end, we propose modifications to the storage engine (SE) of a DBMS, aiming at fast access to data by bypassing the slow disk interfaces while maintaining all the functionalities of a robust DBMS. As a case study, we modify the SE of PostgreSQL and detail the necessary changes and the challenges such modifications entail. We evaluate our proposal using a comprehensive emulation platform.
Results indicate that our modified SE reduces query execution time by up to 45% and 13% when compared to disk and NVM storage, with average reductions of 19% and 4%, respectively. Detailed analysis of these results shows that our modified SE suffers from a data readiness problem. To solve this, we develop a general-purpose library that employs helper threads to prefetch data from NVM hardware via a simple application program interface (API). Our library further improves query execution time for our modified SE when compared to disk and NVM storage by up to 54% and 17%, with average reductions of 23% and 8%, respectively. As a second way to reduce the processor-memory performance gap, we propose a compiler optimization aimed at reducing memory-bound stalls. The proposed optimization generates an efficient instruction schedule through classification of memory references and consists of two steps: affinity analysis and affinity-aware instruction scheduling. We suggest two different approaches for affinity analysis, i.e., source code annotation and automated analysis. Our experimental results show that applying the annotation-based approach to a memory-intensive program reduces stall cycles by 67.44%, leading to a 25.61% improvement in execution time. We also evaluate the automated-analysis approach using eleven different image processing benchmarks. Experimental results show that automated analysis reduces stall cycles, on average, by 69.83%. As all benchmarks are both compute- and memory-intensive, we achieve an improvement in execution time of up to 30%, with a modest average of 5.79%. In order to improve network throughput, we propose a selection scheme that switches the routing algorithm as the traffic pattern changes. We use two selection strategies: static and dynamic selection. While static selection is made off-line, the dynamic approach uses run-time information on network congestion to select the routing algorithm.
Experimental results show that our proposal improves throughput for real applications by up to 37.49%. The key conclusion of this thesis is that improving the performance of a computing system needs a multifaceted approach, i.e., improving the performance of the memory and communication subsystems at the same time. Reducing the performance gap between processors and memories requires not only the integration of improved memory technologies into the system but also software/compiler support. We also conclude that switching the routing algorithm as the traffic pattern of an application changes improves NoC throughput.	en_US
dc.description.statementofresponsibility	by Naveed Ul Mustafa	en_US
dc.format.extent	xix, 126 leaves : charts (some color) ; 30 cm.	en_US
dc.identifier.itemid	B159710
dc.identifier.uri	http://hdl.handle.net/11693/49641
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Memory bound stalls	en_US
dc.subject	Compiler optimization	en_US
dc.subject	Execution time	en_US
dc.subject	Computer vision	en_US
dc.subject	Non volatile memory	en_US
dc.subject	Relational DBMS	en_US
dc.subject	Storage engine	en_US
dc.subject	Network-on-chip	en_US
dc.subject	Routing algorithm	en_US
dc.subject	Throughput	en_US
dc.title	Reducing processor-memory performance gap and improving network-on-chip throughput	en_US
dc.title.alternative	İşlemci-bellek performans farkını azaltmak ve yonga üstü ağ verimini artırmak	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Doctoral
thesis.degree.name	Ph.D. (Doctor of Philosophy)

Files

Original bundle

Name: PhDThesisNaveedUlMustafa.pdf
Size: 2.15 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science