Efficient neural network processing via model compression and low-power functional units
buir.advisor | Çukur, Emine Ülkü Sarıtaş | |
buir.co-advisor | Çakır, Burçin | |
dc.contributor.author | Karakuloğlu, Ali Necat | |
dc.date.accessioned | 2025-01-06T11:39:53Z | |
dc.date.available | 2025-01-06T11:39:53Z | |
dc.date.copyright | 2024-12 | |
dc.date.issued | 2024-12 | |
dc.date.submitted | 2024-12-19 | |
dc.description | Cataloged from PDF version of article. | en_US |
dc.description | Includes bibliographical references (leaves 63-69). | en_US |
dc.description.abstract | We present a framework that advances neural network optimization through novel methods in pruning, quantization, and arithmetic unit design, targeting platforms ranging from resource-constrained devices to datacenters. The first component is a pruning method that employs an importance metric to identify and selectively eliminate less critical neurons and weights, achieving compression rates of up to 99.9% without significant loss of accuracy. This approach is further improved by a novel pruning schedule that balances compression against the model's generalization capability. Next, we introduce a quantization method that combines with pruning to improve hardware compatibility with the floating-point format, offering efficient model compression, fast computation, and broad applicability. Finally, we propose a logarithmic arithmetic unit designed as an energy-efficient alternative to conventional floating-point operations, providing precise and configurable processing without relying on bulky lookup tables. Extensive evaluations across diverse datasets, CUDA-based simulations, and Verilog-based hardware designs indicate that our approaches outperform existing methods, making the framework a powerful solution for deploying artificial intelligence models more efficiently. | |
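As a rough illustration of the ideas summarized in the abstract (not the thesis's actual algorithms), the sketch below shows importance-based pruning approximated by weight magnitude, simple uniform 8-bit quantization of the surviving weights, and a log-domain multiply of the kind a logarithmic arithmetic unit replaces floating-point multiplication with. The importance metric, quantization scheme, function names, and parameters here are all assumptions made for illustration.

```python
import math
import numpy as np

def prune_by_importance(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction of weights with the lowest importance scores.
    Importance is approximated here by magnitude; the thesis's metric may differ."""
    scores = np.abs(weights)
    threshold = np.quantile(scores, sparsity)  # cut-off below which weights are dropped
    return weights * (scores > threshold)

def quantize_uniform(weights: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric uniform quantization of the remaining weights (illustrative only)."""
    max_abs = float(np.max(np.abs(weights))) or 1.0
    scale = max_abs / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale

def log_domain_multiply(a: float, b: float) -> float:
    """Multiply via addition of base-2 logarithms, the core idea behind
    logarithmic arithmetic: a*b = 2**(log2|a| + log2|b|), with the sign restored."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * 2.0 ** (math.log2(abs(a)) + math.log2(abs(b)))

# Toy usage: prune 99% of a random layer, quantize to 8 bits, multiply two scalars.
rng = np.random.default_rng(0)
layer = rng.standard_normal((256, 256)).astype(np.float32)
compressed = quantize_uniform(prune_by_importance(layer, sparsity=0.99), bits=8)
print("remaining non-zero weights:", np.count_nonzero(compressed))
print("log-domain product:", log_domain_multiply(1.5, -2.0))
```

This sketch only conveys the flavor of combining pruning, quantization, and log-domain arithmetic; the thesis's pruning schedule, floating-point-compatible quantization, and configurable hardware unit are substantially more involved.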
dc.description.provenance | Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2025-01-06T11:39:53Z No. of bitstreams: 1 B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2025-01-06T11:39:53Z (GMT). No. of bitstreams: 1 B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5) Previous issue date: 2024-12 | en |
dc.description.statementofresponsibility | by Ali Necat Karakuloğlu | |
dc.embargo.release | 2025-06-19 | |
dc.format.extent | xiv, 89 leaves : illustrations, charts ; 30 cm. | |
dc.identifier.itemid | B149015 | |
dc.identifier.uri | https://hdl.handle.net/11693/115942 | |
dc.language.iso | English | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Efficient | |
dc.subject | Neural Networks | |
dc.subject | Pruning | |
dc.subject | Quantization | |
dc.subject | Approximate computing | |
dc.title | Efficient neural network processing via model compression and low-power functional units | |
dc.title.alternative | Model sıkıştırma ve düşük güç fonksiyonel ünitelerle verimli sinir ağı işleme | |
dc.type | Thesis | |
thesis.degree.discipline | Electrical and Electronic Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Master's | |
thesis.degree.name | MS (Master of Science) |