Efficient neural network processing via model compression and low-power functional units
Abstract
We present a framework that advances neural network optimization through novel methods in pruning, quantization, and arithmetic unit design, targeting platforms ranging from resource-constrained devices to datacenters. The first component is a pruning method that employs an importance metric to measure and selectively eliminate less critical neurons and weights, achieving compression rates of up to 99.9% without significant loss of accuracy. This idea is further improved by a novel pruning schedule that optimizes the balance between compression and the model's generalization capability. Next, we introduce a quantization method that, combined with pruning, improves hardware compatibility of the floating-point format, offering efficient model compression, fast computation, and general usability. Finally, we propose a logarithmic arithmetic unit designed as an energy-efficient alternative to conventional floating-point operations, providing precise and configurable processing without relying on bulky lookup tables. Extensive evaluations across different datasets, CUDA-based simulations, and Verilog-based hardware designs indicate that our approaches outperform existing methods, making the framework a powerful solution for deploying artificial intelligence models more efficiently.
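For concreteness, the sketch below shows what importance-based weight pruning looks like in its simplest form. The magnitude-based importance score and the prune_by_importance helper are illustrative assumptions of ours, not the metric or schedule proposed in the work.

    # Minimal sketch of importance-based pruning, using absolute magnitude
    # as a stand-in importance score; the real metric and schedule differ.
    import numpy as np

    def prune_by_importance(weights: np.ndarray, sparsity: float) -> np.ndarray:
        """Zero out the `sparsity` fraction of weights with the lowest importance."""
        importance = np.abs(weights)                   # stand-in importance metric
        threshold = np.quantile(importance, sparsity)  # cutoff below which weights are removed
        return np.where(importance > threshold, weights, 0.0)

    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256))
    w_pruned = prune_by_importance(w, sparsity=0.999)  # 99.9% sparsity, as in the abstract
    print(f"nonzero weights kept: {np.count_nonzero(w_pruned)} of {w.size}")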
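The appeal of logarithmic arithmetic is that multiplication reduces to addition in the log domain, letting hardware replace a costly multiplier with a cheap adder. The software sketch below illustrates only this general principle under our own encoding assumptions (zero and special values are ignored for simplicity); it does not reflect the proposed unit's actual encoding or precision scheme.

    # Illustration of the log-domain principle: a product becomes an
    # exponent addition. Encoding and helper names are assumptions.
    import math

    def to_log(x: float) -> tuple[float, float]:
        """Encode a nonzero x as (log2|x|, sign)."""
        return math.log2(abs(x)), math.copysign(1.0, x)

    def log_mul(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
        """Multiply two log-encoded values: exponents add, signs multiply."""
        return a[0] + b[0], a[1] * b[1]

    def from_log(v: tuple[float, float]) -> float:
        """Decode a (log2 magnitude, sign) pair back to a linear value."""
        return v[1] * 2.0 ** v[0]

    x, y = 3.5, -2.0
    print(from_log(log_mul(to_log(x), to_log(y))))  # -7.0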