Efficient neural network processing via model compression and low-power functional units

buir.advisor: Çukur, Emine Ülkü Sarıtaş
buir.co-advisor: Çakır, Burçin
dc.contributor.author: Karakuloğlu, Ali Necat
dc.date.accessioned: 2025-01-06T11:39:53Z
dc.date.available: 2025-01-06T11:39:53Z
dc.date.copyright: 2024-12
dc.date.issued: 2024-12
dc.date.submitted: 2024-12-19
dc.description: Cataloged from PDF version of article.
dc.description: Includes bibliographical references (leaves 63-69).
dc.description.abstract: We present a framework for neural network optimization that introduces novel methods in pruning, quantization, and arithmetic unit design, targeting platforms ranging from resource-constrained devices to datacenters. The first component is a pruning method that employs an importance metric to measure and selectively eliminate less critical neurons and weights, achieving compression rates of up to 99.9% without significant loss of accuracy. This idea is extended by a novel pruning schedule that balances compression against the model's generalization capability. Next, we introduce a quantization method that combines with pruning to improve hardware compatibility with floating-point formats, offering efficient model compression, fast computation, and broad usability. Finally, we propose a logarithmic arithmetic unit designed as an energy-efficient alternative to conventional floating-point operations, providing precise and configurable processing without relying on bulky lookup tables. Extensive evaluations across different datasets, CUDA-based simulations, and Verilog-based hardware designs indicate that our approaches outperform existing methods, making the framework a powerful solution for deploying artificial intelligence models more efficiently.
dc.description.provenance: Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2025-01-06T11:39:53Z. No. of bitstreams: 1. B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5)
dc.description.provenance: Made available in DSpace on 2025-01-06T11:39:53Z (GMT). No. of bitstreams: 1. B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5). Previous issue date: 2024-12
dc.description.statementofresponsibility: by Ali Necat Karakuloğlu
dc.embargo.release: 2025-06-19
dc.format.extent: xiv, 89 leaves : illustrations, charts ; 30 cm.
dc.identifier.itemid: B149015
dc.identifier.uri: https://hdl.handle.net/11693/115942
dc.language.iso: English
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Efficient
dc.subject: Neural Networks
dc.subject: Pruning
dc.subject: Quantization
dc.subject: Approximate computing
dc.title: Efficient neural network processing via model compression and low-power functional units
dc.title.alternative: Model sıkıştırma ve düşük güç fonksiyonel ünitelerle verimli sinir ağı işleme
dc.type: Thesis
thesis.degree.discipline: Electrical and Electronic Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
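The abstract's importance-based pruning can be sketched as follows. This is a minimal illustration only: plain weight magnitude |w| stands in for the thesis's importance metric (which is not specified here), and `prune_by_importance` is a hypothetical helper, not the actual implementation.

```python
import numpy as np

def prune_by_importance(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of entries with the lowest
    importance score. Magnitude is used as a stand-in importance metric."""
    scores = np.abs(weights).ravel()
    k = int(sparsity * scores.size)           # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest score becomes the pruning cutoff
    threshold = np.partition(scores, k - 1)[k - 1]
    mask = np.abs(weights) > threshold        # keep only weights above cutoff
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = prune_by_importance(w, sparsity=0.999)  # ~99.9% of weights zeroed
```

At the abstract's 99.9% compression rate, almost all entries are removed in a single pass; in practice such sparsity is typically reached gradually via a pruning schedule, which is what the thesis's schedule contribution addresses.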

Files

Original bundle

Name: B149015.pdf
Size: 7.2 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission