Efficient neural network processing via model compression and low-power functional units
buir.advisor | Çukur, Emine Ülkü Sarıtaş | |
buir.co-advisor | Çakır, Burçin | |
dc.contributor.author | Karakuloğlu, Ali Necat | |
dc.date.accessioned | 2025-01-06T11:39:53Z | |
dc.date.available | 2025-01-06T11:39:53Z | |
dc.date.copyright | 2024-12 | |
dc.date.issued | 2024-12 | |
dc.date.submitted | 2024-12-19 | |
dc.description | Cataloged from PDF version of article. | en_US |
dc.description | Includes bibliographical references (leaves 63-69). | en_US |
dc.description.abstract | We present a framework that advances neural network optimization through novel methods in pruning, quantization, and arithmetic unit design, targeting platforms ranging from resource-constrained devices to datacenters. The first component is a pruning method that employs an importance metric to identify and selectively eliminate less critical neurons and weights, achieving compression rates of up to 99.9% without significant loss of accuracy. This approach is further improved by a novel pruning schedule that balances compression against the model's generalization capability. Next, we introduce a quantization method that combines with pruning to improve hardware compatibility with the floating-point format, offering efficient model compression, fast computation, and broad applicability. Finally, we propose a logarithmic arithmetic unit designed as an energy-efficient alternative to conventional floating-point operations, providing precise and configurable processing without relying on bulky lookup tables. Extensive evaluations across diverse datasets, CUDA-based simulations, and Verilog-based hardware designs indicate that our approaches outperform existing methods, making the framework a powerful solution for deploying artificial intelligence models more efficiently. | |
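As a rough illustration of the ideas summarized in the abstract (not the thesis's actual algorithms), the sketch below shows importance-based pruning approximated by weight magnitude, simple uniform 8-bit quantization of the surviving weights, and a log-domain multiply of the kind a logarithmic arithmetic unit replaces floating-point multiplication with. The importance metric, quantization scheme, function names, and parameters here are all assumptions made for illustration.

```python
import math
import numpy as np

def prune_by_importance(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction of weights with the lowest importance scores.
    Importance is approximated here by magnitude; the thesis's metric may differ."""
    scores = np.abs(weights)
    threshold = np.quantile(scores, sparsity)  # cut-off below which weights are dropped
    return weights * (scores > threshold)

def quantize_uniform(weights: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric uniform quantization of the remaining weights (illustrative only)."""
    max_abs = float(np.max(np.abs(weights))) or 1.0
    scale = max_abs / (2 ** (bits - 1) - 1)
    return np.round(weights / scale) * scale

def log_domain_multiply(a: float, b: float) -> float:
    """Multiply via addition of base-2 logarithms, the core idea behind
    logarithmic arithmetic: a*b = 2**(log2|a| + log2|b|), with the sign restored."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    return sign * 2.0 ** (math.log2(abs(a)) + math.log2(abs(b)))

# Toy usage: prune 99% of a random layer, quantize to 8 bits, multiply two scalars.
rng = np.random.default_rng(0)
layer = rng.standard_normal((256, 256)).astype(np.float32)
compressed = quantize_uniform(prune_by_importance(layer, sparsity=0.99), bits=8)
print("remaining non-zero weights:", np.count_nonzero(compressed))
print("log-domain product:", log_domain_multiply(1.5, -2.0))
```

This sketch only conveys the flavor of combining pruning, quantization, and log-domain arithmetic; the thesis's pruning schedule, floating-point-compatible quantization, and configurable hardware unit are substantially more involved.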
dc.description.provenance | Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2025-01-06T11:39:53Z No. of bitstreams: 1 B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2025-01-06T11:39:53Z (GMT). No. of bitstreams: 1 B149015.pdf: 7549474 bytes, checksum: de0502b253de3484b64ab0ccbba4c937 (MD5) Previous issue date: 2024-12 | en |
dc.description.statementofresponsibility | by Ali Necat Karakuloğlu | |
dc.embargo.release | 2025-06-19 | |
dc.format.extent | xiv, 89 leaves : illustrations, charts ; 30 cm. | |
dc.identifier.itemid | B149015 | |
dc.identifier.uri | https://hdl.handle.net/11693/115942 | |
dc.language.iso | English | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Efficient | |
dc.subject | Neural Networks | |
dc.subject | Pruning | |
dc.subject | Quantization | |
dc.subject | Approximate computing | |
dc.title | Efficient neural network processing via model compression and low-power functional units | |
dc.title.alternative | Model sıkıştırma ve düşük güç fonksiyonel ünitelerle verimli sinir ağı işleme | |
dc.type | Thesis | |
thesis.degree.discipline | Electrical and Electronic Engineering | |
thesis.degree.grantor | Bilkent University | |
thesis.degree.level | Master's | |
thesis.degree.name | MS (Master of Science) |