Author: Karakuloğlu, Ali Necat
Date accessioned: 2025-01-06
Date issued: 2024-12
Date submitted: 2024-12-19
URI: https://hdl.handle.net/11693/115942
Description: Cataloged from PDF version of article. Includes bibliographical references (leaves 63-69).

Abstract: We present a framework that contributes to neural network optimization through novel methods in pruning, quantization, and arithmetic unit design, targeting platforms ranging from resource-constrained devices to datacenters. The first component is a pruning method that employs an importance metric to measure and selectively eliminate less critical neurons and weights, achieving compression rates of up to 99.9% without significant loss of accuracy. This idea is extended by a novel pruning schedule that optimizes the balance between compression and the model's generalization capability. Next, we introduce a quantization method that, combined with pruning, improves hardware compatibility with the floating-point format, offering efficient model compression, fast computation, and broad usability. Finally, we propose a logarithmic arithmetic unit designed as an energy-efficient alternative to conventional floating-point operations, providing precise and configurable processing without relying on bulky lookup tables. Extensive evaluations across multiple datasets, CUDA-based simulations, and Verilog-based hardware designs indicate that our approaches outperform existing methods, making them a powerful solution for deploying artificial intelligence models more efficiently.

Physical description: xiv, 89 leaves : illustrations, charts ; 30 cm.
Language: English
Rights: info:eu-repo/semantics/openAccess
Keywords: Efficient; Neural networks; Pruning; Quantization; Approximate computing
Title: Efficient neural network processing via model compression and low-power functional units
Title (Turkish): Model sıkıştırma ve düşük güç fonksiyonel ünitelerle verimli sinir ağı işleme
Type: Thesis
ID: B149015
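The abstract does not detail the logarithmic arithmetic unit; as a hedged illustration of the general idea such units typically build on (not the thesis's actual hardware design), the sketch below uses Mitchell's classical approximation, which replaces multiplication with addition in the log domain by approximating log2(1 + f) ≈ f for 0 ≤ f < 1. All function names here are hypothetical, chosen only for this example.

```python
import math

def mitchell_log2(x):
    # Approximate log2(x) for x > 0 via Mitchell's method:
    # write x = 2**k * (1 + f) with 0 <= f < 1, then log2(x) ~= k + f.
    k = math.floor(math.log2(x))
    f = x / (2 ** k) - 1.0
    return k + f

def approx_mul(a, b):
    # Multiply by adding the approximate logarithms, then applying the
    # inverse approximation: 2**(k + f) ~= 2**k * (1 + f).
    # In hardware this needs only shifts and adds, no lookup table.
    s = mitchell_log2(a) + mitchell_log2(b)
    k = math.floor(s)
    f = s - k
    return (2 ** k) * (1.0 + f)

print(approx_mul(3.0, 5.0))  # -> 14.0 (exact product is 15.0)
```

Mitchell multiplication always underestimates the true product, with a relative error bounded by about 11.1%; configurable correction terms are a common way such designs trade area for precision.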