Memory-efficient filtering algorithms for convolutional neural networks
Deployment of state-of-the-art CNN architectures such as Xception, ResNet, and GoogLeNet on resource-limited devices is a major challenge. These architectures consist of many layers and millions of parameters, and they require billions of floating-point operations to perform inference on a single image. Therefore, the memory needed to store parameters and the compute needed to execute them are the main constraints on efficient convolutional neural network architectures. In this thesis, we examine Winograd's minimal filtering algorithms to reduce the number of floating-point operations performed in convolutional layers. We reduce the number of multiplications by a factor of 2.25 without any accuracy loss. Moreover, we investigate sparse and quantized Winograd algorithms to make the conventional Winograd algorithms more memory efficient. We propose a linear quantization scheme to quantize the weights of the networks to more than 1 bit. We use the ReLU activation function and Targeted Dropout, a variant of Dropout, to prune the transformed inputs of the Winograd algorithm. We binarize weights so that most arithmetic operations are converted to bit-wise operations. We conduct several experiments on the CIFAR10 and CIFAR100 datasets and discuss the classification performance of both conventional and modified Winograd minimal filtering algorithms. Our ReLU-ed Winograd CNN incurs less than 1.9% additional classification error compared to the conventional Winograd CNN. We reduce memory requirements by up to 32x by binarizing the weights of the ReLU-ed Winograd CNN, at the cost of around 2% accuracy loss. Lastly, for applications that are less tolerant of accuracy loss, rather than binarizing weights we quantize them to 2 bits, 4 bits, and 8 bits. Our quantized ReLU-ed Winograd CNNs reach the same accuracy levels as the ReLU-ed Winograd CNN.
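To illustrate where the multiplication savings come from, the following is a minimal sketch of the 1D Winograd transform F(2,3), which computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution. (The 2.25x figure cited above arises in the nested 2D case F(2x2, 3x3), which needs 16 multiplications instead of 36.) The function name and structure here are illustrative, not taken from the thesis.

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap FIR filter using 4 multiplications.

    d: four input samples (d0..d3), g: three filter taps (g0..g2).
    Direct computation would need 6 multiplications:
        y0 = d0*g0 + d1*g1 + d2*g2
        y1 = d1*g0 + d2*g1 + d3*g2
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: computed once per filter, amortized over all inputs.
    G0 = g0
    G1 = (g0 + g1 + g2) / 2
    G2 = (g0 - g1 + g2) / 2
    G3 = g2
    # The 4 multiplications, on transformed inputs.
    m1 = (d0 - d2) * G0
    m2 = (d1 + d2) * G1
    m3 = (d2 - d1) * G2
    m4 = (d1 - d3) * G3
    # Inverse (output) transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For example, `winograd_f23([1, 2, 3, 4], [1, 1, 1])` returns `[6.0, 9.0]`, matching the direct convolution `[1+2+3, 2+3+4]`. In a CNN, the filter transform is reused across the whole feature map, which is why the extra additions are cheap relative to the multiplications saved.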
Keywords: Winograd's minimal filtering algorithms