Use of dropouts and sparsity for regularization of autoencoders in deep neural networks
Ali, Muhaddisa Barat
Deep learning has emerged as an effective pre-training technique for neural networks with many hidden layers. Large-capacity models are usually employed, which makes overfitting a central concern. In this thesis, two methodologies frequently used in the deep neural network literature are considered.

First, for pretraining, the performance of the sparse autoencoder is improved by adding a p-norm sparsity penalty term in the over-complete case. This efficiently induces sparsity in the hidden layers of a deep network, counteracting overfitting. At the end of training, the features constructed at each layer carry a variety of useful information for initializing a deep network, and the accuracy obtained is comparable to that of the conventional sparse autoencoder.

Second, large-capacity networks suffer from complex co-adaptations between hidden layers, since the features of each layer are generated by combining the predictions of every unit in the previous layer; this produces redundant features. We therefore propose imposing a threshold on the hidden activations so that only the most active units participate in reconstructing the features, suppressing the effect of the less active units during optimization. This is implemented by dropping out the k lowest hidden units while retaining the rest. Our simulations confirm the hypothesis that k-lowest dropout aids optimization in both the pre-training and fine-tuning phases, giving rise to distributed internal representations with better generalization. Moreover, this model converges more quickly than conventional dropout. On the MNIST classification task, the proposed idea gives results comparable to previous regularization techniques such as denoising autoencoders and rectified linear units combined with standard regularization.
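One common form such a penalty can take is a sketch like the following: the p-norm of the hidden units' mean activations is added to the reconstruction loss, pushing activations toward zero. This is an illustrative form only; the exact penalty, the value of p, and the weighting used in the thesis are not given in the abstract, and all names here are hypothetical.

```python
import numpy as np

def p_norm_sparsity_penalty(hidden_activations, p=1.0, lam=1e-3):
    """Illustrative p-norm sparsity penalty (not the thesis's exact formula).

    hidden_activations: array of shape (batch, n_hidden).
    Penalizing the p-norm of each unit's mean activation over the
    batch encourages most hidden units to stay near zero.
    """
    rho_hat = hidden_activations.mean(axis=0)      # mean activation per unit
    return lam * np.sum(np.abs(rho_hat) ** p)      # lam * ||rho_hat||_p^p

# During training, this term would be added to the reconstruction loss,
# e.g. total_loss = reconstruction_loss + p_norm_sparsity_penalty(h)
```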
The deep networks constructed by combining our models achieve results comparable to the state of the art obtained with dropout, at lower time complexity, making them well suited to large problem sizes.
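The k-lowest dropout described above can be sketched as follows: per example, the k hidden units with the smallest activations are zeroed out while the rest are retained. This is a minimal NumPy sketch under our reading of the abstract; the function name and the tie-breaking behavior are assumptions, not the thesis's implementation.

```python
import numpy as np

def k_lowest_dropout(activations, k):
    """Drop the k lowest hidden activations per example, keep the rest.

    activations: array of shape (batch, n_hidden).
    Only the most active units then participate in reconstruction,
    suppressing the less active (redundant) units.
    """
    h = np.asarray(activations, dtype=float)
    # Indices of the k smallest activations along the hidden dimension.
    lowest = np.argsort(h, axis=-1)[..., :k]
    mask = np.ones_like(h)
    np.put_along_axis(mask, lowest, 0.0, axis=-1)  # zero out the k lowest
    return h * mask
```

In pre-training, the masked activations would be used in the autoencoder's reconstruction; in fine-tuning, the same masking would be applied to the hidden layers of the classifier.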