Browsing by Subject "Ef-operator"
Now showing 1 - 2 of 2
Item Open Access
Energy-Efficient LSTM networks for online learning (IEEE, 2020)
Ergen, T.; Mirza, Ali H.; Kozat, Süleyman Serdar
We investigate variable-length data regression in an online setting and introduce an energy-efficient regression structure built on long short-term memory (LSTM) networks. For this structure, we also introduce highly effective online training algorithms. We first provide a generic LSTM-based regression structure for variable-length input sequences. To reduce the complexity of this structure, we then replace the regular multiplication operations with an energy-efficient operator, i.e., the ef-operator. To further reduce the complexity, we apply factorizations to the weight matrices in the LSTM network so that the total number of parameters to be trained is significantly reduced. We then introduce online training algorithms based on the stochastic gradient descent (SGD) and exponentiated gradient (EG) algorithms to learn the parameters of the introduced network. Thus, we obtain highly efficient and effective online learning algorithms based on the LSTM network. Thanks to our generic approach, we also provide and simulate an energy-efficient gated recurrent unit (GRU) network in our experiments. Through an extensive set of experiments, we illustrate significant performance gains and complexity reductions achieved by the introduced algorithms with respect to conventional methods.

Item Open Access
A highly efficient recurrent neural network architecture for data regression (IEEE, 2018)
Ergen, Tolga; Ceyani, Emir
In this paper, we study online nonlinear data regression and propose a highly efficient long short-term memory (LSTM) network based architecture. We also introduce online training algorithms to learn the parameters of the introduced architecture. We first propose an LSTM-based architecture for data regression. To diminish the complexity of this architecture, we use an energy-efficient operator (ef-operator) instead of the multiplication operation. We then factorize the matrices of the LSTM network to reduce the total number of parameters to be learned. To train the parameters of this structure, we introduce online learning methods based on the exponentiated gradient (EG) and stochastic gradient descent (SGD) algorithms. Experimental results demonstrate considerable performance and efficiency improvements provided by the introduced architecture.
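Both abstracts center on the ef-operator, which replaces the multiplications inside the recurrent network with a cheaper sign-and-add computation. The snippet below is a minimal sketch of one common multiplication-free formulation from this line of work, where each product x_i * w_i is replaced by sign(x_i) * sign(w_i) * (|x_i| + |w_i|); this specific form and the name ef_dot are assumptions for illustration, not necessarily the exact operator defined in the papers.

```python
import numpy as np

def ef_dot(x, w):
    """Multiplication-free surrogate for the inner product x . w.

    Sketch only: each elementwise product x_i * w_i is replaced by
    sign(x_i) * sign(w_i) * (|x_i| + |w_i|). In hardware this reduces
    to sign flips and additions; the exact ef-operator definition in
    the cited papers may differ (this form is an assumption).
    """
    return np.sum(np.sign(x) * np.sign(w) * (np.abs(x) + np.abs(w)))

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.5, 0.5, -0.25])
print(ef_dot(x, w))  # cheap surrogate for np.dot(x, w)
```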
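The second complexity reduction both papers describe is factorizing the network's weight matrices so the parameter count shrinks. A hedged sketch of the general idea, with hypothetical function names: store an m x n matrix as a rank-r product U @ V, cutting the parameters from m*n to r*(m + n).

```python
import numpy as np

def init_factorized(m, n, r, seed=0):
    """Low-rank parameterization W ~= U @ V with U of shape (m, r) and
    V of shape (r, n). Parameters drop from m*n to r*(m + n); e.g.
    m = n = 256, r = 16 stores 8,192 values instead of 65,536."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((m, r))
    V = 0.1 * rng.standard_normal((r, n))
    return U, V

def apply_factorized(U, V, h):
    """Compute (U @ V) @ h without materializing the full matrix:
    two thin matrix-vector products instead of one large one."""
    return U @ (V @ h)
```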
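Finally, both papers train the parameters online with stochastic gradient descent and exponentiated gradient updates. The sketch below contrasts the two standard update rules, assuming the textbook normalized EG rule for nonnegative weights that sum to one; it is an illustration of the general algorithms, not the papers' exact training procedure.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: additive update along the negative gradient."""
    return w - lr * grad

def eg_step(w, grad, lr=0.01):
    """Exponentiated gradient (sketch): multiplicative update
    w_i <- w_i * exp(-lr * grad_i), then renormalize so the
    weights stay nonnegative and sum to one."""
    w_new = w * np.exp(-lr * grad)
    return w_new / w_new.sum()
```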