Browsing by Subject "Long short-term memory (LSTM)"
Now showing 1 - 5 of 5
Item (Open Access): End-to-end hybrid architectures for effective sequential data prediction (Bilkent University, 2023-08). Aydın, Mustafa Enes.
We investigate nonlinear prediction in an online setting and introduce two hybrid models that effectively mitigate, via end-to-end architectures, the need for hand-designed features and the manual model selection issues of conventional nonlinear prediction/regression methods. In particular, we first use an enhanced recurrent neural network (LSTM) to extract features from sequential signals while preserving the state information, i.e., the history, and soft gradient boosted decision trees (sGBDT) to produce the final output. The connection is in an end-to-end fashion and we jointly optimize the whole architecture using stochastic gradient descent. Second, we again use recursive structures (LSTM) for automatic feature extraction from raw data but pair them with a traditional linear time series model (SARIMAX) to handle the intricacies of sequential data, e.g., seasonality. The unification of the models is again joint: it is through a single state space, and we optimize the entire architecture using particle filtering. The proposed frameworks are generic, so one can use other recurrent architectures, e.g., GRUs, and differentiable machine learning algorithms, as well as time series models that have state space representations, in lieu of the specific models presented. We demonstrate the learning behavior of the models on synthetic data and the significant performance improvements over the conventional methods and the disjoint counterparts on various real-life datasets, which also demonstrate the generic nature of the frameworks. Furthermore, we openly share the source code of the proposed methods to facilitate further research.

Item (Open Access): Energy-efficient LSTM networks for online learning (IEEE, 2020). Ergen, T.; Mirza, Ali H.; Kozat, Süleyman Serdar.
We investigate variable-length data regression in an online setting and introduce an energy-efficient regression structure built on long short-term memory (LSTM) networks. For this structure, we also introduce highly effective online training algorithms. We first provide a generic LSTM-based regression structure for variable-length input sequences. To reduce the complexity of this structure, we then replace the regular multiplication operations with an energy-efficient operator, i.e., the ef-operator. To further reduce the complexity, we apply factorizations to the weight matrices in the LSTM network so that the total number of parameters to be trained is significantly reduced. We then introduce online training algorithms based on the stochastic gradient descent (SGD) and exponentiated gradient (EG) algorithms to learn the parameters of the introduced network. Thus, we obtain highly efficient and effective online learning algorithms based on the LSTM network. Thanks to our generic approach, we also provide and simulate an energy-efficient gated recurrent unit (GRU) network in our experiments. Through an extensive set of experiments, we illustrate significant performance gains and complexity reductions achieved by the introduced algorithms with respect to the conventional methods.
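For intuition, here is a minimal sketch of a multiplication-free LSTM gate computation in the spirit of the energy-efficient item above. It assumes the ef-operator takes the multiplication-free form a (+) b = sign(a*b)(|a| + |b|) used in related work by the same group, and it pairs this with a low-rank weight factorization as the abstract mentions; all names, shapes, and the rank are illustrative assumptions, not the paper's exact construction.

```python
import torch

def ef_op(a, b):
    # Assumed "ef-operator": a (+) b = sign(a * b) * (|a| + |b|),
    # applied elementwise; each multiply becomes sign logic plus an add.
    return torch.sign(a) * torch.sign(b) * (a.abs() + b.abs())

def ef_matvec(W, z):
    # Matrix-vector "product" where every scalar multiply W[i, j] * z[j]
    # is replaced by ef_op(W[i, j], z[j]) before summing over j.
    return ef_op(W, z.unsqueeze(0)).sum(dim=1)

class EfLSTMCell(torch.nn.Module):
    """LSTM cell whose affine transforms use the assumed ef-operator.

    Each gate's weight matrix is factorized as W ~ U @ V (rank r) to cut
    the number of trainable parameters, as the abstract describes.
    """
    def __init__(self, n_in, n_hid, rank=4):
        super().__init__()
        def factor(rows, cols):
            return torch.nn.Parameter(0.1 * torch.randn(rows, cols))
        # One factorized matrix per gate, acting on the stacked [x; h].
        self.U = torch.nn.ParameterList([factor(n_hid, rank) for _ in range(4)])
        self.V = torch.nn.ParameterList([factor(rank, n_in + n_hid) for _ in range(4)])
        self.b = torch.nn.Parameter(torch.zeros(4, n_hid))

    def forward(self, x, h, c):
        z = torch.cat([x, h])
        gates = [ef_matvec(self.U[k] @ self.V[k], z) + self.b[k] for k in range(4)]
        i, f = torch.sigmoid(gates[0]), torch.sigmoid(gates[1])
        g, o = torch.tanh(gates[2]), torch.sigmoid(gates[3])
        c_new = f * c + i * g          # elementwise gating kept as-is in this sketch
        h_new = o * torch.tanh(c_new)
        return h_new, c_new
```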
Item (Open Access): A hybrid framework for sequential data prediction with end-to-end optimization (Elsevier, 2022-08-08). Aydin, M.E.; Kozat, Süleyman S.
We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates, via an end-to-end architecture, the need for hand-designed features and the manual model selection issues of conventional nonlinear prediction/regression methods. In particular, we use recursive structures to extract features from sequential signals, while preserving the state information, i.e., the history, and boosted decision trees to produce the final output. The connection is in an end-to-end fashion and we jointly optimize the whole architecture using stochastic gradient descent, for which we also provide the backward-pass update equations. Specifically, we employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression. Our framework is generic, so one can use other deep learning architectures for feature extraction (such as RNNs and GRUs) and machine learning algorithms for decision making as long as they are differentiable. We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over the conventional methods on various real-life datasets. Furthermore, we openly share the source code of the proposed method to facilitate further research. © 2022 Elsevier Inc.

Item (Open Access): Nonuniformly sampled data processing using LSTM networks (Institute of Electrical and Electronics Engineers, 2019). Şahin, Safa Onur; Kozat, Süleyman Serdar.
We investigate classification and regression for nonuniformly sampled variable-length sequential data and introduce a novel long short-term memory (LSTM) architecture. In particular, we extend the classical LSTM network with additional time gates, which incorporate the time information as a nonlinear scaling factor on the conventional gates. We also provide forward-pass and backward-pass update equations for the proposed LSTM architecture. We show that our approach is superior to the classical LSTM architecture when there is correlation between time samples. In our experiments, we achieve significant performance gains with respect to the classical LSTM and phased-LSTM architectures. In this sense, the proposed LSTM architecture is highly appealing for applications involving nonuniformly sampled sequential data.
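To make the time-gate idea in the nonuniform-sampling item above concrete, the sketch below extends a classical LSTM cell with gates driven by the elapsed time between consecutive samples. The sigmoid-of-affine parameterization of the time gates is an assumption chosen for illustration; the paper's exact equations may differ.

```python
import torch

class TimeGatedLSTMCell(torch.nn.Module):
    """Classical LSTM cell extended with illustrative time gates.

    Each conventional gate is rescaled by a nonlinear function of the
    time elapsed since the previous sample, so irregular sampling
    intervals directly inform the state update.
    """
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.lin = torch.nn.Linear(n_in + n_hid, 4 * n_hid)
        # Per-unit parameters of the nonlinear time scaling (assumed form).
        self.alpha = torch.nn.Parameter(torch.zeros(3, n_hid))
        self.beta = torch.nn.Parameter(torch.zeros(3, n_hid))

    def forward(self, x, dt, h, c):
        # dt: elapsed time since the previous sample (scalar).
        i, f, g, o = self.lin(torch.cat([x, h], dim=-1)).chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        # Time gates: nonlinear scaling factors on the input, forget,
        # and output gates, driven by dt.
        t_i = torch.sigmoid(self.alpha[0] * dt + self.beta[0])
        t_f = torch.sigmoid(self.alpha[1] * dt + self.beta[1])
        t_o = torch.sigmoid(self.alpha[2] * dt + self.beta[2])
        c_new = (t_f * f) * c + (t_i * i) * g
        h_new = (t_o * o) * torch.tanh(c_new)
        return h_new, c_new
```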
Item (Open Access): Unsupervised anomaly detection with LSTM neural networks (IEEE, 2020). Ergen, T.; Kozat, Süleyman Serdar.
We investigate anomaly detection in an unsupervised framework and introduce long short-term memory (LSTM) neural network-based algorithms. In particular, given variable-length data sequences, we first pass these sequences through our LSTM-based structure and obtain fixed-length sequences. We then find a decision function for our anomaly detectors based on the one-class support vector machines (OC-SVM) and support vector data description (SVDD) algorithms. For the first time in the literature, we jointly train and optimize the parameters of the LSTM architecture and the OC-SVM (or SVDD) algorithm using highly effective gradient-based and quadratic programming-based training methods. To apply the gradient-based training method, we modify the original objective criteria of the OC-SVM and SVDD algorithms and prove the convergence of the modified objective criteria to the original criteria. We also provide extensions of our unsupervised formulation to the semisupervised and fully supervised frameworks. Thus, we obtain anomaly detection algorithms that can process variable-length data sequences while providing high performance, especially for time series data. Our approach is generic, so we also apply it to the gated recurrent unit (GRU) architecture by directly replacing our LSTM-based structure with the GRU-based structure. In our experiments, we illustrate significant performance gains achieved by our algorithms with respect to the conventional methods.
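As a rough illustration of the joint LSTM/OC-SVM training described in the last item, the sketch below encodes variable-length sequences into fixed-length embeddings and minimizes a smoothed OC-SVM-style objective with a single gradient step over all parameters. The softplus smoothing of the hinge term, and every name and shape here, are assumptions rather than the paper's exact modified criterion.

```python
import torch

def smoothed_ocsvm_loss(z, w, rho, nu=0.1, beta=10.0):
    """Smoothed one-class SVM objective over embeddings z of shape (n, d).

    The standard OC-SVM hinge max(0, rho - <w, z_i>) is replaced by a
    softplus surrogate so the whole pipeline is differentiable; the paper
    modifies the objective in a similar spirit and proves convergence to
    the original criterion (the softplus choice here is illustrative).
    """
    margins = rho - z @ w                                # (n,)
    hinge = torch.nn.functional.softplus(beta * margins) / beta
    n = z.shape[0]
    return 0.5 * w.dot(w) + hinge.sum() / (nu * n) - rho

class LSTMEncoder(torch.nn.Module):
    """Maps a variable-length sequence to a fixed-length embedding."""
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.lstm = torch.nn.LSTM(n_in, n_hid, batch_first=True)

    def forward(self, seq):                              # seq: (1, T, n_in)
        _, (h, _) = self.lstm(seq)
        return h[-1, 0]                                  # (n_hid,)

# Joint training loop sketch: the LSTM parameters, w, and rho are all
# updated by the same gradient step.
enc = LSTMEncoder(n_in=3, n_hid=8)
w = torch.randn(8, requires_grad=True)
rho = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD(list(enc.parameters()) + [w, rho], lr=1e-2)

seqs = [torch.randn(1, T, 3) for T in (5, 9, 7)]         # variable lengths
for _ in range(100):
    z = torch.stack([enc(s) for s in seqs])              # (n, n_hid)
    loss = smoothed_ocsvm_loss(z, w, rho.squeeze())
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, score(x) = <w, enc(x)> - rho; negative scores flag anomalies.
```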