Browsing by Subject "Stochastic gradient descent (SGD)"
Now showing 1 - 3 of 3
Item Open Access
Efficient online learning algorithms based on LSTM neural networks (Institute of Electrical and Electronics Engineers, 2018)
Ergen, Tolga; Kozat, Süleyman Serdar
We investigate online nonlinear regression and introduce novel regression structures based on long short-term memory (LSTM) networks, together with highly efficient and effective online training methods. To train these LSTM-based structures, we put the underlying architecture in a state-space form and introduce highly efficient particle filtering (PF)-based updates; we also provide stochastic gradient descent (SGD) and extended Kalman filter-based updates. Our PF-based training method guarantees convergence to the optimal parameter estimate in the mean-square-error sense, provided that we have a sufficient number of particles and certain technical conditions are satisfied. More importantly, we achieve this performance with a computational complexity on the order of first-order gradient-based methods by controlling the number of particles. Since our approach is generic, we also introduce a gated recurrent unit (GRU)-based approach by directly replacing the LSTM architecture with the GRU architecture, and we demonstrate the superiority of our LSTM-based approach in sequential prediction tasks on different real-life data sets. The experimental results illustrate significant performance improvements achieved by the introduced algorithms over conventional methods on several benchmark real-life data sets.

Item Open Access
Joint optimization of linear and nonlinear models for sequential regression (Academic Press, 2022-12)
Fazla, Arda; Aydin, Mustafa E.; Kozat, Süleyman S.
We investigate nonlinear regression and introduce a novel approach based on the joint optimization of linear and nonlinear models. To capture both the linear and nonlinear characteristics in sequential data, we model the underlying data as a combination of linear and nonlinear models, where we optimize the models jointly to minimize the final regression error. As the nonlinear model, we employ a differentiable version of boosted decision trees; as the linear model, we use the well-known SARIMAX model. Our approach is generic: any differentiable nonlinear or linear model can be readily employed. Through this joint optimization, we alleviate the well-known underfitting and overfitting problems in modeling sequential data. In experiments on synthetic and real-life data, we demonstrate significant improvements over the individual components as well as over combination/mixture methods in the literature.

Item Open Access
Time and context sensitive optimization of machine learning models for sequential data prediction (Bilkent University, 2024-07)
Fazla, Arda
We investigate the nonlinear prediction of sequential time series data through the mixture/combination of machine learning models. First, we introduce a novel ensemble learning approach that effectively combines multiple base learners in a time-aware and context-sensitive manner. This involves a weight optimization problem targeting a specific loss function while considering (non)convex constraints on the linear combination of the base learners. These constraints are theoretically analyzed under known statistics and are automatically incorporated into the meta-learner as part of the optimization process during training. Next, we introduce a direct two-stage approach based on the combination of linear and nonlinear models, where we jointly optimize the parameters of both models to minimize the final regression error; this joint optimization alleviates the well-known underfitting and overfitting problems in modeling sequential data. As the linear model, we use a traditional linear time series forecasting model (SARIMAX), and as the nonlinear model, we use boosted soft decision trees (Soft GBDT). For both approaches, we illustrate notable performance improvements on real-life data and well-known competition datasets compared to traditional ensemble/mixture techniques and state-of-the-art forecasting models in the machine learning literature. Additionally, we make the source code of both approaches publicly available to facilitate further research and comparison.
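The joint linear-nonlinear optimization described in the last two items can be sketched with a toy example. The following is a minimal, hypothetical illustration, not the authors' implementation: a linear trend model plays the role of the linear component, and a single soft decision stump (a sigmoid-gated split, standing in for a differentiable Soft GBDT) plays the nonlinear component; all parameters are updated jointly by gradient descent on a shared squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sequential data: a linear trend plus a nonlinear step effect and noise.
t = np.linspace(0.0, 1.0, 200)
y = 2.0 * t + 0.5 * (t > 0.6) + 0.05 * rng.standard_normal(200)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linear component: a * t + b.
a, b = 0.0, 0.0
# Nonlinear component, one soft decision stump: blends leaf values
# v0 / v1 around a learnable split location s with fixed sharpness k.
s, v0, v1 = 0.5, 0.0, 0.0
k = 20.0
lr = 0.1

losses = []
for _ in range(500):
    gate = sigmoid(k * (t - s))                    # soft split indicator
    pred = a * t + b + (1 - gate) * v0 + gate * v1  # joint prediction
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Joint gradient step: linear and nonlinear parameters move together
    # to minimize the same final regression error.
    a  -= lr * np.mean(2 * err * t)
    b  -= lr * np.mean(2 * err)
    v0 -= lr * np.mean(2 * err * (1 - gate))
    v1 -= lr * np.mean(2 * err * gate)
    s  -= lr * np.mean(2 * err * (v1 - v0) * gate * (1 - gate) * (-k))
```

Because the stump's gate is a sigmoid rather than a hard threshold, the split location itself receives gradients, which is the key property that makes joint end-to-end training of the two components possible; the papers' actual models (SARIMAX and Soft GBDT) are substantially richer than this sketch.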