Nonlinear regression with hierarchical recurrent neural networks under missing data

buir.contributor.author: Şahin, Safa Onur
buir.contributor.author: Kozat, Süleyman Serdar
buir.contributor.orcid: Şahin, Safa Onur|0000-0001-8528-058X
buir.contributor.orcid: Kozat, Süleyman Serdar|0000-0002-6488-3848
dc.citation.epage: 5025
dc.citation.issueNumber: 10
dc.citation.spage: 5012
dc.citation.volumeNumber: 5
dc.contributor.author: Şahin, Safa Onur
dc.contributor.author: Kozat, Süleyman Serdar
dc.date.accessioned: 2025-02-24T09:16:41Z
dc.date.available: 2025-02-24T09:16:41Z
dc.date.issued: 2024-10
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: We investigate nonlinear regression of variable-length sequential data that suffer from missing inputs. We introduce the hierarchical-LSTM network, a novel hierarchical architecture based on LSTM networks. The hierarchical-LSTM architecture contains a set of LSTM networks, each trained as an expert for processing inputs that follow a particular presence-pattern, i.e., we partition the input space into subspaces in a hierarchical manner based on the presence-patterns and assign a specific LSTM network to each subpattern. We adaptively combine the outputs of these LSTM networks based on the presence-pattern and construct the final output at each time step. The introduced algorithm protects the LSTM networks against performance losses due to: 1) the statistical mismatches commonly faced by widely used imputation methods; and 2) imputation drift, since our architecture uses only the existing inputs without any assumption on the missing data. In addition, the computational load of our algorithm is lower than that of conventional algorithms in terms of the number of multiplication operations, particularly under high missingness ratios. We emphasize that our architecture can be readily applied to other recurrent architectures such as RNNs and GRU networks. The hierarchical-LSTM network demonstrates significant performance improvements over state-of-the-art methods on several well-known real-life and financial datasets. We also openly share the source code of our algorithm to facilitate other studies and the reproducibility of our results. Future work may explore selecting a subset of presence-patterns instead of using all of them, so that the hierarchical-LSTM architecture can be used with large window lengths while keeping the number of parameters and the computational load at the same level.
dc.identifier.doi: 10.1109/TAI.2024.3404414
dc.identifier.eissn: 2691-4581
dc.identifier.uri: https://hdl.handle.net/11693/116736
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://dx.doi.org/10.1109/TAI.2024.3404414
dc.rights: CC BY-NC-ND 4.0 DEED (Attribution-NonCommercial-NoDerivatives 4.0 International)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source.title: IEEE Transactions on Artificial Intelligence
dc.subject: Long short-term memory (LSTM)
dc.subject: Missing data
dc.subject: Mixture of experts
dc.subject: Recurrent neural networks (RNNs)
dc.subject: Time series regression/prediction
dc.title: Nonlinear regression with hierarchical recurrent neural networks under missing data
dc.type: Article
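The abstract above describes a mixture of per-presence-pattern LSTM experts. The following is a minimal, illustrative PyTorch sketch of that routing idea, not the authors' released implementation: the class name HierarchicalLSTM, the use of one nn.LSTMCell per pattern, and the plain averaging combiner are all assumptions made for illustration.

```python
# Illustrative sketch only (assumed names and combination rule), not the
# paper's released code. One LSTM expert is created per non-empty
# presence-pattern of a length-m input window; at each step, every expert
# whose pattern uses only observed entries is run, and their predictions
# are combined.
import torch
import torch.nn as nn


class HierarchicalLSTM(nn.Module):
    def __init__(self, window: int, hidden: int):
        super().__init__()
        self.window = window
        self.hidden = hidden
        self.experts = nn.ModuleDict()
        for p in range(1, 2 ** window):
            # Bit i of p marks whether window position i is observed.
            in_dim = bin(p).count("1")  # expert sees only present entries
            self.experts[str(p)] = nn.LSTMCell(in_dim, hidden)
        self.readout = nn.Linear(hidden, 1)

    def init_states(self, batch: int = 1):
        # Fresh (h, c) hidden states for every expert.
        return {
            int(k): (torch.zeros(batch, self.hidden),
                     torch.zeros(batch, self.hidden))
            for k in self.experts
        }

    def forward(self, x, mask, states):
        # x: (batch, window) inputs; missing entries may hold any filler.
        # mask: length-window sequence of 0/1 flags, 1 = observed.
        p = sum(int(m) << i for i, m in enumerate(mask))
        outs = []
        for q_str, expert in self.experts.items():
            q = int(q_str)
            # Skip experts whose pattern needs an entry that is missing.
            if (q & p) != q:
                continue
            idx = [i for i in range(self.window) if (q >> i) & 1]
            h, c = expert(x[:, idx], states[q])
            states[q] = (h, c)
            outs.append(self.readout(h))
        if not outs:  # fully missing step: no expert fires
            return None, states
        # Plain averaging stands in for the paper's adaptive combination.
        return torch.stack(outs).mean(dim=0), states


model = HierarchicalLSTM(window=2, hidden=8)
states = model.init_states()
y, states = model(torch.randn(1, 2), mask=(1, 0), states=states)
```

In this sketch, every expert whose pattern is covered by the observed entries contributes equally; the paper instead combines the expert outputs adaptively and organizes the experts hierarchically over the presence-patterns.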

Files

Original bundle

Name: Nonlinear_Regression_With_Hierarchical_Recurrent_Neural_Networks_Under_Missing_Data.pdf
Size: 1.49 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission