Online training of LSTM networks in distributed systems for variable length data sequences

buir.contributor.author: Kozat, Serdar
dc.citation.epage: 5165
dc.citation.issueNumber: 10
dc.citation.spage: 5159
dc.citation.volumeNumber: 29
dc.contributor.author: Ergen, T.
dc.contributor.author: Kozat, Serdar
dc.date.accessioned: 2019-02-21T16:05:48Z
dc.date.available: 2019-02-21T16:05:48Z
dc.date.issued: 2018
dc.department: Department of Electrical and Electronics Engineering
dc.description.abstract: In this brief, we investigate online training of long short-term memory (LSTM) architectures in a distributed network of nodes, where each node employs an LSTM-based structure for online regression. In particular, each node sequentially receives a variable-length data sequence with its label and can only exchange information with its neighbors to train the LSTM architecture. We first provide a generic LSTM-based regression structure for each node. In order to train this structure, we put the LSTM equations in a nonlinear state-space form for each node and then introduce a highly effective and efficient distributed particle filtering (DPF)-based training algorithm. We also introduce a distributed extended Kalman filtering-based training algorithm for comparison. Here, our DPF-based training algorithm guarantees convergence to the performance of the optimal LSTM coefficients in the mean square error sense under certain conditions. We achieve this performance with communication and computational complexity on the order of first-order gradient-based methods. Through both simulated and real-life examples, we illustrate significant performance improvements with respect to the state-of-the-art methods.
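For context, the following is a minimal single-node sketch (Python/NumPy) of the general idea the abstract describes: the flat LSTM weight vector is treated as the latent state of a nonlinear state-space model, and a basic sequential importance resampling particle filter updates it online from (sequence, label) pairs. This is not the paper's DPF algorithm; the distributed information exchange between neighboring nodes is omitted, and all dimensions, noise levels, and data below are assumptions invented for illustration.

import numpy as np

# Minimal sketch, assuming: a single node, scalar regression targets, a Gaussian
# observation model, and a random-walk state transition on the LSTM weights.
# All sizes and noise levels are illustrative, not taken from the paper.

rng = np.random.default_rng(0)
n_in, n_hid, n_particles = 2, 4, 100
q_std, r_std = 0.01, 0.1  # assumed process / observation noise std. deviations

# Total weight count: 4 gates (input/recurrent/bias) plus a linear output layer.
n_params = 4 * n_hid * (n_in + n_hid + 1) + n_hid + 1

def lstm_predict(theta, X):
    """Run an LSTM over a variable-length sequence X (T x n_in) using weights
    unpacked from the flat vector theta; return a scalar regression output."""
    k = 0
    def take(shape):
        nonlocal k
        size = int(np.prod(shape))
        block = theta[k:k + size].reshape(shape)
        k += size
        return block
    W = take((4 * n_hid, n_in))   # input weights for the 4 gates, stacked
    U = take((4 * n_hid, n_hid))  # recurrent weights
    b = take((4 * n_hid,))        # gate biases
    w_out, b_out = take((n_hid,)), take((1,))
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x in X:                   # standard LSTM recursion (the state-space map)
        z = W @ x + U @ h + b
        i, f = sig(z[:n_hid]), sig(z[n_hid:2 * n_hid])
        o, g = sig(z[2 * n_hid:3 * n_hid]), np.tanh(z[3 * n_hid:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return w_out @ h + b_out[0]

# Particle cloud over the weight vector, with uniform importance weights.
particles = 0.1 * rng.standard_normal((n_particles, n_params))
weights = np.full(n_particles, 1.0 / n_particles)

def pf_step(X, y):
    """One online update: propagate particles, reweight by the likelihood of
    the observed label y, resample, and return the current output estimate."""
    global particles, weights
    particles = particles + q_std * rng.standard_normal(particles.shape)
    preds = np.array([lstm_predict(th, X) for th in particles])
    log_w = np.log(weights) - 0.5 * ((y - preds) / r_std) ** 2
    log_w -= log_w.max()                      # for numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum()
    estimate = weights @ preds                # MMSE estimate of the output
    idx = rng.choice(n_particles, size=n_particles, p=weights)
    particles = particles[idx]                # multinomial resampling
    weights = np.full(n_particles, 1.0 / n_particles)
    return estimate

# Toy stream of variable-length sequences with a simple synthetic target.
for t in range(20):
    T = int(rng.integers(3, 8))
    X = rng.standard_normal((T, n_in))
    y = 0.5 * X.sum()
    pf_step(X, y)

Resampling after every step is the simplest choice; practical filters typically resample only when the effective sample size drops, and the paper's distributed variant further combines estimates across neighboring nodes.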
dc.description.sponsorship: Manuscript received June 15, 2017; revised September 8, 2017; accepted November 1, 2017. Date of publication December 7, 2017; date of current version September 17, 2018. This work was supported by TUBITAK under Contract 115E917. (Corresponding author: Tolga Ergen.) The authors are with the Department of Electrical and Electronics Engineering, Bilkent University, 06800 Ankara, Turkey (e-mail: ergen@ee.bilkent.edu.tr; kozat@ee.bilkent.edu.tr).
dc.identifier.doi: 10.1109/TNNLS.2017.2770179
dc.identifier.issn: 2162-237X
dc.identifier.uri: http://hdl.handle.net/11693/50274
dc.language.iso: English
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.isversionof: https://doi.org/10.1109/TNNLS.2017.2770179
dc.relation.project: Bilkent Üniversitesi - IEEE Foundation, IEEE - 115E917
dc.source.title: IEEE Transactions on Neural Networks and Learning Systems
dc.subject: Distributed learning
dc.subject: Extended Kalman filtering (EKF)
dc.subject: Long short-term memory (LSTM) networks
dc.subject: Online learning
dc.subject: Particle filtering
dc.title: Online training of LSTM networks in distributed systems for variable length data sequences
dc.type: Article

Files

Original bundle

Name: Online_training-of_LSTM_networks_in_distributed_systems_for_variable_lenght_data_sequences.pdf
Size: 683.84 KB
Format: Adobe Portable Document Format
Description: Full printable version