Online learning in limit order book trade execution

Bilkent author: Tekin, Cem
Authors: Akbarzadeh, N.; Tekin, Cem; van der Schaar, M.
Volume: 66
Issue: 17
Pages: 4626-4641
Date available in repository: 2019-02-21
Date issued: 2018
Department: Department of Electrical and Electronics Engineering
Abstract: In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain number of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at prespecified time slots within the allocated time interval. We model this problem as a Markov decision process (MDP), which is then solved by dynamic programming. First, we prove that the optimal policy has a specific form, which requires selling either no shares or the maximum allowed number of shares at each time slot. Then, we consider the learning problem, in which the state transition probabilities are unknown and must be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade. Notably, this algorithm achieves bounded regret with respect to the optimal policy computed based on complete knowledge of the market dynamics. Our numerical results on several financial datasets show that the proposed algorithm performs significantly better than the traditional Q-learning algorithm by exploiting the structure of the problem. (An illustrative sketch of the dynamic program follows the record below.)
Sponsorship: Manuscript received December 16, 2017; revised May 15, 2018; accepted June 27, 2018. Date of publication July 20, 2018; date of current version August 2, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mark A. Davenport. The work of M. van der Schaar was supported by the National Science Foundation under NSF Awards 1524417 and 1462245. This work was presented in part at the Fifth IEEE Global Conference on Signal and Information Processing, Montreal, Quebec, November 2017. (Corresponding author: Nima Akbarzadeh.) N. Akbarzadeh is with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada, and also with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (e-mail: nima.akbarzadeh@mail.mcgill.ca).
DOI: 10.1109/TSP.2018.2858188
ISSN: 1053-587X
URI: http://hdl.handle.net/11693/50289
Language: English
Publisher: Institute of Electrical and Electronics Engineers
Is version of: https://doi.org/10.1109/TSP.2018.2858188
Projects: Bilkent Üniversitesi; National Science Foundation (NSF), awards 1462245 and 1524417; McGill University; IEEE Foundation
Source title: IEEE Transactions on Signal Processing
Keywords: Bounded regret; Dynamic programming; Limit order book; Markov decision process; Online learning
Title: Online learning in limit order book trade execution
Type: Article
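
The solution method summarized in the abstract can be illustrated with a small sketch. The Python snippet below is not the authors' implementation: the regime count, the per-share rewards `price`, the per-slot cap `A_MAX`, and the transition model `P` are hypothetical stand-ins for the market dynamics that the paper's algorithm learns online. It shows backward-induction dynamic programming over a toy MDP whose state is (time slot, market regime, remaining inventory).

```python
# Illustrative sketch (not the authors' code): backward-induction dynamic
# programming for a toy trade-execution MDP. State = (time slot, market
# regime, remaining inventory); action = shares sold via market order.
# All sizes, rewards, and transition probabilities are hypothetical.

import numpy as np

T = 5        # number of prespecified time slots in the allocated window
W = 10       # total shares that must be sold
A_MAX = 4    # assumed cap on shares sold per slot (T * A_MAX >= W, so feasible)
N_MKT = 3    # toy number of market regimes

rng = np.random.default_rng(0)
price = np.array([0.9, 1.0, 1.1])   # hypothetical per-share reward per regime
# P[s, a] is the next-regime distribution after selling a shares in regime s.
P = rng.dirichlet(np.ones(N_MKT), size=(N_MKT, A_MAX + 1))

# V[t, s, w]: optimal expected reward from slot t onward. Unsold shares at
# the deadline are disallowed, hence the -inf terminal values for w > 0.
V = np.full((T + 1, N_MKT, W + 1), -np.inf)
V[T, :, 0] = 0.0
policy = np.zeros((T, N_MKT, W + 1), dtype=int)

for t in range(T - 1, -1, -1):
    for s in range(N_MKT):
        for w in range(W + 1):
            for a in range(min(A_MAX, w) + 1):   # cannot sell more than remains
                q = a * price[s] + P[s, a] @ V[t + 1, :, w - a]
                if q > V[t, s, w]:
                    V[t, s, w], policy[t, s, w] = q, a

print("optimal first-slot action by (regime, remaining inventory):")
print(policy[0])
```

Per the abstract, the optimal policy sells either nothing or the maximum allowed amount at each slot, so the optimizer above would only ever return those two extremes under the paper's market model; the proposed learning algorithm exploits exactly this form when estimating the unknown transition probabilities, which is why it can achieve bounded regret while unstructured Q-learning must explore every intermediate trade size.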

Files

Name: Online_Learning_in_Limit_Order_Book_Trade_Execution.pdf
Size: 990.75 KB
Format: Adobe Portable Document Format
Description: Full printable version