Online learning in limit order book trade execution
buir.contributor.author | Tekin, Cem | |
dc.citation.epage | 4641 | en_US |
dc.citation.issueNumber | 17 | en_US |
dc.citation.spage | 4626 | en_US |
dc.citation.volumeNumber | 66 | en_US |
dc.contributor.author | Akbarzadeh, N. | en_US |
dc.contributor.author | Tekin, Cem | en_US |
dc.contributor.author | van der Schaar, M. | en_US |
dc.date.accessioned | 2019-02-21T16:06:04Z | |
dc.date.available | 2019-02-21T16:06:04Z | |
dc.date.issued | 2018 | en_US |
dc.department | Department of Electrical and Electronics Engineering | en_US |
dc.description.abstract | In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain number of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at prespecified time slots within the allocated time interval. We model this problem as a Markov Decision Process (MDP), which is then solved by dynamic programming. First, we prove that the optimal policy has a specific form, which requires either selling no shares or the maximum allowed amount of shares at each time slot. Then, we consider the learning problem, in which the state transition probabilities are unknown and need to be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade. Interestingly, this algorithm achieves bounded regret with respect to the optimal policy computed based on the complete knowledge of the market dynamics. Our numerical results on several finance datasets show that the proposed algorithm performs significantly better than the traditional Q-learning algorithm by exploiting the structure of the problem. | |
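The abstract describes a backward-induction (dynamic programming) solution of an execution MDP in which the optimal action at every time slot is either to sell nothing or to sell the maximum allowed amount. The snippet below is a minimal illustrative sketch of that all-or-nothing planning structure on a toy sell-execution MDP with assumed (not paper-specific) states, prices, and fill limits; it is not the authors' algorithm or data, only a hedged example of the kind of dynamic program the abstract refers to.

```python
# Hypothetical illustration (not the paper's exact formulation): backward-induction
# dynamic programming for a toy sell-execution MDP in which, at each of T time
# slots, the agent either sells nothing or sells as many shares as the book can
# currently absorb, mirroring the all-or-nothing policy structure in the abstract.

import numpy as np

T = 5                      # number of prespecified time slots
W = 10                     # total shares to sell within the allocated window
n_states = 3               # toy market states (e.g., liquidity regimes)
rng = np.random.default_rng(0)

# Assumed (known) market dynamics for this planning sketch:
# P[s, s'] = probability of moving from market state s to s'
P = rng.dirichlet(np.ones(n_states), size=n_states)
# price[s] = assumed expected per-share revenue in market state s
price = np.array([0.98, 1.00, 1.02])
# max_fill[s] = assumed maximum shares the book absorbs per slot in state s
max_fill = np.array([2, 4, 6])

# V[t, s, r] = optimal expected revenue from slot t on, with r shares remaining
V = np.zeros((T + 1, n_states, W + 1))
# Terminal penalty: leftover inventory is dumped at an assumed discount.
V[T] = -0.5 * np.arange(W + 1)[None, :]

policy = np.zeros((T, n_states, W + 1), dtype=int)
for t in range(T - 1, -1, -1):
    for s in range(n_states):
        cont = P[s] @ V[t + 1]          # expected continuation value per inventory level
        for r in range(W + 1):
            a_max = min(r, max_fill[s])  # candidate actions: sell 0 or sell a_max
            q_hold = cont[r]
            q_sell = price[s] * a_max + cont[r - a_max]
            if q_sell >= q_hold:
                V[t, s, r], policy[t, s, r] = q_sell, a_max
            else:
                V[t, s, r], policy[t, s, r] = q_hold, 0

print("Action at t=0, mid state, full inventory:", policy[0, 1, W])
```

In the learning setting the paper addresses, the transition probabilities (here the assumed matrix `P`) are unknown and must be estimated online; the abstract's point is that exploiting the two-action structure of the optimal policy during learning yields bounded regret, unlike generic Q-learning.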
dc.description.provenance | Made available in DSpace on 2019-02-21T16:06:04Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018 | en |
dc.description.sponsorship | Manuscript received December 16, 2017; revised May 15, 2018; accepted June 27, 2018. Date of publication July 20, 2018; date of current version August 2, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mark A. Davenport. The work of M. van der Schaar is supported by the National Science Foundation under NSF Award 1524417 and NSF Award 1462245. This work was presented in part at the Fifth IEEE Global Conference on Signal and Information Processing, Montreal, Quebec, November 2017. (Corresponding author: Nima Akbarzadeh.) N. Akbarzadeh is with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada, and also with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (e-mail: nima.akbarzadeh@mail.mcgill.ca). | |
dc.identifier.doi | 10.1109/TSP.2018.2858188 | |
dc.identifier.issn | 1053-587X | |
dc.identifier.uri | http://hdl.handle.net/11693/50289 | |
dc.language.iso | eng | |
dc.publisher | Institute of Electrical and Electronics Engineers | |
dc.relation.isversionof | https://doi.org/10.1109/TSP.2018.2858188 | |
dc.relation.project | Bilkent Üniversitesi - National Science Foundation, NSF - McGill University, McGill - IEEE Foundation, IEEE - National Science Foundation, NSF: 1462245, 1524417 | |
dc.source.title | IEEE Transactions on Signal Processing | en_US |
dc.subject | Bounded regret | en_US |
dc.subject | Dynamic programming | en_US |
dc.subject | Limit order book | en_US |
dc.subject | Markov decision process | en_US |
dc.subject | Online learning | en_US |
dc.title | Online learning in limit order book trade execution | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Online_Learning_in_Limit_Order_Book_Trade_Execution.pdf
- Size: 990.75 KB
- Format: Adobe Portable Document Format
- Description: Full printable version