Online learning in limit order book trade execution
buir.contributor.author | Tekin, Cem | |
dc.citation.epage | 4641 | en_US |
dc.citation.issueNumber | 17 | en_US |
dc.citation.spage | 4626 | en_US |
dc.citation.volumeNumber | 66 | en_US |
dc.contributor.author | Akbarzadeh, N. | en_US |
dc.contributor.author | Tekin, Cem | en_US |
dc.contributor.author | van der Schaar, M. | en_US |
dc.date.accessioned | 2019-02-21T16:06:04Z | |
dc.date.available | 2019-02-21T16:06:04Z | |
dc.date.issued | 2018 | en_US |
dc.department | Department of Electrical and Electronics Engineering | en_US |
dc.description.abstract | In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain number of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at prespecified time slots within the allocated time interval. We model this problem as a Markov Decision Process (MDP), which is then solved by dynamic programming. First, we prove that the optimal policy has a specific form, which requires either selling no shares or the maximum allowed amount of shares at each time slot. Then, we consider the learning problem, in which the state transition probabilities are unknown and need to be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade. Interestingly, this algorithm achieves bounded regret with respect to the optimal policy computed based on the complete knowledge of the market dynamics. Our numerical results on several finance datasets show that the proposed algorithm performs significantly better than the traditional Q-learning algorithm by exploiting the structure of the problem. | |
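The abstract describes a backward-induction (dynamic programming) solution of an execution MDP in which the optimal action at every time slot is either to sell nothing or to sell the maximum allowed amount. The snippet below is a minimal illustrative sketch of that all-or-nothing planning structure on a toy sell-execution MDP with assumed (not paper-specific) states, prices, and fill limits; it is not the authors' algorithm or data, only a hedged example of the kind of dynamic program the abstract refers to.

```python
# Hypothetical illustration (not the paper's exact formulation): backward-induction
# dynamic programming for a toy sell-execution MDP in which, at each of T time
# slots, the agent either sells nothing or sells as many shares as the book can
# currently absorb, mirroring the all-or-nothing policy structure in the abstract.

import numpy as np

T = 5                      # number of prespecified time slots
W = 10                     # total shares to sell within the allocated window
n_states = 3               # toy market states (e.g., liquidity regimes)
rng = np.random.default_rng(0)

# Assumed (known) market dynamics for this planning sketch:
# P[s, s'] = probability of moving from market state s to s'
P = rng.dirichlet(np.ones(n_states), size=n_states)
# price[s] = assumed expected per-share revenue in market state s
price = np.array([0.98, 1.00, 1.02])
# max_fill[s] = assumed maximum shares the book absorbs per slot in state s
max_fill = np.array([2, 4, 6])

# V[t, s, r] = optimal expected revenue from slot t on, with r shares remaining
V = np.zeros((T + 1, n_states, W + 1))
# Terminal penalty: leftover inventory is dumped at an assumed discount.
V[T] = -0.5 * np.arange(W + 1)[None, :]

policy = np.zeros((T, n_states, W + 1), dtype=int)
for t in range(T - 1, -1, -1):
    for s in range(n_states):
        cont = P[s] @ V[t + 1]          # expected continuation value per inventory level
        for r in range(W + 1):
            a_max = min(r, max_fill[s])  # candidate actions: sell 0 or sell a_max
            q_hold = cont[r]
            q_sell = price[s] * a_max + cont[r - a_max]
            if q_sell >= q_hold:
                V[t, s, r], policy[t, s, r] = q_sell, a_max
            else:
                V[t, s, r], policy[t, s, r] = q_hold, 0

print("Action at t=0, mid state, full inventory:", policy[0, 1, W])
```

In the learning setting the paper addresses, the transition probabilities (here the assumed matrix `P`) are unknown and must be estimated online; the abstract's point is that exploiting the two-action structure of the optimal policy during learning yields bounded regret, unlike generic Q-learning.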
dc.description.provenance | Made available in DSpace on 2019-02-21T16:06:04Z (GMT). No. of bitstreams: 1 Bilkent-research-paper.pdf: 222869 bytes, checksum: 842af2b9bd649e7f548593affdbafbb3 (MD5) Previous issue date: 2018 | en |
dc.description.sponsorship | Manuscript received December 16, 2017; revised May 15, 2018; accepted June 27, 2018. Date of publication July 20, 2018; date of current version August 2, 2018. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mark A. Davenport. The work of M. van der Schaar is supported by the National Science Foundation under NSF Award 1524417 and NSF Award 1462245. This work was presented in part at the Fifth IEEE Global Conference on Signal and Information Processing, Montreal, Quebec, November 2017. (Corresponding author: Nima Akbarzadeh.) N. Akbarzadeh is with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0E9, Canada, and also with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800, Turkey (e-mail: nima.akbarzadeh@mail.mcgill.ca). | |
dc.identifier.doi | 10.1109/TSP.2018.2858188 | |
dc.identifier.issn | 1053-587X | |
dc.identifier.uri | http://hdl.handle.net/11693/50289 | |
dc.language.iso | eng | |
dc.publisher | Institute of Electrical and Electronics Engineers | |
dc.relation.isversionof | https://doi.org/10.1109/TSP.2018.2858188 | |
dc.relation.project | Bilkent Üniversitesi - National Science Foundation, NSF - McGill University, McGill - IEEE Foundation, IEEE - National Science Foundation, NSF: 1462245, 1524417 | |
dc.source.title | IEEE Transactions on Signal Processing | en_US |
dc.subject | Bounded regret | en_US |
dc.subject | Dynamic programming | en_US |
dc.subject | Limit order book | en_US |
dc.subject | Markov decision process | en_US |
dc.subject | Online learning | en_US |
dc.title | Online learning in limit order book trade execution | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Online_Learning_in_Limit_Order_Book_Trade_Execution.pdf
- Size: 990.75 KB
- Format: Adobe Portable Document Format
- Description: Full printable version