Browsing by Subject "Markov decision process"
Now showing 1 - 5 of 5
Item Open Access
Dynamic wavelength allocation in IP/WDM metro access networks (Institute of Electrical and Electronics Engineers, 2008-04)
Yetginer, E.; Karasan, E.
Increasing demand for bandwidth and the proliferation of packet-based traffic represent a challenge for today's metro networks, which have traditionally been designed to carry circuit-switched connections. The problem is further complicated by the constraints of cost efficiency and traffic adaptability imposed by the limited customer base in the metro area. Recently, several architectures have been proposed for future metro access networks. Nearly all of these solutions support dynamic reconfigurability; however, reconfiguration policies have not yet been fully explored. In this paper, reconfiguration policies for IP/WDM metro access networks with switching delays are considered, where dynamic reconfiguration corresponds to the dynamic allocation of wavelengths to access nodes. An exact formulation of the dynamic wavelength allocation (DWA) problem is developed as a Markov Decision Process (MDP), and a new cost function is proposed to attain both throughput efficiency and fairness. For larger problems, a heuristic approach based on first-passage probabilities is developed and shown through simulations to yield nearly optimal performance.

Item Open Access
Online learning in limit order book trade execution (Institute of Electrical and Electronics Engineers, 2018)
Akbarzadeh, N.; Tekin, Cem; van der Schaar, M.
In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain number of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at prespecified time slots within the allocated time interval. We model this problem as a Markov Decision Process (MDP), which is then solved by dynamic programming.
First, we prove that the optimal policy has a specific form: at each time slot, it either sells no shares or the maximum allowed number of shares. Then, we consider the learning problem, in which the state transition probabilities are unknown and must be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade. Interestingly, this algorithm achieves bounded regret with respect to the optimal policy computed from complete knowledge of the market dynamics. Our numerical results on several finance datasets show that, by exploiting the structure of the problem, the proposed algorithm performs significantly better than the traditional Q-learning algorithm.

Item Open Access
Online learning in limit order book trade execution (IEEE, 2018)
Akbarzadeh, Nima; Tekin, Cem; Schaar, M. V.
In this paper, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain amount of shares to sell and an allocated time window to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell via market orders at pre-specified time slots within the allocated time interval. We model this problem as a Markov Decision Process (MDP), which is then solved by dynamic programming. First, we prove that the optimal policy has a specific form: at each time slot, it either sells no shares or the maximum allowed number of shares. Then, we consider the learning problem, where the state transition probabilities are unknown and must be learned on the fly. We propose a learning algorithm that exploits the form of the optimal policy when choosing the amount to trade.
Our numerical results show that, by exploiting the structure of the problem, the proposed algorithm performs significantly better than the traditional Q-learning algorithm.

Item Open Access
Optimal timing of living-donor liver transplantation under risk-aversion (Bilkent University, 2016-07)
Köse, Ümit Emre
Liver transplantation, which can be performed from either living donors or cadavers, is the only viable treatment for end-stage liver diseases. In this study, we focus on living-donor liver transplantation. The timing of the transplantation from a living donor is crucial, as it affects both the quality and the length of the patient's lifetime. Studies in the literature use risk-neutral Markov decision processes (MDPs) to optimize the timing of transplantation. However, in real life, patients and physicians are usually risk-averse; therefore, risk-neutral models fail to represent their actual behavior. In this study, we model the living-donor liver transplantation problem as a risk-averse MDP. We incorporate risk-aversion into the MDP model using dynamic coherent measures of risk, and, in order to reflect the varying risk preferences of decision makers, we use first-order mean-semideviation and mean-AVaR as the one-step conditional measures of risk. We obtain optimal policies for patients with cirrhotic diseases or hepatitis B under different risk preferences and organs of different quality. We also measure the sensitivity of the optimal policies to the transition probabilities and to the quality of life. We further perform a simulation study to find the distribution of lifetime under the risk-averse optimal policies.

Item Open Access
Risk-averse multi-armed bandit problem (Bilkent University, 2021-08)
Malekipirbazari, Milad
In the classical multi-armed bandit problem, the aim is to find a policy maximizing the expected total reward, implicitly assuming that the decision maker is risk-neutral.
On the other hand, decision makers are risk-averse in some real-life applications. In this study, we design a new setting for the classical multi-armed bandit (MAB) problem based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the MAB problem in this novel setting and propose two different priority-index heuristics that give risk-averse allocation indices with structures similar to the Gittins index. The first proposed heuristic is based on Lagrangian duality, and its indices are expressed as the Lagrange multipliers corresponding to the activation constraint. In the second part, we present a theoretical analysis based on Whittle's retirement problem and propose a generalized version of the restart-in-state formulation of the Gittins index to compute the proposed risk-averse allocation indices. Finally, as a practical application of the proposed methods, we focus on the optimal design of clinical trials and apply our risk-averse MAB approach to perform risk-averse treatment allocation based on a Bayesian Bernoulli model. We evaluate the performance of our approach against other allocation rules, including fixed randomization.
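The restart-in-state formulation of the Gittins index mentioned in the last abstract can be illustrated, in its classical risk-neutral form, with a short sketch. This is not the thesis's risk-averse generalization: the chain, rewards, and discount factor below are illustrative assumptions, and the value iteration follows the standard restart construction, in which from every state one may either continue or restart the chain in the indexed state.

```python
def gittins_index_restart(P, r, beta, s, iters=2000):
    """Gittins index of state s via the restart-in-state formulation.

    P    -- transition matrix (list of rows) of a Markov reward chain
    r    -- per-state immediate rewards
    beta -- discount factor in (0, 1)
    s    -- state whose index is computed

    At each state the decision maker either continues from the current
    state or restarts the chain in state s; the index is (1 - beta)
    times the optimal value of this restart problem evaluated at s.
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        # Value of continuing from each state under the current estimate.
        cont = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n))
                for i in range(n)]
        # Restarting in s yields the continuation value of s itself.
        v = [max(c, cont[s]) for c in cont]
    return (1.0 - beta) * v[s]


# Illustrative two-state chain: state 0 pays 1, state 1 pays 0,
# transitions are symmetric, discount factor 0.9.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
print(gittins_index_restart(P, r, 0.9, 0))  # index of the high-reward state
print(gittins_index_restart(P, r, 0.9, 1))  # index of the low-reward state
```

The scaling by (1 - beta) expresses the index as an equivalent constant per-period reward rate, so a state whose reward is the same forever has an index equal to that reward.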