Browsing by Subject "Markov Decision Process"
Now showing 1 - 4 of 4
Item (Open Access)
Adaptive energy management for solar energy harvesting wireless sensor nodes (Bilkent University, 2018-09) Aydin, Abdul Kerim

Wireless Sensor Networks (WSNs) will play a key role in the upcoming era of the Internet of Things (IoT), as they will form the basis of its communication infrastructure. Energy harvesting has been widely used to prolong battery life and enhance the quality of service (QoS) of sensor nodes (SNs). In this study, we investigate adaptive transmission policies for a solar-powered wireless sensor node tasked with sending status updates to a gateway as frequently as possible under energy-neutral operation constraints. On the basis of empirical data, we model the daily variations of the solar energy harvesting process with a Discrete-Time Markov Chain (DTMC); increasing the number of states of the DTMC models the harvesting process more accurately. Using the DTMC model, we formulate the energy management problem of the WSN node as a Markov Decision Process (MDP), and based on this model we use the policy iteration algorithm to obtain optimal energy management policies that minimize the average Age of Information (AoI) of the corresponding status update system. We validate the effectiveness of the proposed approach using datasets from two different locations with 20 years of solar radiance data.

Item (Open Access)
Dynamic wavelength allocation in IP/WDM metro access networks (Bilkent University, 2008) Yetginer, Emre

Increasing demand for bandwidth and the proliferation of packet-based traffic have been driving architectural changes in the communications infrastructure. In this evolution, metro networks face both capacity and dynamic-adaptability constraints. Increasing access and backbone speeds result in high bandwidth requirements, whereas the popularity of wireless access and the limited number of customers in the metro area necessitate traffic adaptability.
Traditional architectures, which have been optimized for carrying circuit-switched connections, are far from meeting these requirements. Recently, several architectures have been proposed for future metro access networks. Nearly all of these solutions support dynamic allocation of bandwidth to follow fluctuations in traffic demand. However, the reconfiguration policies that can be used in this process have not yet been fully explored. In this thesis, dynamic wavelength allocation (DWA) policies for IP/WDM metro access networks with reconfiguration delays are considered. Reconfiguration actions incur a cost, since a portion of the capacity becomes idle during the reconfiguration period due to signalling latencies and the tuning times of optical transceivers. An exact formulation of the DWA problem is developed as a Markov Decision Process (MDP), and a new cost function is proposed to attain both throughput efficiency and fairness. For larger problems, a heuristic approach based on first passage probabilities is developed. The performance of the method is evaluated under both stationary and non-stationary traffic conditions. The effects of relevant network and traffic parameters, such as delay and flow size, are also discussed. Finally, performance bounds for the DWA methods are derived.

Item (Open Access)
Implementing condition-based maintenance: optimizing maintenance decisions in multi-component systems using Markov Decision Processes (Bilkent University, 2021-07) Nakhost, Mahsa Abbaszadeh

Maintenance scheduling plays a pivotal role in many industrial areas, since unexpected failures result in costly actions to bring the system back to an operating state. An advanced maintenance policy is condition-based maintenance (CBM), which schedules maintenance actions according to data collected from system inspections. In this study, we present a realistic discretization method for a maintainable multi-component system that is subject to periodic inspection.
We consider a CBM policy for the critical components of the system and an age-based maintenance policy for the non-critical components. We define a general cost structure including an operating cost which is a function of system reliability, and we explain how this operating cost must be assigned in discrete and continuous state spaces. We use a Markov Decision Process (MDP) to find the optimal maintenance policy for the discrete control problem. Using the MDP model, we prove that the threshold policy, the most well-known policy in the CBM literature, is not always optimal. Finally, we propose two policies, RL-KIT and RI-MIT, to implement the policy found by the MDP in the continuous environment. Using simulation, we show that either of these policies can be optimal depending on the system of interest.

Item (Open Access)
Online learning in structured Markov decision processes (Bilkent University, 2017-07) Akbarzadeh, Nima

This thesis proposes three new multi-armed bandit problems, in which the learner proceeds in a sequence of rounds where each round is a Markov Decision Process (MDP). The learner's goal is to maximize its cumulative reward without any a priori knowledge of the state transition probabilities. The first problem considers an MDP with sorted states, a continuation action that moves the learner to an adjacent state, and a terminal action that moves the learner to a terminal state (a goal or dead-end state). In this problem, a round ends and the next round starts when a terminal state is reached, and the aim of the learner in each round is to reach the goal state. First, the structure of the optimal policy is derived. Then, the regret of the learner with respect to an oracle who takes optimal actions in each round is defined, and a learning algorithm that exploits the structure of the optimal policy is proposed. Finally, it is shown that the regret either increases logarithmically over rounds or becomes bounded.
In the second problem, we investigate the personalization of a clinical treatment. This process is modeled as a goal-oriented MDP with dead-end states. Moreover, the state transition probabilities of the MDP depend on the context of the patients. An algorithm that uses the rule of optimism in the face of uncertainty is proposed to maximize the number of rounds in which the goal state is reached. In the third problem, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain amount of shares to sell and an allocated time to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell at each time slot of the allocated time. We model this problem as an MDP and derive the form of the optimal policy.
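All of the theses listed above formulate their control problems as MDPs, and the first one solves its MDP with the policy iteration algorithm. As a generic illustration of that algorithm (a toy 3-state, 2-action MDP whose transition probabilities and rewards are made up for this sketch and are not taken from any of the theses), policy iteration alternates exact policy evaluation with greedy policy improvement until the policy stops changing:

```python
import numpy as np

# Toy finite MDP (illustrative only; numbers are invented, not from the theses).
n_states, n_actions = 3, 2
gamma = 0.9  # discount factor

# P[a][s, s'] = probability of moving from state s to s' under action a
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],  # action 0
    [[0.3, 0.6, 0.1], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],  # action 1
])
# R[s, a] = expected immediate reward for taking action a in state s
R = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])

def policy_iteration(P, R, gamma):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve the linear system (I - gamma * P_pi) V = R_pi
        P_pi = P[policy, np.arange(n_states)]          # row s is P[policy[s]][s, :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the action values
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):         # stable policy => optimal
            return policy, V
        policy = new_policy

policy, V = policy_iteration(P, R, gamma)
print("optimal policy:", policy)
print("optimal values:", V)
```

For finite MDPs this loop is guaranteed to terminate, since each improvement step yields a strictly better policy until the optimal one is reached; the theses apply the same idea to much larger, structured state spaces (harvested-energy levels, component conditions, inventory of shares).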