Browsing by Subject "Markov decision processes"

Open Access
The benefits of state aggregation with extreme-point weighting for assemble-to-order systems
(Institute for Operations Research and the Management Sciences (INFORMS), 2018) Nadar, Emre; Akçay, A.; Akan, M.; Scheller Wolf, A.
We provide a new method for solving a very general model of an assemble-to-order system: multiple products, multiple components that may be demanded in different quantities by different products, batch production, random lead times, and lost sales, modeled as a Markov decision process under the discounted cost criterion. A control policy specifies when a batch of components should be produced and whether an arriving demand for each product should be satisfied. As optimal solutions for our model are computationally intractable for even moderately sized systems, we approximate the optimal cost function by reformulating it on an aggregate state space and restricting each aggregate state to be represented by its extreme original states. Our aggregation drastically reduces the value iteration computational burden. We derive an upper bound on the distance between aggregate and optimal solutions. This guarantees that the value iteration algorithm for the original problem initialized with the aggregate solution converges to the optimal solution. We also establish the optimality of a lattice-dependent base-stock and rationing policy in the aggregate problem when certain product and component characteristics are incorporated into the aggregation/disaggregation schemes. This enables us to further alleviate the value iteration computational burden in the aggregate problem by eliminating suboptimal actions. Leveraging all of our results, we can solve the aggregate problem for systems of up to 22 components, with an average distance of 11.09% from the optimal cost in systems of up to 4 components (for which we could solve the original problem to optimality).
Open Access
Energy management for age of information control in solar-powered IoT end devices
(Springer, 2021-07) Aydin, A. K.; Akar, Nail
In this paper, we propose several harvesting-aware energy management policies for solar-powered wireless IoT end devices that asynchronously send status updates for their surrounding environments to a network gateway device. For such devices, we aim at minimizing the average age of information (AoI) metric which has recently been investigated extensively for status update systems. The proposed energy management policies are obtained using discrete-time Markov chain-based modeling of the stochastic intra-day variations of the solar energy harvesting process in conjunction with the average reward Markov decision process formulation. With this approach, energy management policies are constructed by using the time of day and month of year information in addition to the instantaneous values of the age of information and the battery level. The effectiveness of the proposed energy management policies, in terms of their capability to reduce the average AoI as well as to improve the tail of the AoI distribution, is validated with empirical data for a wide range of system parameters.
Open Access
Energy operations management for renewable power producers in electricity markets
(2023-05) Karakoyun, Ece Çiğdem
Renewable energy generation has grown dramatically around the world in recent years, and policies targeted at reducing greenhouse gas emissions that cause global warming are expected to ensure a consistent expansion of renewable power generation in the electricity sector. With the increasing contribution of renewable sources to the overall energy supply, renewable power producers participate in electricity markets where they are required to make advance commitment decisions for energy delivery and purchase. Making advance commitments, however, is a complex task due to the inherent intermittency of renewable sources, increasingly volatile electricity prices, and penalties incurred for possible energy imbalances in electricity markets. Integrating renewable sources with energy storage units is among the most effective methods to address this challenging task. Motivated by the recent trends of paired renewable energy generators and storage units, we study the energy commitment, generation and storage problem of a wind power producer who owns a battery and participates in a spot market operating with hourly commitments and settlements. In each time period, the producer decides how much energy to commit to selling to or purchasing from the market in the next time period, how much energy to generate in the wind power plant, and how much energy to charge into or discharge from the battery. The existence of the battery not only helps smooth out imbalances caused by the fluctuating wind output but also enables the producer to respond to price changes in the market. We formulate the wind power producer's problem as a Markov decision process by taking into account the uncertainties in wind speed and electricity price. In the first part of this dissertation, we consider two different problem settings: In the first setting, the producer may choose to deviate from her commitments based on the latest available information, using the battery to support such deviations.
In the second setting, the producer is required to fulfill her commitments, using the battery as a back-up source. We numerically examine the effects of system components, imbalance pricing parameters, and negative prices on the producer's profits, curtailment decisions, and imbalance tendencies in each problem setting. We provide managerial insights to renewable power producers in their assessment of energy storage adoption decisions and to power system operators in their understanding of the producers' behavior in the market with their storage capabilities. In the second part of this dissertation, we establish several multi-dimensional structural properties of the optimal profit function such as supermodularity and joint concavity. This enables us to prove the optimality of a state-dependent threshold policy for the storage and commitment decisions under the assumptions of a perfectly efficient system and positive electricity prices. Leveraging this policy structure, we construct two heuristic solution methods for solving the more general problem in which the battery and transmission line can be imperfectly efficient and the price can also be negative. Numerical experiments with data-calibrated instances have revealed the high efficiency and scalability of our solution procedure. In the third part of this dissertation, we characterize the optimal policy structure by taking into account the battery and transmission line efficiency losses and showing the joint concavity of the optimal profit function. In the last part of this dissertation, we consider an alternative problem setting that allows for real-time trading without making any advance commitment. We analytically compare the total cash flows of this setting to those of our original problem setting. We conclude with a numerical investigation of the effect of advance commitment decisions on the producer's energy storage and generation decisions.
Open Access
Experimental Results Indicating Lattice-Dependent Policies May Be Optimal for General Assemble-To-Order Systems
(Wiley-Blackwell, 2016) Nadar, E.; Akan, M.; Scheller Wolf, A.
We consider an assemble-to-order (ATO) system with multiple products, multiple components which may be demanded in different quantities by different products, possible batch ordering of components, random lead times, and lost sales. We model the system as an infinite-horizon Markov decision process under the average cost criterion. A control policy specifies when a batch of components should be produced, and whether an arriving demand for each product should be satisfied. Previous work has shown that a lattice-dependent base-stock and lattice-dependent rationing (LBLR) policy is an optimal stationary policy for a special case of the ATO model presented here (the generalized M-system). In this study, we conduct numerical experiments to evaluate the use of an LBLR policy for our general ATO model as a heuristic, comparing it to two other heuristics from the literature: a state-dependent base-stock and state-dependent rationing (SBSR) policy, and a fixed base-stock and fixed rationing (FBFR) policy. Remarkably, LBLR yields the globally optimal cost in each of more than 22,500 instances of the general problem, outperforming SBSR and FBFR with respect to both objective value (by up to 2.6% and 4.8%, respectively) and computation time (by up to three orders and one order of magnitude, respectively) in 350 of these instances (those on which we compare the heuristics). LBLR and SBSR perform significantly better than FBFR when replenishment batch sizes imperfectly match the component requirements of the most valuable or most highly demanded product. In addition, LBLR substantially outperforms SBSR if it is crucial to hold a significant amount of inventory that must be rationed.
Open Access
Gambler's ruin bandit problem
(IEEE, 2017) Akbarzadeh, Nima; Tekin, Cem
In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem (GRBP). In the GRBP, the learner proceeds in a sequence of rounds, where each round is a Markov Decision Process (MDP) with two actions (arms): a continuation action that moves the learner randomly over the state space around the current state; and a terminal action that moves the learner directly into one of the two terminal states (goal and dead-end state). The current round ends when a terminal state is reached, and the learner incurs a positive reward only when the goal state is reached. The objective of the learner is to maximize its long-term reward (expected number of times the goal state is reached), without having any prior knowledge on the state transition probabilities. We first prove a result on the form of the optimal policy for the GRBP. Then, we define the regret of the learner with respect to an omnipotent oracle, which acts optimally in each round, and prove that it increases logarithmically over rounds. We also identify a condition under which the learner's regret is bounded. A potential application of the GRBP is optimal medical treatment assignment, in which the continuation action corresponds to a conservative treatment and the terminal action corresponds to a risky treatment such as surgery.
Open Access
Optimal packet scheduling and rate control for video streaming
(SPIE, 2007) Gürses, E.; Bozdağı-Akar, G.; Akar, Nail
In this paper, we propose a new low-complexity, retransmission-based optimal video streaming and rate adaptation algorithm. The proposed OSRC (Optimal packet Scheduling and Rate Control) algorithm provides an average-reward optimal solution to the joint scheduling and rate control problem. The efficacy of the OSRC algorithm is demonstrated against optimal FEC-based schemes, and the results are verified over TFRC (TCP Friendly Rate Control) transport with ns-2 simulations.
Embargo
Optimization of pumped hydro energy storage systems under uncertainty: A review
(Elsevier, 2023-12-20) Toufani, P.; Karakoyun, E. Ç.; Nadar, Emre; Fasso, O. B.; Kocaman, Ayşe Selin
This paper provides an overview of the research dealing with optimization of pumped hydro energy storage (PHES) systems under uncertainty. This overview can potentially stimulate the scientific community’s interest and facilitate future research on this topic. We review the literature from various perspectives, including the optimization problem type, objective function, physical characteristics of the PHES facility, paradigm used to capture uncertainty, and solution method adopted. We then identify several research gaps and future research directions for energy researchers. This review highlights the need for developing optimization models such as Markov decision processes that can represent uncertainties in renewable energy sources and electricity markets more accurately, constructing multi-objective models that consider not only economic but also environmental impacts, investigating underrepresented solar-PHES systems and PHES sizing problems, addressing nonlinear characteristics of PHES facilities, and optimizing bidding strategies in sequential or coordinated electricity markets.
Open Access
Renewable energy system design and operational planning for demand fulfillment
(2024-08) Yurter, Gülin
Renewable energy sources have gained prominence in reducing the dependency on fossil fuels and minimizing their negative environmental impacts. Considering renewables' uncertain and variable nature, an effective design and operational planning of hybrid energy systems is key to success in clean energy transition. We study the optimal design and operational planning problem of hybrid energy systems involving a renewable energy source and a storage unit. We first develop two-stage stochastic mixed-integer programming models to determine the optimal sizing and investment decisions for solar/wind farms co-operated with pumped hydro energy storage facilities in decentralized areas. We then utilize a Markov decision process to find the optimal energy generation and storage decisions for decentralized grid-connected wind farm-battery systems with demand-fulfillment obligations. This is a novel study that compares several pumped hydro energy storage configurations with respect to optimal sizing decisions for system components and allows for uncertainties in electricity price, wind speed, and electricity demand for optimal operational planning. Using real-life data and considering economic benefits, we demonstrate how the renewable energy systems should be designed and managed to mitigate the adverse effects of uncertainties in matching supply with demand.
Open Access
Satisfying due-dates in a job shop with sequence-dependent family set-ups
(Taylor & Francis, 2003) Taner, M. R.; Hodgson, T. J.; King, R. E.; Thoney, K. A.
This paper addresses job shop scheduling with sequence dependent family set-ups. Based on a simple, single-machine dynamic scheduling problem, state dependent scheduling rules for the single machine problem are developed and tested using Markov Decision Processes. Then, a generalized scheduling policy for the job shop problem is established based on a characterization of the optimal policy. The policy is combined with a ‘forecasting’ mechanism to utilize global shop floor information for local dispatching decisions. Computational results show that performance is significantly better than that of existing alternative policies.
Open Access
Scheduling and queue management for information freshness in multi-source status update systems
(2023-09) Gamgam, Ege Orkun
Timely delivery of information to its intended destination is essential in many existing and emerging time-sensitive applications. While conventional performance metrics like delay, throughput, or loss have been extensively studied in the literature, research concerning the management of age-sensitive traffic is relatively immature. Recently, a number of information freshness metrics have been introduced for quantifying the timeliness of information in networked systems carrying age-sensitive traffic, primarily the Age of Information (AoI) and peak AoI (PAoI) metrics as well as their alternatives including Age of Synchronization (AoS), version age, binary freshness, etc. The focus of this thesis is the development and performance modeling of age-agnostic scheduling and queue management policies in various multi-source status update systems carrying age-sensitive traffic, using the recently introduced information freshness metrics. In this thesis, first, the exact distributions of the AoI and PAoI for the probabilistic Generate-At-Will (GAW) and Random Arrival with Single Buffer (RA-SB) servers are studied with general number of heterogeneous information sources with phase-type (PH-type) service time distributions for which an absorbing Continuous-Time Markov Chains (CTMC) based analytical modeling method, namely AMC (Absorbing Markov Chains) method, is proposed. Secondly, a homogeneous multi-source status update system with Poisson information packet arrivals and exponentially distributed service times is studied for which the server is equipped with a queue holding the freshest packet from each source referred to as Single Buffer Per-Source Queueing (SBPSQ). For this case, two SBPSQ-based scheduling policies are studied, namely First Source First Serve (FSFS) and the Earliest Served First Serve (ESFS) policies, using the AMC method, and it is shown that ESFS presents a promising scheduler for this special setting.
Third, a general status update system with two heterogeneous information sources is studied, i.e., sources have different priorities and generally distributed service times, for Deterministic GAW (D-GAW) and Deterministic RA-SB (D-RA-SB) servers. The aim in both servers is to minimize the system AoI/AoS that is time-averaged and weighted across the two sources. For the D-GAW server, the optimal update policy is obtained in closed form. A packet replacement policy, referred to as Pattern-based Replacement (PR) policy, is then proposed for the D-RA-SB server based on the optimal policy structure of the D-GAW server. Finally, scheduling in a cache update system is investigated where a remote server delivers time-varying contents of multiple items with heterogeneous popularities and service times to a local cache so as to maximize the weighted sum binary freshness of the system, and the server is equipped with a queue that holds the most up-to-date content for each item. A Water-filling based Scheduling (WFS) policy and its extension, namely Extended WFS (E-WFS) policy, are proposed based on convex optimization applied to a relaxation of the original system, with low computational complexity and near optimal weighted sum binary freshness performance.
Open Access
Technical note—optimal procurement in remanufacturing systems with uncertain used-item condition
(INFORMS Inst. for Operations Res. and the Management Sciences, 2023-05-08) Nadar, Emre; Akan, Mustafa; Debo, Laurens; Scheller-Wolf, Alan
We consider a single-product remanufacture-to-order system with multiple uncertain quality levels for used items, random procurement lead times, and lost sales. The quality level of a used item is revealed only after it is acquired and inspected; the remanufacturing cost is lower for a higher-quality item. We model this system as a Markov decision process and seek an optimal policy that specifies when a used item should be procured, whether an arriving demand for the remanufactured product should be satisfied, and which available item should be remanufactured to meet this demand. We characterize the optimal procurement policy as following a new type of strategy: state-dependent noncongestive acquisition. This strategy makes decisions, taking into account the system congestion level measured as the number of available items and their quality levels. We also show that it is always optimal to meet the demand with the highest-quality item among the available ones. We conclude with extensions of our model to limited cases when the used-item condition is known a priori (for two quality levels) and remanufacture-to-stock systems in which the standard push strategy is optimal in the remanufacturing stage. © 2023 INFORMS.
Open Access
Technical note-optimal structural results for assemble-to-order generalized M-Systems
(INFORMS Inst. for Operations Res. and the Management Sciences, 2014) Nadar, E.; Akan, M.; Scheller-Wolf, A.
We consider an assemble-to-order generalized M-system with multiple components and multiple products, batch ordering of components, random lead times, and lost sales. We model the system as an infinite-horizon Markov decision process and seek an optimal policy that specifies when a batch of components should be produced (i.e., inventory replenishment) and whether an arriving demand for each product should be satisfied (i.e., inventory allocation). We characterize optimal inventory replenishment and allocation policies under a mild condition on component batch sizes via a new type of policy: lattice-dependent base stock and lattice-dependent rationing. © 2014 INFORMS.