Browsing by Subject "Reinforcement Learning"
Now showing 1 - 4 of 4
Item Open Access  A dynamic DRR scheduling algorithm for flow level QoS assurances for elastic traffic (2006) Kurugöl, Sıla

Best-effort service, used to transport Internet traffic today, does not provide any QoS assurances. IntServ, DiffServ, and the recently proposed Proportional DiffServ architectures have been introduced to provide QoS. In these architectures, applications with more stringent QoS requirements, such as real-time traffic, are prioritized, while elastic flows share the remaining bandwidth. As opposed to the well-studied differential treatment of delay- and/or loss-sensitive traffic to satisfy QoS constraints, our aim is to satisfy the QoS requirements of elastic traffic at the flow level. We intend to maintain different average rate levels for different classes of elastic traffic. For differential treatment of elastic flows, a dynamic variant of the Deficit Round Robin (DRR) scheduler is used instead of a FIFO queue. In this scheduling algorithm, all classes are served in round-robin fashion in proportion to their weights at each round. The main difference of our scheduler from the original DRR scheduler is that we update the scheduler's weights, called quanta, at each round in response to feedback from the network, namely the measured rate of a phantom connection that shares capacity fairly with the other flows in the same queue. According to the rate measured in the last time interval, the controller updates the weights in proportion to the bandwidth requirements of each class so as to satisfy their QoS requirements, while the remaining bandwidth is used by best-effort traffic. To find an optimal policy for the controller, a simulation-based learning algorithm is run on a processor-sharing model of TCP; the resulting policies are then applied to a more realistic scenario to solve the dynamic DRR scheduling problem through ns-2 simulations.
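The per-round weight-update loop lends itself to a compact sketch. The following is a minimal, illustrative implementation of DRR with dynamically adjusted quanta; the class name `DynamicDRR`, the proportional rule in `update_quanta`, and the `phantom_rate` feedback signal are assumptions for illustration, not the thesis's exact controller (which is learned via simulation-based optimization).

```python
from collections import deque

class DynamicDRR:
    """Deficit Round Robin with per-round quantum (weight) updates.

    A minimal sketch of the idea in the abstract above; the update rule
    and feedback signal are illustrative assumptions.
    """

    def __init__(self, num_classes, base_quantum=1500):
        self.queues = [deque() for _ in range(num_classes)]   # packet sizes
        self.deficits = [0.0] * num_classes
        self.quanta = [float(base_quantum)] * num_classes     # weights, bytes

    def enqueue(self, cls, packet_size):
        self.queues[cls].append(packet_size)

    def update_quanta(self, target_rates, phantom_rate):
        # Feedback step: scale each class's quantum toward its bandwidth
        # requirement relative to the fair-share rate reported by the
        # phantom connection (a hypothetical proportional rule).
        phantom_rate = max(phantom_rate, 1e-9)  # guard against division by 0
        for i, target in enumerate(target_rates):
            self.quanta[i] = max(1.0, self.quanta[i] * target / phantom_rate)

    def serve_round(self):
        """Serve each class once, sending packets while its deficit allows."""
        sent = []
        for i, q in enumerate(self.queues):
            self.deficits[i] += self.quanta[i]
            while q and q[0] <= self.deficits[i]:
                size = q.popleft()
                self.deficits[i] -= size
                sent.append((i, size))
            if not q:
                self.deficits[i] = 0.0  # standard DRR: reset on empty queue
        return sent
```

Note that, as in standard DRR, a class's deficit is reset when its queue empties so an idle class cannot hoard credit; the dynamic variant only changes how the quanta are chosen between rounds.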
Item Open Access  Online learning in structured Markov decision processes (2017-07) Akbarzadeh, Nima

This thesis proposes three new multi-armed bandit problems in which the learner proceeds in a sequence of rounds, where each round is a Markov decision process (MDP). The learner's goal is to maximize its cumulative reward without any a priori knowledge of the state transition probabilities. The first problem considers an MDP with sorted states, a continuation action that moves the learner to an adjacent state, and a terminal action that moves the learner to a terminal state (a goal or dead-end state). In this problem, a round ends and the next round starts when a terminal state is reached, and the aim of the learner in each round is to reach the goal state. First, the structure of the optimal policy is derived. Then, the regret of the learner with respect to an oracle that takes optimal actions in each round is defined, and a learning algorithm that exploits the structure of the optimal policy is proposed. Finally, it is shown that the regret either increases logarithmically over rounds or remains bounded. In the second problem, we investigate the personalization of clinical treatment. This process is modeled as a goal-oriented MDP with dead-end states, where the state transition probabilities depend on the context of the patient. An algorithm that uses the rule of optimism in the face of uncertainty is proposed to maximize the number of rounds in which the goal state is reached. In the third problem, we propose an online learning algorithm for optimal execution in the limit order book of a financial asset. Given a certain amount of shares to sell and an allocated time to complete the transaction, the proposed algorithm dynamically learns the optimal number of shares to sell at each time slot of the allocated time. We model this problem as an MDP and derive the form of the optimal policy.

Item Open Access  Using reinforcement learning for dynamic link sharing problems under signaling constraints (2003) Çelik, Nuri

In a static link sharing system, users are assigned a fixed share of the link capacity irrespective of whether they are active or not. Dynamic link sharing, in contrast, allocates bandwidth to each active user based on the instantaneous utilization of the link. As an example, dynamic link sharing combined with the rate adaptation capability of multimedia applications provides a novel quality of service (QoS) framework for HFC and broadband wireless networks. Frequent adjustment of the allocated bandwidth in dynamic link sharing raises a scalability issue in the form of a significant amount of message distribution and processing power (i.e., signaling) in the shared link system. On the other hand, if the rate of applications is adjusted once for the most heavily loaded traffic conditions, a significant amount of bandwidth may be wasted depending on the actual traffic load. There is then a need for an optimal dynamic link sharing system that takes into account the tradeoff between signaling scalability and bandwidth efficiency. In this work, we introduce a Markov decision framework for the dynamic link sharing system in which the desired signaling rate is imposed as a constraint. A reinforcement learning methodology is adopted for the solution of this Markov decision problem, and the results demonstrate that the proposed method provides better bandwidth efficiency than other heuristics without violating the signaling rate requirement.
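As a rough illustration of the Markov decision framework, the sketch below runs tabular Q-learning on a toy link-sharing model. The state space, the reward (bandwidth efficiency), and the numeric constants are all simplifying assumptions; in particular, the thesis imposes the signaling rate as an explicit constraint, whereas here it is folded into the reward as a fixed Lagrange-style penalty.

```python
import random
from collections import defaultdict

# Toy model (assumptions, not the thesis's system): the state is the number
# of active users together with the currently granted per-user rate, and
# changing the rate counts as one signaling event.
USERS = range(1, 11)       # active users on the link
RATES = [1, 2, 4, 8]       # per-user bandwidth levels (Mb/s)
CAPACITY = 20.0            # link capacity (Mb/s)

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, penalty=0.5):
    """Tabular Q-learning with a signaling penalty folded into the reward
    (a sketch, not the thesis's exact constrained-MDP solution)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (random.choice(USERS), RATES[0])   # state: (active users, rate)
        for _ in range(50):                    # bounded episode length
            users, rate = s
            # Epsilon-greedy action selection over per-user rate levels.
            a = (random.choice(RATES) if random.random() < eps
                 else max(RATES, key=lambda r: Q[(s, r)]))
            efficiency = min(users * a, CAPACITY) / CAPACITY
            # Reallocation costs a signaling event, so penalize rate changes.
            reward = efficiency - (penalty if a != rate else 0.0)
            s2 = (min(10, max(1, users + random.choice([-1, 0, 1]))), a)
            Q[(s, a)] += alpha * (reward
                                  + gamma * max(Q[(s2, r)] for r in RATES)
                                  - Q[(s, a)])
            s = s2
    return Q
```

The learned table trades bandwidth efficiency against reallocation frequency through the `penalty` term; tuning that term plays the role, very loosely, of choosing the admissible signaling rate.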
Item Open Access  Wavelength assignment in optical burst switching networks using neuro-dynamic programming (2003) Keçeli, Feyza

All-optical networks are the most promising architecture for building the large-size, huge-bandwidth transport networks required to carry exponentially increasing Internet traffic. Among the switching paradigms in the literature, optical burst switching is intended to leverage the attractive properties of optical communications while taking its limitations into account. One of the major problems in optical burst switching is the high blocking probability that results from the one-way reservation protocol used. In this thesis, this problem is addressed in the wavelength domain by using smart wavelength assignment algorithms. Two heuristic wavelength assignment algorithms that prioritize available wavelengths according to the reservation tables at the network nodes are proposed. The major contribution of the thesis is the formulation of the wavelength assignment problem as a continuous-time, average-cost dynamic programming problem and its solution based on neuro-dynamic programming. Experiments are carried out over various traffic loads, burst lengths, and numbers of wavelength converters with a pool structure. The simulation results show that the wavelength assignment algorithms proposed in the thesis for optical burst switching networks perform better than the wavelength assignment algorithms in the literature that were developed for circuit-switched optical networks.
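Reservation-table-based prioritization can be pictured with a small sketch. Here `reservations` maps each wavelength to its booked (start, end) intervals; the ranking rule used below (among the free wavelengths, prefer the one with the least future occupancy) is an illustrative heuristic, not necessarily either of the two proposed in the thesis.

```python
def pick_wavelength(reservations, arrival, duration):
    """Pick a wavelength for a burst occupying [arrival, arrival + duration).

    A minimal sketch, assuming reservations[w] is a list of (start, end)
    intervals already booked on wavelength w at this node.
    """
    end = arrival + duration

    def is_free(w):
        # The burst fits if it overlaps no existing reservation.
        return all(e <= arrival or s >= end for s, e in reservations[w])

    def future_occupancy(w):
        # Total reserved time not yet finished at the burst's arrival.
        return sum(e - s for s, e in reservations[w] if e > arrival)

    candidates = [w for w in reservations if is_free(w)]
    if not candidates:
        return None  # burst is blocked on this link
    return min(candidates, key=future_occupancy)

# Example: wavelength 0 overlaps the burst, wavelength 1 is free.
tables = {0: [(0.0, 4.0), (6.0, 9.0)], 1: [(0.0, 2.0)]}
print(pick_wavelength(tables, arrival=5.0, duration=3.0))  # -> 1
```

A neuro-dynamic programming approach would replace the fixed `future_occupancy` ranking with a learned cost-to-go estimate, scoring each candidate wavelength by its predicted long-run blocking cost rather than by a hand-crafted rule.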