Aging wireless bandits: regret analysis and order-optimal learning algorithm

buir.contributor.author: Atay, Eray Unsal
dc.citation.epage: 8
dc.citation.spage: 1
dc.contributor.author: Atay, Eray Unsal
dc.contributor.author: Kadota, Igor
dc.contributor.author: Modiano, Eytan
dc.coverage.spatial: Philadelphia, PA, USA
dc.date.accessioned: 2022-02-03T10:06:26Z
dc.date.available: 2022-02-03T10:06:26Z
dc.date.issued: 2021-11-13
dc.description: Conference Name: 2021 19th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt)
dc.description: Date of Conference: 18-21 October 2021
dc.description.abstract: We consider a single-hop wireless network with sources transmitting time-sensitive information to the destination over multiple unreliable channels. Packets from each source are generated according to a stochastic process with known statistics, and the state of each wireless channel (ON/OFF) varies according to a stochastic process with unknown statistics. The reliability of the wireless channels is to be learned through observation. At every time-slot, the learning algorithm selects a single (source, channel) pair and the selected source attempts to transmit its packet via the selected channel. The probability of a successful transmission to the destination depends on the reliability of the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over T time-slots. To analyze its performance, we introduce the notion of AoI-regret, which is the difference between the expected cumulative AoI of the learning algorithm under consideration and the expected cumulative AoI of a genie algorithm that knows the reliability of the channels a priori. The AoI-regret captures the penalty incurred by having to learn the statistics of the channels over the T time-slots. The results are two-fold: first, we consider learning algorithms that employ well-known solutions to the stochastic multi-armed bandit problem (such as ϵ-Greedy, Upper Confidence Bound, and Thompson Sampling) and show that their AoI-regret scales as Θ(log T); second, we develop a novel learning algorithm and show that it has O(1) regret. To the best of our knowledge, this is the first learning algorithm with bounded AoI-regret.
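As a quick aid to readers of this record, the AoI-regret described in the abstract can be written as a single expression. The notation below is assumed for illustration and is not copied from the paper: A^{\pi}(t) denotes the network AoI at time-slot t under learning algorithm \pi, and A^{*}(t) the AoI under the genie that knows the channel reliabilities a priori.

% Minimal sketch of the AoI-regret over a horizon of T time-slots
% (notation assumed as stated above, not taken from the paper itself)
R^{\pi}(T) = \mathbb{E}\left[\sum_{t=1}^{T} A^{\pi}(t)\right] - \mathbb{E}\left[\sum_{t=1}^{T} A^{*}(t)\right]
% Per the abstract: R^{\pi}(T) = \Theta(\log T) when \pi uses \epsilon-Greedy,
% Upper Confidence Bound, or Thompson Sampling, while the proposed algorithm
% achieves R^{\pi}(T) = O(1).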
dc.identifier.doi: 10.23919/WiOpt52861.2021.9589673
dc.identifier.eisbn: 978-3-903176-37-9
dc.identifier.isbn: 978-1-6654-3292-4
dc.identifier.uri: http://hdl.handle.net/11693/76985
dc.language.iso: English
dc.publisher: IEEE
dc.relation.isversionof: https://dx.doi.org/10.23919/WiOpt52861.2021.9589673
dc.source.title: International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)
dc.subject: Age of Information
dc.subject: Wireless networks
dc.subject: Regret
dc.subject: Multi-armed bandits
dc.subject: Learning
dc.title: Aging wireless bandits: regret analysis and order-optimal learning algorithm
dc.type: Conference Paper

Files

Original bundle
Name: Aging_Wireless_Bandits_Regret_Analysis_and_Order-Optimal_Learning_Algorithm.pdf
Size: 1.05 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.69 KB
Format: Item-specific license agreed to upon submission