The biobjective multiarmed bandit: learning approximate lexicographic optimal allocations

buir.contributor.author: Tekin, Cem
dc.citation.epage: 1080 [en_US]
dc.citation.issueNumber: 2 [en_US]
dc.citation.spage: 1065 [en_US]
dc.citation.volumeNumber: 27 [en_US]
dc.contributor.author: Tekin, Cem [en_US]
dc.date.accessioned: 2020-02-24T07:45:50Z
dc.date.available: 2020-02-24T07:45:50Z
dc.date.issued: 2019 [en_US]
dc.department: Department of Electrical and Electronics Engineering [en_US]
dc.description.abstract: We consider a biobjective sequential decision-making problem where an allocation (arm) is called ϵ lexicographic optimal if its expected reward in the first objective is at most ϵ smaller than the highest expected reward, and its expected reward in the second objective is at least the expected reward of a lexicographic optimal arm. The goal of the learner is to select arms that are ϵ lexicographic optimal as much as possible without knowing the arm reward distributions beforehand. For this problem, we first show that the learner’s goal is equivalent to minimizing the ϵ lexicographic regret, and then, propose a learning algorithm whose ϵ lexicographic gap-dependent regret is bounded and gap-independent regret is sublinear in the number of rounds with high probability. Then, we apply the proposed model and algorithm for dynamic rate and channel selection in a cognitive radio network with imperfect channel sensing. Our results show that the proposed algorithm is able to learn the approximate lexicographic optimal rate–channel pair that simultaneously minimizes the primary user interference and maximizes the secondary user throughput. [en_US]
dc.description.provenance: Submitted by Evrim Ergin (eergin@bilkent.edu.tr) on 2020-02-24T07:45:50Z. No. of bitstreams: 1. The_biobjective_multiarmed_bandit_Learning_approximate_lexicographic_optimal_allocations.pdf: 320706 bytes, checksum: c09270d588bd6bbc69fe4f7fcb73428e (MD5) [en]
dc.description.provenance: Made available in DSpace on 2020-02-24T07:45:50Z (GMT). No. of bitstreams: 1. The_biobjective_multiarmed_bandit_Learning_approximate_lexicographic_optimal_allocations.pdf: 320706 bytes, checksum: c09270d588bd6bbc69fe4f7fcb73428e (MD5). Previous issue date: 2019-03 [en]
dc.identifier.doi: 10.3906/elk-1806-221 [en_US]
dc.identifier.issn: 1300-0632
dc.identifier.uri: http://hdl.handle.net/11693/53477
dc.language.iso: English [en_US]
dc.publisher: TÜBİTAK [en_US]
dc.relation.isversionof: https://dx.doi.org/10.3906/elk-1806-221 [en_US]
dc.source.title: Turkish Journal of Electrical Engineering and Computer Sciences [en_US]
dc.subject: Multiarmed bandit [en_US]
dc.subject: Biobjective learning [en_US]
dc.subject: Lexicographic optimality [en_US]
dc.subject: Dynamic rate and channel selection [en_US]
dc.subject: Cognitive radio networks [en_US]
dc.title: The biobjective multiarmed bandit: learning approximate lexicographic optimal allocations [en_US]
dc.type: Article [en_US]
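
Illustrative note on the abstract's definition: the sketch below checks which arms are ϵ lexicographic optimal when the expected rewards are known. It is not the learning algorithm from the article, which must estimate these rewards online. The function names, the reward vectors `mu1` and `mu2`, and the sample numbers are assumptions made for illustration. A lexicographic optimal arm is taken here to be one that maximizes the first objective, with ties broken by the second.

```python
def lexicographic_optimal_arm(mu1, mu2):
    """Index of an arm maximizing mu1, with ties broken by the largest mu2."""
    best_first = max(mu1)
    candidates = [i for i, m in enumerate(mu1) if m == best_first]
    return max(candidates, key=lambda i: mu2[i])


def epsilon_lex_optimal_arms(mu1, mu2, eps):
    """Arms whose first-objective reward is within eps of the best and whose
    second-objective reward is at least that of a lexicographic optimal arm."""
    star = lexicographic_optimal_arm(mu1, mu2)
    return [
        i
        for i in range(len(mu1))
        if mu1[i] >= mu1[star] - eps and mu2[i] >= mu2[star]
    ]


# Hypothetical expected rewards for three arms (illustrative values only).
mu1 = [0.9, 0.85, 0.5]   # first objective
mu2 = [0.3, 0.6, 0.9]    # second objective

print(epsilon_lex_optimal_arms(mu1, mu2, eps=0.1))
print(epsilon_lex_optimal_arms(mu1, mu2, eps=0.0))
```

With ϵ = 0.1, arm 1 qualifies alongside arm 0 because it gives up little in the first objective while doing better in the second; arm 2 is excluded because its first-objective reward falls too far below the best.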

Files

Original bundle
Name: The_biobjective_multiarmed_bandit_Learning_approximate_lexicographic_optimal_allocations.pdf
Size: 313.19 KB
Format: Adobe Portable Document Format