Algorithms and regret bounds for multi-objective contextual bandits with similarity information

buir.advisorTekin, Cem
dc.contributor.authorTurğay, Eralp
dc.date.accessioned2019-01-14T13:25:02Z
dc.date.available2019-01-14T13:25:02Z
dc.date.copyright2019-01
dc.date.issued2019-01
dc.date.submitted2019-01-14
dc.departmentDepartment of Electrical and Electronics Engineeringen_US
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (M.S.): Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2019.en_US
dc.descriptionIncludes bibliographical references (leaves 71-76).en_US
dc.description.abstractContextual bandit algorithms have been shown to be effective in solving sequential decision-making problems under uncertain environments, ranging from cognitive radio networks to recommender systems to medical diagnosis. Many of these real-world applications involve multiple and possibly conflicting objectives. In this thesis, we consider an extension of contextual bandits called multi-objective contextual bandits with similarity information. Unlike single-objective contextual bandits, in which the learner obtains a random scalar reward for each arm it selects, in multi-objective contextual bandits the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. For this setting, first, we propose a new multi-objective contextual multi-armed bandit problem with similarity information that has two objectives, where one of the objectives dominates the other. Here, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB) and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also evaluate the performance of MOC-MAB on synthetic and real-world datasets. In the next problem, we consider a multi-objective contextual bandit problem with an arbitrary number of objectives and a high-dimensional, possibly uncountable arm set endowed with similarity information. We propose an online learning algorithm called Pareto Contextual Zooming (PCZ) and prove that it achieves a Pareto regret that is sublinear in the number of rounds, which is near-optimal.en_US
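The interaction protocol the abstract describes — a context arrives, the learner selects an arm, and a reward vector is observed, with one objective dominating the other — can be sketched with a toy simulation. The arm set, the linear mean-reward functions, and the epsilon-tolerant lexicographic selection rule below are all illustrative assumptions, not the MOC-MAB algorithm from the thesis; the sketch only shows the problem setting (dominant objective first, non-dominant objective as a near-tie-breaker).

```python
import random

random.seed(42)

# Toy two-objective contextual bandit instance (all numbers made up):
# 3 arms, context x in [0, 1]; MEANS[arm] = (dominant mean fn, non-dominant mean fn).
MEANS = {
    0: (lambda x: 0.5 + 0.4 * x, lambda x: 0.9 - 0.6 * x),
    1: (lambda x: 0.9 - 0.4 * x, lambda x: 0.2 + 0.3 * x),
    2: (lambda x: 0.6,           lambda x: 0.7),
}

def pull(arm, x):
    """Pull an arm at context x; observe a noisy 2-D reward vector."""
    f_dom, f_non = MEANS[arm]
    noise = lambda: random.uniform(-0.05, 0.05)
    return (f_dom(x) + noise(), f_non(x) + noise())

def lexicographic_choice(estimates, eps=0.05):
    """Among arms whose estimated dominant-objective reward is within eps
    of the best, pick the one with the highest non-dominant estimate."""
    best = max(v[0] for v in estimates.values())
    near_best = [a for a, v in estimates.items() if v[0] >= best - eps]
    return max(near_best, key=lambda a: estimates[a][1])

# Crude estimates from exploratory pulls at one fixed context.
x = 0.8
est = {}
for arm in MEANS:
    samples = [pull(arm, x) for _ in range(200)]
    est[arm] = tuple(sum(s[i] for s in samples) / len(samples) for i in range(2))

chosen = lexicographic_choice(est)  # arm 0: highest dominant-objective mean at x = 0.8
```

At context x = 0.8, arm 0 has the clearly best dominant-objective mean, so it is selected outright; had two arms been within eps of each other in the dominant objective, the non-dominant estimate would have decided between them.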
dc.description.degreeM.S.en_US
dc.description.statementofresponsibilityby Eralp Turğay.en_US
dc.format.extentxii, 81 leaves : charts (some color) ; 30 cm.en_US
dc.identifier.itemidB159517
dc.identifier.urihttp://hdl.handle.net/11693/48242
dc.language.isoEnglishen_US
dc.publisherBilkent Universityen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectOnline Learningen_US
dc.subjectContextual Banditsen_US
dc.subjectMulti-Objective Banditsen_US
dc.subjectDominant Objectiveen_US
dc.subjectMulti-Dimensional Regreten_US
dc.subjectPareto Regreten_US
dc.subject2D Regreten_US
dc.subjectSimilarity Informationen_US
dc.titleAlgorithms and regret bounds for multi-objective contextual bandits with similarity informationen_US
dc.title.alternativeBenzerlik bilgisine sahip çok amaçlı bağlamsal haydut problemlerinde pişmanlık sınırları ve algoritmalaren_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Name:
EralpTurgay_10229562.pdf
Size:
1.5 MB
Format:
Adobe Portable Document Format
Description:
Full printable version
License bundle
Now showing 1 - 1 of 1
Name:
license.txt
Size:
1.71 KB
Description:
Item-specific license agreed upon to submission