Algorithms and regret bounds for multi-objective contextual bandits with similarity information

Turğay, Eralp

buir.advisor	Tekin, Cem
dc.contributor.author	Turğay, Eralp
dc.date.accessioned	2019-01-14T13:25:02Z
dc.date.available	2019-01-14T13:25:02Z
dc.date.copyright	2019-01
dc.date.issued	2019-01
dc.date.submitted	2019-01-14
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 71-76).	en_US
dc.description.abstract	Contextual bandit algorithms have been shown to be effective in solving sequential decision-making problems under uncertain environments, ranging from cognitive radio networks to recommender systems to medical diagnosis. Many of these real-world applications involve multiple and possibly conflicting objectives. In this thesis, we consider an extension of contextual bandits called multi-objective contextual bandits with similarity information. Unlike single-objective contextual bandits, in which the learner obtains a random scalar reward for each arm it selects, in multi-objective contextual bandits the learner obtains a random reward vector, where each component of the reward vector corresponds to one of the objectives and the distribution of the reward depends on the context that is provided to the learner at the beginning of each round. For this setting, we first propose a new multi-objective contextual multi-armed bandit problem with similarity information that has two objectives, where one of the objectives dominates the other. Here, the goal of the learner is to maximize its total reward in the non-dominant objective while ensuring that it maximizes its total reward in the dominant objective. Then, we propose the multi-objective contextual multi-armed bandit algorithm (MOC-MAB), and define two performance measures: the 2-dimensional (2D) regret and the Pareto regret. We show that both the 2D regret and the Pareto regret of MOC-MAB are sublinear in the number of rounds. We also evaluate the performance of MOC-MAB on synthetic and real-world datasets. In the next problem, we consider a multi-objective contextual bandit problem with an arbitrary number of objectives and a high-dimensional, possibly uncountable arm set, which is endowed with the similarity information. We propose an online learning algorithm called Pareto Contextual Zooming (PCZ), and prove that it achieves a Pareto regret that is sublinear in the number of rounds, which is near-optimal.	en_US
dc.description.statementofresponsibility	by Eralp Turğay.	en_US
dc.format.extent	xii, 81 leaves : charts (some color) ; 30 cm.	en_US
dc.identifier.itemid	B159517
dc.identifier.uri	http://hdl.handle.net/11693/48242
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Online Learning	en_US
dc.subject	Contextual Bandits	en_US
dc.subject	Multi-Objective Bandits	en_US
dc.subject	Dominant Objective	en_US
dc.subject	Multi-Dimensional Regret	en_US
dc.subject	Pareto Regret	en_US
dc.subject	2D Regret	en_US
dc.subject	Similarity Information	en_US
dc.title	Algorithms and regret bounds for multi-objective contextual bandits with similarity information	en_US
dc.title.alternative	Benzerlik bilgisine sahip çok amaçlı bağlamsal haydut problemlerinde pişmanlık sınırları ve algoritmalar	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Electrical and Electronic Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)

Files

Original bundle

Name: EralpTurgay_10229562.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Graduate School of Engineering and Science