RELEAF: an algorithm for learning and exploiting relevance

dc.citation.epage [en_US]: 15
dc.citation.spage [en_US]: 1
dc.contributor.author [en_US]: Tekin, C.
dc.contributor.author [en_US]: Schaar, Mihaela van der
dc.date.accessioned: 2019-02-13T06:54:00Z
dc.date.available: 2019-02-13T06:54:00Z
dc.date.issued [en_US]: 2015-02
dc.department [en_US]: Department of Electrical and Electronics Engineering
dc.description.abstract [en_US]: Recommender systems, medical diagnosis, network security, etc., require ongoing learning and decision-making in real time. These -- and many others -- are prime examples of the opportunities and difficulties presented by Big Data: the available information often arrives from a variety of sources and has diverse features, so learning from all the sources may be valuable, but integrating what is learned is subject to the curse of dimensionality. This paper develops and analyzes algorithms that allow efficient learning and decision-making while avoiding the curse of dimensionality. We formalize the information available to the learner/decision-maker at a particular time as a context vector which the learner should consider when taking actions. In general, the context vector is very high-dimensional, but in many settings the most relevant information is embedded in only a few relevant dimensions. If these relevant dimensions were known in advance, the problem would be simple -- but they are not. Moreover, the relevant dimensions may be different for different actions. Our algorithm learns the relevant dimensions for each action and makes decisions based on what it has learned. Formally, we build on the structure of a contextual multi-armed bandit by adding and exploiting a relevance relation. We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions; in the special case where the relevance relation is single-valued (a function), the bound reduces to Õ(T^{2(√2−1)}). In the absence of a relevance relation, the best known contextual bandit algorithms achieve regret Õ(T^{(D+1)/(D+2)}), where D is the full dimension of the context vector.
dc.identifier.uri: http://hdl.handle.net/11693/49374
dc.language.iso [en_US]: English
dc.publisher [en_US]: Cornell University
dc.source.title [en_US]: IEEE Journal of Selected Topics in Signal Processing
dc.subject [en_US]: Contextual bandits
dc.subject [en_US]: Regret
dc.subject [en_US]: Dimensionality reduction
dc.subject [en_US]: Learning relevance
dc.subject [en_US]: Recommender systems
dc.subject [en_US]: Online learning
dc.subject [en_US]: Active learning
dc.title [en_US]: RELEAF: an algorithm for learning and exploiting relevance
dc.type [en_US]: Article

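The record above carries only the abstract, not the authors' pseudocode. As a rough illustration of the setting the abstract describes, the Python sketch below pairs a heuristic per-arm relevance estimate with a standard UCB index over a uniform partition of [0, 1]^D. It assumes a single relevant dimension per arm and contexts bounded in [0, 1]; the class name RelevanceUCB, the bin partition, and the spread-based relevance score are hypothetical simplifications, not the RELEAF algorithm or its regret guarantees.

```python
import numpy as np

class RelevanceUCB:
    """Minimal contextual-bandit sketch (not the authors' RELEAF pseudocode).

    For each arm, per-dimension, per-bin reward statistics are kept over a
    uniform partition of [0, 1]. The dimension whose estimated bin means vary
    the most is treated as that arm's single relevant dimension, and a UCB
    index is computed on the matching bin of the current context.
    """

    def __init__(self, n_arms, n_dims, n_bins=10):
        self.n_arms, self.n_dims, self.n_bins = n_arms, n_dims, n_bins
        # Per (arm, dimension, bin): observation counts and reward sums.
        self.counts = np.zeros((n_arms, n_dims, n_bins))
        self.sums = np.zeros((n_arms, n_dims, n_bins))
        self.t = 0

    def _bins(self, context):
        # Map each coordinate of a context in [0, 1] to a bin index.
        return np.minimum((np.asarray(context) * self.n_bins).astype(int),
                          self.n_bins - 1)

    def select_arm(self, context):
        self.t += 1
        bins = self._bins(context)
        indices = np.empty(self.n_arms)
        for a in range(self.n_arms):
            means = np.divide(self.sums[a], self.counts[a],
                              out=np.zeros_like(self.sums[a]),
                              where=self.counts[a] > 0)
            # Heuristic relevance score: spread of estimated bin means.
            spread = means.max(axis=1) - means.min(axis=1)
            d = int(np.argmax(spread))            # estimated relevant dimension
            n = self.counts[a, d, bins[d]]
            if n == 0:
                indices[a] = np.inf               # force exploration
            else:
                bonus = np.sqrt(2.0 * np.log(self.t) / n)
                indices[a] = means[d, bins[d]] + bonus
        return int(np.argmax(indices))

    def update(self, arm, context, reward):
        bins = self._bins(context)
        for d in range(self.n_dims):
            self.counts[arm, d, bins[d]] += 1
            self.sums[arm, d, bins[d]] += reward


# Tiny synthetic usage example: each arm's reward depends on one hidden
# context dimension; the remaining coordinates are irrelevant noise.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bandit = RelevanceUCB(n_arms=3, n_dims=20)
    relevant = [2, 7, 11]                         # hidden relevant dimension per arm
    for _ in range(5000):
        x = rng.random(20)
        a = bandit.select_arm(x)
        r = float(rng.random() < x[relevant[a]])  # Bernoulli reward
        bandit.update(a, x, r)
```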
Files

Original bundle
Name: RELEAF_An_Algorithm_for_Learning.pdf
Size: 311.5 KB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission