Tekin, Cem; Elahi, Sepehr; Van Der Schaar, M.
2021-03-09
2021-03-09
2020
1939-1374
http://hdl.handle.net/11693/75898
Recommending applications (apps) to improve health or educational outcomes requires long-term planning and adaptation based on user feedback, as it is imperative to recommend the right app at the right time to improve engagement and benefit. We model the challenging task of app recommendation for these specific categories of apps, or similar ones, using a new reinforcement learning method referred to as episodic multi-armed bandit (eMAB). In eMAB, the learner recommends apps to individual users and observes their interactions with the recommendations on a weekly basis. It then uses this data to maximize the total payoff of all users by learning to recommend specific apps. Since computing the optimal recommendation sequence is intractable, as a benchmark, we define an oracle that sequentially recommends apps to maximize the expected immediate gain. Then, we propose our online learning algorithm, named FeedBack Adaptive Learning (FeedBAL), and prove that its regret with respect to the benchmark increases logarithmically in expectation. We demonstrate the effectiveness of FeedBAL on recommending mental health apps based on data from an app suite and show that it results in a substantial increase in the number of app sessions compared with episodic versions of ϵ_n-greedy, Thompson sampling, and collaborative filtering methods.
English
Recommender systems; Application recommendation; Online learning; Multi-armed bandit
Feedback adaptive learning for medical and educational application recommendation
Article
10.1109/TSC.2020.3037224