Browsing by Subject "Recommender systems"

Now showing 1 - 7 of 7

Open Access
Cluster based collaborative filtering with inverted indexing
(2005) Subakan, Özlem Nurcan
Collectively, a population contains vast amounts of knowledge and modern communication technologies that increase the ease of communication. However, it is not feasible for a single person to aggregate the knowledge of thousands or millions of data and extract useful information from it. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parse-able and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings for the items which are rated by users. The computational complexity of these methods grows linearly with the number of customers which can reach to several millions in typical commercial applications. To address the scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (so called cluster-skipping inverted index structure) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable with the collaborative filtering algorithms without clustering, whereas the efficiency is far more improved.
Open Access
Diversity and novelty in web search, recommender systems and data streams
(Association for Computing Machinery, 2014-02) Santos, R. L. T.; Castells, P.; Altingovde, I. S.; Can, Fazlı
This tutorial aims to provide a unifying account of current research on diversity and novelty in the domains of web search, recommender systems, and data stream processing.
Open Access
Feedback adaptive learning for medical and educational application recommendation
(IEEE, 2020) Tekin, Cem; Elahi, Sepehr; Van Der Schaar, M.
Recommending applications (apps) to improve health or educational outcomes requires long-term planning and adaptation based on the user feedback, as it is imperative to recommend the right app at the right time to improve engagement and benefit. We model the challenging task of app recommendation for these specific categories of apps-or alike-using a new reinforcement learning method referred to as episodic multi-armed bandit (eMAB). In eMAB, the learner recommends apps to individual users and observes their interactions with the recommendations on a weekly basis. It then uses this data to maximize the total payoff of all users by learning to recommend specific apps. Since computing the optimal recommendation sequence is intractable, as a benchmark, we define an oracle that sequentially recommends apps to maximize the expected immediate gain. Then, we propose our online learning algorithm, named FeedBack Adaptive Learning (FeedBAL), and prove that its regret with respect to the benchmark increases logarithmically in expectation. We demonstrate the effectiveness of FeedBAL on recommending mental health apps based on data from an app suite and show that it results in a substantial increase in the number of app sessions compared with episodic versions of ϵn -greedy, Thompson sampling, and collaborative filtering methods.
Open Access
Minimizing staleness and communication overhead in distributed SGD for collaborative filtering
(IEEE Computer Society, 2023-09-06) Abubaker, Nabil; Caglayan, O.; Karsavuran, M. O.; Aykanat, Cevdet
Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch where staleness is reduced with more synchronizations. However, high number of synchronizations would prohibit the scalability of the algorithm. We propose a parallel ASGD algorithm, η-PASGD, for efficiently handling η synchronizations per epoch in a scalable fashion. The proposed algorithm puts an upper limit of KK on η, for a KK-processor system, such that performing Kη=K synchronizations per epoch would eliminate the staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. The sparsity allows for reduction in the staleness and communication overhead combinatorially via intelligently distributing the data to processors. We analyze the staleness and the total volume incurred during an epoch of η-PASGD. Following this analysis, we propose a hypergraph partitioning model to encapsulate reducing staleness and volume while minimizing the maximum number of synchronizations required for a stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.
Open Access
RELEAF: an algorithm for learning and exploiting relevance
(Cornell University, 2015-02) Tekin, C.; Schaar, Mihaela van der
Recommender systems, medical diagnosis, network security, etc., require on-going learning and decision-making in real time. These -- and many others -- represent perfect examples of the opportunities and difficulties presented by Big Data: the available information often arrives from a variety of sources and has diverse features so that learning from all the sources may be valuable but integrating what is learned is subject to the curse of dimensionality. This paper develops and analyzes algorithms that allow efficient learning and decision-making while avoiding the curse of dimensionality. We formalize the information available to the learner/decision-maker at a particular time as a context vector which the learner should consider when taking actions. In general the context vector is very high dimensional, but in many settings, the most relevant information is embedded into only a few relevant dimensions. If these relevant dimensions were known in advance, the problem would be simple -- but they are not. Moreover, the relevant dimensions may be different for different actions. Our algorithm learns the relevant dimensions for each action, and makes decisions based in what it has learned. Formally, we build on the structure of a contextual multi-armed bandit by adding and exploiting a relevance relation. We prove a general regret bound for our algorithm whose time order depends only on the maximum number of relevant dimensions among all the actions, which in the special case where the relevance relation is single-valued (a function), reduces to O~(T2(2√−1)); in the absence of a relevance relation, the best known contextual bandit algorithms achieve regret O~(T(D+1)/(D+2)), where D is the full dimension of the context vector.
Open Access
Scaling stratified stochastic gradient descent for distributed matrix completion
(Institute of Electrical and Electronics Engineers, 2023-10-01) Abubaker, Nabil; Karsavuran, M. O.; Aykanat, Cevdet
Stratified SGD (SSGD) is the primary approach for achieving serializable parallel SGD for matrix completion. State-of-the-art parallelizations of SSGD fail to scale due to large communication overhead. During an SGD epoch, these methods send data proportional to one of the dimensions of the rating matrix. We propose a framework for scalable SSGD through significantly reducing the communication overhead via exchanging point-to-point messages utilizing the sparsity of the rating matrix. We provide formulas to represent the essential communication for correctly performing parallel SSGD and we propose a dynamic programming algorithm for efficiently computing them to establish the point-to-point message schedules. This scheme, however, significantly increases the number of messages sent by a processor per epoch from O(K) to (K2) for a K-processor system which might limit the scalability. To remedy this, we propose a Hold-and-Combine strategy to limit the upper-bound on the number of messages sent per processor to O(KlgK). We also propose a hypergraph partitioning model that correctly encapsulates reducing the communication volume. Experimental results show that the framework successfully achieves a scalable distributed SSGD through significantly reducing the communication overhead. Our code is publicly available at: github.com/nfabubaker/CESSGD
Open Access
Software design, implementation, application, and refinement of a Bayesian approach for the assessment of content and user qualities
(2011) Türk, Melihcan
The internet provides unlimited access to vast amounts of information. Technical innovations and internet coverage allow more and more people to supply contents for the web. As a result, there is a great deal of material which is either inaccurate or out-of-date, making it increasingly difficult to find relevant and up-to-date content. In order to solve this problem, recommender systems based on collaborative filtering have been introduced. These systems cluster users based on their past preferences, and suggest relevant contents according to user similarities. Trustbased recommender systems consider the trust level of users in addition to their past preferences, since some users may not be trustworthy in certain categories even though they are trustworthy in others. Content quality levels are important in order to present the most current and relevant contents to users. The study presented here is based on a model which combines the concepts of content quality and user trust. According to this model, the quality level of contents cannot be properly determined without considering the quality levels of evaluators. The model uses a Bayesian approach, which allows the simultaneous co-evaluation of evaluators and contents. The Bayesian approach also allows the calculation of the updated quality values over time. In this thesis, the model is further refined and configurable software is implemented in order to assess the qualities of users and contents on the web. Experiments were performed on a movie data set and the results showed that the Bayesian co-evaluation approach performed more effectively than a classical approach which does not consider user qualities. The approach also succeeded in classifying users according to their expertise level.