Browsing by Subject "Collaborative filtering"
Now showing 1 - 8 of 8
Item Open Access
Can who-edits-what predict edit survival? (ACM, 2018-08)
Yardım, Ali Batuhan; Maystre, L.; Kristof, V.; Grossglauser, M.
As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component, and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects, and whether the edit survives or not. We apply our model to Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and in addition it enables us to discover interesting structure in the data.
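The abstract specifies the model only at this high level; purely as an illustration, one plausible instantiation of the stated acceptance probability (a logistic link over editor skill, component difficulty, and a low-rank user-component interaction; all names and numbers below are hypothetical, not the paper's) could look like this in Python:

    import numpy as np

    def edit_accept_prob(skill, difficulty, user_vec, comp_vec):
        # Hypothetical instantiation of the described model: editor skill
        # minus component difficulty, plus a low-rank user-component
        # interaction term, passed through a logistic sigmoid.
        score = skill - difficulty + user_vec @ comp_vec
        return 1.0 / (1.0 + np.exp(-score))

    # Toy usage: a fairly skilled editor touching an easy component.
    rng = np.random.default_rng(0)
    u, v = 0.1 * rng.normal(size=8), 0.1 * rng.normal(size=8)
    print(edit_accept_prob(skill=1.2, difficulty=0.3, user_vec=u, comp_vec=v))

Fitting such a model needs only (editor, component, survived) triples, which matches the abstract's claim that no content-based features are required.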
Item Open Access
Cluster based collaborative filtering with inverted indexing (Bilkent University, 2005)
Subakan, Özlem Nurcan
Collectively, a population contains vast amounts of knowledge, and modern communication technologies make that knowledge ever easier to share. However, it is not feasible for a single person to aggregate the knowledge of thousands or millions of data sources and extract useful information from it. Collaborative information systems are attempts to harness the knowledge of a population and to present it in a simple, fast and fair manner. Collaborative filtering has been successfully used in domains where the information content is not easily parseable and traditional information filtering techniques are difficult to apply. Collaborative filtering works over a database of ratings for the items rated by users. The computational complexity of these methods grows linearly with the number of customers, which can reach several million in typical commercial applications. To address this scalability concern, we have developed an efficient collaborative filtering technique by applying user clustering and using a specific inverted index structure (the so-called cluster-skipping inverted index) that is tailored for clustered environments. We show that the predictive accuracy of the system is comparable with that of collaborative filtering algorithms without clustering, while efficiency is substantially improved.

Item Open Access
Cluster searching strategies for collaborative recommendation systems (2013)
Altingovde, I. S.; Subakan, Ö. N.; Ulusoy, Özgür
In-memory nearest neighbor computation is a typical collaborative filtering approach for high recommendation accuracy. However, this approach is not scalable given the huge number of customers and items in typical commercial applications. Cluster-based collaborative filtering techniques can be a remedy for the efficiency problem, but they usually provide relatively lower accuracy figures, since they may become over-generalized and produce less-personalized recommendations. Our research explores an individualistic strategy which initially clusters the users and then exploits the members within clusters, but not just the cluster representatives, during the recommendation generation stage. We provide an efficient implementation of this strategy by adapting a specifically tailored cluster-skipping inverted index structure. Experimental results reveal that the individualistic strategy with the cluster-skipping index is a good compromise that yields high accuracy and reasonable scalability figures. © 2012 Elsevier Ltd. All rights reserved.

Item Open Access
Location recommendations for new businesses using check-in data (IEEE, 2016-12)
Eravci, Bahaeddin; Bulut, Neslihan; Etemoğlu, C.; Ferhatosmanoğlu, Hakan
Location-based social networks (LBSN) and mobile applications generate data useful for location-oriented business decisions. Companies can get insights into the mobility patterns of potential customers and their daily habits on shopping, dining, etc., to enhance customer satisfaction and increase profitability. We introduce a new problem of identifying neighborhoods with a potential for success in a given line of business. After partitioning the city into neighborhoods based on geographical and social distances, we use the similarities of the neighborhoods to identify specific neighborhoods as candidates for investment in a new business opportunity. We present two solutions for this new problem: i) a probabilistic approach based on Bayesian inference for location selection, along with a voting-based approximation, and ii) an adaptation of collaborative filtering using the similarity of neighborhoods based on the co-existence of related venues and check-in patterns. We use Foursquare user check-in and venue location data to evaluate the performance of the proposed approach. Our experiments show promising results for identifying new opportunities and supporting business decisions using increasingly available check-in data sets. © 2016 IEEE.

Item Open Access
Minimizing staleness and communication overhead in distributed SGD for collaborative filtering (IEEE Computer Society, 2023-09-06)
Abubaker, Nabil; Caglayan, O.; Karsavuran, M. O.; Aykanat, Cevdet
Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch, where staleness is reduced with more synchronizations. However, a high number of synchronizations would prohibit the scalability of the algorithm. We propose a parallel ASGD algorithm, η-PASGD, for efficiently handling η synchronizations per epoch in a scalable fashion. The proposed algorithm puts an upper limit of K on η for a K-processor system, such that performing η = K synchronizations per epoch would eliminate staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. The sparsity allows staleness and communication overhead to be reduced combinatorially by intelligently distributing the data among processors. We analyze the staleness and the total volume incurred during an epoch of η-PASGD. Following this analysis, we propose a hypergraph partitioning model to encapsulate reducing staleness and volume while minimizing the maximum number of synchronizations required for stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.
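The abstract presupposes the standard SGD updates for low-rank matrix factorization that η-PASGD distributes across processors; as a point of reference only, a minimal serial sketch of those updates (hyperparameters and toy data are assumptions, not taken from the paper) is:

    import numpy as np

    def sgd_epoch(rows, cols, vals, P, Q, lr=0.01, reg=0.05):
        # One serial epoch over the known ratings r_uc: nudge user factors
        # P[u] and item factors Q[c] along the gradient of the squared error.
        for u, c, r in zip(rows, cols, vals):
            err = r - P[u] @ Q[c]
            pu = P[u].copy()                      # use pre-update value for Q
            P[u] += lr * (err * Q[c] - reg * P[u])
            Q[c] += lr * (err * pu - reg * Q[c])

    # Toy usage: factorize a tiny sparse rating set.
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(4, 3))        # 4 users, rank 3
    Q = rng.normal(scale=0.1, size=(5, 3))        # 5 items, rank 3
    rows, cols, vals = [0, 1, 3], [2, 0, 4], [4.0, 3.0, 5.0]
    for _ in range(200):
        sgd_epoch(rows, cols, vals, P, Q)
    print(P[0] @ Q[2])                            # approaches the rating 4.0

In the parallel setting these per-rating updates proceed concurrently on different processors, and "staleness" refers to updates being computed against factor copies that other processors have already modified; the paper's synchronizations bound that effect.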
Item Open Access
Scaling stratified stochastic gradient descent for distributed matrix completion (Institute of Electrical and Electronics Engineers, 2023-10-01)
Abubaker, Nabil; Karsavuran, M. O.; Aykanat, Cevdet
Stratified SGD (SSGD) is the primary approach for achieving serializable parallel SGD for matrix completion. State-of-the-art parallelizations of SSGD fail to scale due to large communication overhead. During an SGD epoch, these methods send data proportional to one of the dimensions of the rating matrix. We propose a framework for scalable SSGD that significantly reduces the communication overhead by exchanging point-to-point messages, utilizing the sparsity of the rating matrix. We provide formulas to represent the essential communication for correctly performing parallel SSGD, and we propose a dynamic programming algorithm for efficiently computing them to establish the point-to-point message schedules. This scheme, however, significantly increases the number of messages sent by a processor per epoch from O(K) to O(K²) for a K-processor system, which might limit scalability. To remedy this, we propose a Hold-and-Combine strategy to limit the upper bound on the number of messages sent per processor to O(K lg K). We also propose a hypergraph partitioning model that correctly encapsulates reducing the communication volume. Experimental results show that the framework successfully achieves scalable distributed SSGD through significantly reduced communication overhead. Our code is publicly available at: github.com/nfabubaker/CESSGD

Item Open Access
Software design, implementation, application, and refinement of a Bayesian approach for the assessment of content and user qualities (Bilkent University, 2011)
Türk, Melihcan
The internet provides unlimited access to vast amounts of information. Technical innovations and internet coverage allow more and more people to supply content for the web. As a result, there is a great deal of material which is either inaccurate or out-of-date, making it increasingly difficult to find relevant and up-to-date content. In order to solve this problem, recommender systems based on collaborative filtering have been introduced. These systems cluster users based on their past preferences and suggest relevant content according to user similarities. Trust-based recommender systems consider the trust level of users in addition to their past preferences, since some users may not be trustworthy in certain categories even though they are trustworthy in others. Content quality levels are important in order to present the most current and relevant content to users. The study presented here is based on a model which combines the concepts of content quality and user trust. According to this model, the quality level of content cannot be properly determined without considering the quality levels of the evaluators. The model uses a Bayesian approach, which allows the simultaneous co-evaluation of evaluators and contents. The Bayesian approach also allows quality values to be updated over time. In this thesis, the model is further refined, and configurable software is implemented in order to assess the qualities of users and content on the web. Experiments were performed on a movie data set, and the results showed that the Bayesian co-evaluation approach performed more effectively than a classical approach which does not consider user qualities. The approach also succeeded in classifying users according to their expertise levels.
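The thesis's actual update rules are not reproduced in the abstract; the sketch below shows only the general shape of such a co-evaluation, alternately re-estimating content quality as a rater-quality-weighted mean and rater quality from agreement with the consensus. A full Bayesian treatment would place priors on these quantities and update posteriors; the weighting choices and toy data here are assumptions.

    import numpy as np

    # ratings[u, c] in [0, 1]; NaN where user u did not rate content c
    # (hypothetical toy data: user 2 often disagrees with the others).
    ratings = np.array([
        [0.9, 0.8, np.nan],
        [0.8, np.nan, 0.2],
        [0.1, 0.9, 0.3],
    ])

    rated = (~np.isnan(ratings)).astype(float)
    r = np.nan_to_num(ratings)
    user_q = np.ones(ratings.shape[0])   # start from uniform user quality
    for _ in range(20):                  # alternate until roughly stable
        # Content quality: user-quality-weighted mean of its ratings.
        content_q = (user_q @ (r * rated)) / (user_q @ rated)
        # User quality: inverse mean squared disagreement with consensus.
        err = ((r - content_q) ** 2 * rated).sum(axis=1) / rated.sum(axis=1)
        user_q = 1.0 / (err + 1e-3)

    print(content_q, user_q)

The fixed point of this loop illustrates the abstract's central claim: content quality and evaluator quality must be estimated jointly, since each is defined in terms of the other.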
Item Open Access
Towards a quality service layer for Web 2.0 (Springer, 2011-12)
Schaal, M.; Davenport, David; Çevik, Ali Hamdi
Despite the help of search engines and Web directories, identifying high-quality content becomes increasingly difficult as the Internet gets ever more crowded with information. Prior approaches for filtering and searching content with respect to user-specific preferences do exist: recommendation engines employ collaborative filtering to support subjective selection, (semi-)automatic page ranking algorithms utilize the hypertext link structure of the World Wide Web to assess page importance, and trust-based systems employ social network analysis to determine the most suitable Web pages. The use of implicit and explicit user feedback, however, is often either ignored or its exploitation is limited to isolated Web sites. We thus propose a quality overlay framework that enables the collection and processing of user feedback, and the subsequent presentation of quality-enabled content, for any Web site. We present the quality overlay framework, propose an architecture for its realization, and validate our approach with scenarios and a detailed design with sample implementation. © 2011 Springer-Verlag.
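The abstract does not describe the framework's concrete interfaces; as a sketch only, under entirely assumed field names and weights, a quality overlay might blend explicit ratings with an implicit signal such as dwell time per page:

    from dataclasses import dataclass, field

    @dataclass
    class PageFeedback:
        # Hypothetical feedback record collected by the overlay for one page.
        url: str
        explicit_ratings: list = field(default_factory=list)  # 1..5 stars
        dwell_times: list = field(default_factory=list)       # seconds

    def quality_score(fb: PageFeedback) -> float:
        # Assumed combination: normalized mean rating blended with a capped
        # dwell-time signal; the 0.7/0.3 weights are illustrative only.
        rating = (sum(fb.explicit_ratings) / len(fb.explicit_ratings) / 5.0
                  if fb.explicit_ratings else 0.5)
        dwell = (min(sum(fb.dwell_times) / len(fb.dwell_times), 120.0) / 120.0
                 if fb.dwell_times else 0.5)
        return 0.7 * rating + 0.3 * dwell

    fb = PageFeedback("https://example.org", [4, 5, 3], [30.0, 75.0])
    print(round(quality_score(fb), 3))

Because the score depends only on feedback records keyed by URL, such a layer could sit on top of any Web site, which is the site-independence the abstract argues for.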