Browsing by Subject "PageRank"
Now showing 1 - 8 of 8
Item (Open Access): Accelerating PageRank with a heterogeneous two-phase CPU-FPGA algorithm (2020-12). Usta, Furkan.

PageRank is a network-analysis algorithm used to measure the importance of each vertex in a graph. Fundamentally, it is a sparse matrix-vector multiplication problem and suffers from the same bottlenecks, such as irregular memory access and a low computation-to-communication ratio. Moreover, existing Field-Programmable Gate Array (FPGA) accelerators for the PageRank algorithm either require large portions of the graph to be in memory, which is unsuitable for big-data applications, or cannot fully utilize the memory bandwidth. The recently published Propagation Blocking (PB) methodology improves the performance of PageRank by dividing the execution into binning and accumulation phases. In this paper, we propose a heterogeneous high-throughput implementation of the PB algorithm in which the binning phase is executed on the FPGA while accumulation is done on the CPU. Unlike prior solutions, our design can handle graphs of any size with no need for on-board FPGA memory. We also show that, despite the low clock frequency of our device compared to the CPU, offloading random writes to an accelerator can still improve performance significantly. Experimental results show that, with our proposed accelerator, the PB algorithm gains up to 40% speedup.

Item (Open Access): Incorporating the surfing behavior of web users into PageRank (2013). Ashyralyyev, Shatlyk.

One of the most crucial factors that determines the effectiveness of a large-scale commercial web search engine is the ranking (i.e., order) in which web search results are presented to the end user. In modern web search engines, the skeleton for the ranking of web search results is constructed using a combination of the global (i.e., query-independent) importance of web pages and their relevance to the given search query. In this thesis, we are concerned with the estimation of the global importance of web pages.
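The Propagation Blocking scheme from the FPGA item above splits each PageRank iteration into a binning pass and an accumulation pass. A minimal single-threaded sketch (the graph, damping factor, and bin width are all illustrative; in the paper, binning runs on the FPGA and accumulation on the CPU):

```python
# Hedged sketch of Propagation Blocking (PB) for one PageRank iteration.
# The edge list, damping factor d, and bin_width are illustrative values.

def pb_iteration(edges, ranks, out_degree, n, d=0.85, bin_width=4):
    num_bins = (n + bin_width - 1) // bin_width
    bins = [[] for _ in range(num_bins)]

    # Binning phase: contributions are appended sequentially to per-range
    # bins instead of being scattered into the rank vector by random writes.
    for src, dst in edges:
        bins[dst // bin_width].append((dst, ranks[src] / out_degree[src]))

    # Accumulation phase: each bin only updates a narrow vertex range,
    # so the writes stay cache-resident.
    new_ranks = [(1.0 - d) / n] * n
    for b in bins:
        for dst, contrib in b:
            new_ranks[dst] += d * contrib
    return new_ranks
```

Because contributions are first grouped by destination range, the accumulation pass only ever writes within one small slice of the rank vector at a time, which is what makes the random-write traffic cheap to offload.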
So far, to estimate the importance of web pages, two different types of data sources have been taken into account, independently of each other: the hyperlink structure of the web (e.g., PageRank) or the surfing behavior of web users (e.g., BrowseRank). Unfortunately, both types of data sources have certain limitations. The hyperlink structure of the web is not very reliable and is vulnerable to bad intent (e.g., web spam), because hyperlinks can be easily edited by web content creators. On the other hand, the browsing behavior of web users has limitations such as sparsity and low web coverage. In this thesis, we combine these two types of feedback under a hybrid page importance estimation model in order to alleviate the above-mentioned drawbacks. Our experimental results indicate that the proposed hybrid model leads to better estimation of page importance according to an evaluation metric that uses user click information obtained from the Yahoo! web search engine's query logs as the ground-truth ranking. We conduct all of our experiments in a realistic setting, using a very large-scale web page collection (around 6.5 billion web pages) and web browsing data (around two billion web page visits) collected through the Yahoo! toolbar.

Item (Open Access): Incorporating the surfing behavior of web users into PageRank (ACM, 2013-10-11). Ashyralyyev, Shatlyk; Cambazoğlu, B. B.; Aykanat, Cevdet.

In large-scale commercial web search engines, estimating the importance of a web page is a crucial ingredient in ranking web search results. So far, to assess the importance of web pages, two different types of feedback have been taken into account, independently of each other: the feedback obtained from the hyperlink structure among the web pages (e.g., PageRank) or the web browsing patterns of users (e.g., BrowseRank). Unfortunately, both types of feedback have certain drawbacks.
While the former lacks user preferences and is vulnerable to malicious intent, the latter suffers from sparsity and hence low web coverage. In this work, we combine these two types of feedback under a hybrid page ranking model in order to alleviate the above-mentioned drawbacks. Our empirical results indicate that the proposed model leads to better estimation of page importance according to an evaluation metric that relies on user click feedback obtained from web search query logs. We conduct all of our experiments in a realistic setting, using a very large-scale web page collection (around 6.5 billion web pages) and web browsing data (around two billion web page visits). Copyright is held by the owner/author(s).

Item (Open Access): Site-based partitioning and repartitioning techniques for parallel PageRank computation (Institute of Electrical and Electronics Engineers, 2011-05). Cevahir, A.; Aykanat, Cevdet; Turk, A.; Cambazoglu, B. B.

The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications, where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead while considering the initially distributed nature of the web matrices. Our contributions in this work are twofold. First, we investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations, with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links.
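The hybrid models in the two "surfing behavior" items above combine link-based and browsing-based evidence. As a loose illustration only — the actual model from those works is not specified in this listing — one simple blend is a convex combination of the two normalized score vectors, where `beta` is a hypothetical mixing weight:

```python
# Illustrative sketch only: blending a link-based importance vector
# (PageRank-style) with a browsing-based one (BrowseRank-style). The
# convex-combination scheme and beta are assumptions, not the cited model.

def hybrid_importance(link_scores, browse_scores, beta=0.5):
    """Convex combination of two normalized importance vectors."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    link = normalize(link_scores)
    browse = normalize(browse_scores)
    return [beta * l + (1.0 - beta) * b for l, b in zip(link, browse)]
```

Normalizing both vectors first keeps the blend a probability distribution, so a sparse browsing signal cannot drown out (or be drowned out by) the dense link signal purely through scale.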
Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances, even less than a single iteration) of the underlying sequential PageRank algorithm. © 2011 IEEE.

Item (Open Access): Spatial mutual information and PageRank-based contrast enhancement and quality-aware relative contrast measure (Institute of Electrical and Electronics Engineers Inc., 2016). Celik, T.

This paper proposes a novel algorithm for global contrast enhancement using a new definition of the spatial mutual information (SMI) of the gray levels of an input image, together with the PageRank algorithm. The gray levels represent nodes in the PageRank algorithm, and the weights between the nodes are computed according to their dependence and spatial spread over the image, quantified using SMI. The rank vector of gray levels resulting from the PageRank algorithm is used to map input gray levels to output gray levels. The damping factor of the PageRank algorithm is utilized to control the level of perceived global contrast in the output image. Furthermore, a new metric is proposed for image quality-aware relative contrast measurement between input and output images. Experimental results show that the proposed algorithm consistently produces good results. © 1992-2012 IEEE.

Item (Open Access): Steady-state analysis of Google-like stochastic matrices (2007). Noyan, Gökçe Nil.

Many search engines use a two-step process to retrieve, from the web, pages related to a user's query. In the first step, traditional text processing is performed to find all pages matching the given query terms.
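Nearly every item in this listing ultimately computes a PageRank-style steady-state vector. A toy power-iteration sketch, with a uniform teleport vector and dangling (zero out-degree) rows patched by teleportation — the dense matrix and all parameter values are illustrative:

```python
# Hedged sketch of the power method on a Google-like matrix: a convex
# combination of a sparse stochastic matrix and a rank-one teleport matrix.
# P is a toy dense adjacency matrix; a, tol, and max_iter are illustrative.

def google_power_method(P, a=0.85, tol=1e-10, max_iter=1000):
    n = len(P)
    v = [1.0 / n] * n                      # uniform teleport vector
    pi = v[:]
    for _ in range(max_iter):
        new = [(1.0 - a) * v[j] for j in range(n)]
        dangling = 0.0
        for i in range(n):
            row_sum = sum(P[i])
            if row_sum == 0.0:             # dangling row: redistribute via v
                dangling += pi[i]
            else:
                for j in range(n):
                    if P[i][j]:
                        new[j] += a * pi[i] * P[i][j] / row_sum
        for j in range(n):
            new[j] += a * dangling * v[j]
        if sum(abs(new[j] - pi[j]) for j in range(n)) < tol:
            return new
        pi = new
    return pi
```

Each update preserves the total rank mass of 1, and the teleport term makes the iteration matrix positive and stochastic, which is what guarantees convergence to a unique steady-state vector.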
Due to the massive size of the web, this step can result in thousands of retrieved pages. In the second step, many search engines sort the list of retrieved pages according to some ranking criterion to make it manageable for the user. One popular way to create this ranking is to exploit additional information inherent in the web due to its hyperlink structure. One successful and well-publicized link-based ranking system is PageRank, the ranking system used by the Google search engine. The dynamically changing matrices that reflect the hyperlink structure of the web and are used by Google in ranking pages are not only very large, but they are also sparse, reducible, stochastic matrices with some zero rows. Ranking pages amounts to solving for the steady-state vectors of linear combinations of these matrices with appropriately chosen rank-1 matrices. The method of choice for this task appears to be the power method. Certain improvements have been obtained using techniques such as quadratic extrapolation and iterative aggregation. In this thesis, we propose iterative methods based on various block partitionings, including those with triangular diagonal blocks obtained using cutsets, for the computation of the steady-state vector of such stochastic matrices. The proposed iterative methods, together with the power and quadratically extrapolated power methods, are coded into a software tool. Experimental results on benchmark matrices show that it is possible to recommend Gauss-Seidel for easier web problems and block Gauss-Seidel with partitionings based on a block upper triangular form for the remaining problems, although the latter takes about twice as much memory as the quadratically extrapolated power method.

Item (Open Access): Steady-state analysis of Google-like stochastic matrices with block iterative methods (Kent State University, 2011). Dayar, T.; Noyan, G. N.

A Google-like matrix is a positive stochastic matrix given by a convex combination of a sparse, nonnegative matrix and a particular rank-one matrix. Google itself uses the steady-state vector of a large matrix of this form to help order web pages in a search engine. We investigate the computation of the steady-state vectors of such matrices using block iterative methods. The block partitionings considered include those based on block triangular form and those having triangular diagonal blocks obtained using cutsets. Numerical results show that block Gauss-Seidel with partitionings based on block triangular form is most often the best approach. However, there are cases in which a block partitioning with triangular diagonal blocks is better, and the Gauss-Seidel method is usually competitive. Copyright © 2011, Kent State University.

Item (Open Access): Web-site-based partitioning techniques for efficient parallelization of the PageRank computation (2006). Cevahir, Ali.

Web search engines use ranking techniques to order web pages in query results. PageRank is an important technique that orders web pages according to the linkage structure of the web. The efficiency of the PageRank computation is important, since the constantly evolving nature of the web requires this computation to be repeated many times. The PageRank computation consists of repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. However, efficiently parallelizing PageRank is not an easy task because of the irregular sparsity pattern of the web matrix. Graph- and hypergraph-partitioning-based techniques are widely used for efficiently parallelizing matrix-vector multiplications. Recently, a hypergraph-partitioning-based decomposition technique for fast parallel computation of PageRank was proposed.
This technique aims to minimize the communication overhead of the parallel matrix-vector multiplication. However, it has a high preprocessing time, which makes it impractical. In this work, we propose 1D (rowwise and columnwise) and 2D (fine-grain and checkerboard) decomposition models using web-site-based graph- and hypergraph-partitioning techniques. The proposed models minimize the communication overhead of parallel PageRank computations with a reasonable preprocessing time. The models encapsulate not only the matrix-vector multiplication but the overall iterative algorithm. Conducted experiments show that the proposed models achieve fast PageRank computation with low preprocessing time, compared with those in the literature.
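The communication overhead that the partitioning items above try to minimize can be made concrete with a toy count: under a 1D rowwise partition, each processor must fetch the x-vector entries indexed by the off-processor columns of its rows. The sparsity pattern and partitions below are illustrative:

```python
# Hedged sketch: counting communication volume of a 1D rowwise-partitioned
# sparse matrix-vector multiply. rows[i] is the set of nonzero column indices
# of row i; part[i] is the processor owning row i and vector entry i.

def comm_volume_rowwise(rows, part):
    """Total number of x-vector entries fetched across processor boundaries."""
    volume = 0
    nparts = max(part) + 1
    for p in range(nparts):
        needed = set()
        for i, cols in enumerate(rows):
            if part[i] == p:
                # Entries of x owned by other processors but needed here.
                needed.update(c for c in cols if part[c] != p)
        volume += len(needed)
    return volume
```

Different assignments of the same rows to processors can yield very different volumes, which is exactly the quantity the graph- and hypergraph-partitioning models are built to minimize.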