Steady-state analysis of Google-like stochastic matrices
Author
Noyan, Gökçe Nil
Advisor
Dayar, Tuğrul
Date
2007
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Many search engines use a two-step process to retrieve web pages related
to a user’s query. In the first step, traditional text processing is performed to find
all pages matching the given query terms. Due to the massive size of the web,
this step can result in thousands of retrieved pages. In the second step, many
search engines sort the list of retrieved pages according to some ranking criterion
to make it manageable for the user. One popular way to create this ranking is
to exploit additional information inherent in the web due to its hyperlink structure.
One successful and well-publicized link-based ranking system is PageRank,
the ranking system used by the Google search engine. The dynamically changing
matrices reflecting the hyperlink structure of the web and used by Google
in ranking pages are not only very large, but they are also sparse, reducible,
stochastic matrices with some zero rows. Ranking pages amounts to solving for
the steady-state vectors of linear combinations of these matrices with appropriately
chosen rank-1 matrices. The method of choice for this task
appears to be the power method. Improvements over it have been obtained using
techniques such as quadratic extrapolation and iterative aggregation. In this thesis,
we propose iterative methods based on various block partitionings, including
those with triangular diagonal blocks obtained using cutsets, for the computation
of the steady-state vector of such stochastic matrices. The proposed iterative
methods, together with the power and quadratically extrapolated power methods, are
implemented in a software tool. Experimental results on benchmark matrices show
that it is possible to recommend Gauss-Seidel for easier web problems and block
Gauss-Seidel with partitionings based on a block upper triangular form in the
remaining problems, although block Gauss-Seidel requires about twice as much memory
as the quadratically extrapolated power method.
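The computation described in the abstract can be illustrated on a toy example. The sketch below is not the thesis's software tool; it is a minimal illustration under stated assumptions: a hypothetical 4-page link graph with one dangling page (a zero row), a damping factor alpha = 0.85, and uniform vectors u and v for the dangling-page fix and the rank-1 teleportation term. This is one common way to form a Google-like matrix, G = alpha (P + d u^T) + (1 - alpha) e v^T, where d marks the zero rows of P. The steady-state vector pi (pi^T G = pi^T, pi^T e = 1) is computed twice: with the power method, and with point Gauss-Seidel applied to the equivalent linear system pi^T (I - alpha P_bar) = (1 - alpha) v^T, where P_bar = P + d u^T. Quadratic extrapolation and the block (cutset-based) partitionings are not shown, and all function names are hypothetical.

```python
"""Toy steady-state computation for a Google-like matrix (hypothetical example).

Assumptions (not taken from the thesis): the link graph in __main__,
alpha = 0.85, and uniform dangling (u) and teleportation (v) vectors.
"""


def build_P(links, n):
    """Link matrix P: P[i][j] = 1/outdeg(i) if page i links to page j.

    Dangling pages (no out-links) give zero rows, so P is only substochastic.
    """
    P = [[0.0] * n for _ in range(n)]
    for i, outs in links.items():
        for j in outs:
            P[i][j] = 1.0 / len(outs)
    return P


def stochastic_fix(P):
    """P_bar = P + d u^T with uniform u: every zero row becomes a uniform row."""
    n = len(P)
    return [row[:] if sum(row) > 0 else [1.0 / n] * n for row in P]


def google_matrix(P_bar, alpha):
    """Dense G = alpha * P_bar + (1 - alpha) * e v^T with uniform v (rank-1 term)."""
    n = len(P_bar)
    return [[alpha * P_bar[i][j] + (1 - alpha) / n for j in range(n)]
            for i in range(n)]


def power_method(G, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T G from the uniform vector until the 1-norm change < tol."""
    n = len(G)
    x = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        y = [sum(x[i] * G[i][j] for i in range(n)) for j in range(n)]
        change = sum(abs(a - b) for a, b in zip(x, y))
        x = y
        if change < tol:
            return x, k
    return x, max_iter


def gauss_seidel(P_bar, alpha, tol=1e-12, max_iter=10_000):
    """Point Gauss-Seidel on pi^T (I - alpha P_bar) = (1 - alpha) v^T, v uniform.

    Component j is overwritten in place, so later components in the same
    sweep already use the updated values (the Gauss-Seidel idea).
    """
    n = len(P_bar)
    b = (1 - alpha) / n  # every entry of (1 - alpha) v when v is uniform
    x = [1.0 / n] * n
    sweeps = max_iter
    for k in range(1, max_iter + 1):
        change = 0.0
        for j in range(n):
            s = sum(alpha * P_bar[i][j] * x[i] for i in range(n) if i != j)
            new = (b + s) / (1 - alpha * P_bar[j][j])
            change += abs(new - x[j])
            x[j] = new
        if change < tol:
            sweeps = k
            break
    total = sum(x)  # guard against round-off; the exact solution already sums to 1
    return [xi / total for xi in x], sweeps


if __name__ == "__main__":
    links = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}  # hypothetical graph; page 3 is dangling
    n, alpha = 4, 0.85
    P_bar = stochastic_fix(build_P(links, n))
    pi_pm, it_pm = power_method(google_matrix(P_bar, alpha))
    pi_gs, it_gs = gauss_seidel(P_bar, alpha)
    print("power method:", [round(p, 6) for p in pi_pm], it_pm, "iterations")
    print("Gauss-Seidel:", [round(p, 6) for p in pi_gs], it_gs, "sweeps")
```

The linear-system form used for Gauss-Seidel follows from pi^T e = 1, which reduces the rank-1 term to (1 - alpha) v^T. Because I - alpha P_bar is a nonsingular M-matrix for alpha < 1, the Gauss-Seidel sweeps converge to the same vector as the power method. Dense lists are used here only for brevity; since the matrices in question are very large and sparse, a practical implementation would store P in a sparse format and never form G explicitly.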
Keywords
Google
PageRank
Stochastic matrices
Power method
Quadratic extrapolation
Block iterative methods
Aggregation
Partitionings
Cutsets
Triangular blocks