Incorporating the surfing behavior of web users into PageRank
Author(s)
Advisor
Aykanat, Cevdet
Date
2013
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
One of the most crucial factors that determine the effectiveness of a large-scale
commercial web search engine is the ranking (i.e., order) in which web search
results are presented to the end user. In modern web search engines, the ranking
of web search results is built on a combination of the global (i.e., query-independent)
importance of web pages and their relevance to the given search query. In this
thesis, we are concerned with estimating the global importance of web pages.
So far, two types of data sources have been used, independently of each other,
to estimate the importance of web pages: the hyperlink structure of the web
(e.g., PageRank) and the surfing behavior of web users (e.g., BrowseRank).
Unfortunately, both types of data sources have certain limitations. The hyperlink
structure of the web is not very reliable and is vulnerable to bad intent (e.g.,
web spam), because web content creators can easily edit hyperlinks. On the other
hand, the browsing behavior of web users suffers from limitations such as sparsity
and low web coverage.
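As a rough illustration of the hyperlink-based side, here is a minimal PageRank sketch using power iteration. It assumes a toy graph given as a Python dict that maps each page to the pages it links to. The damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank held by pages without outlinks are common textbook choices, not details taken from this thesis. The function name `pagerank` is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a dict: page -> list of linked pages.

    Illustrative sketch only; the parameter defaults are assumptions.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:  # L1 change small enough: treat as converged
            break
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({p: round(r, 3) for p, r in ranks.items()})
```

These scores depend only on links. Any page author can therefore raise them by adding links, which is the vulnerability to web spam described above.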
In this thesis, we combine these two types of feedback into a hybrid page importance
estimation model to alleviate the above-mentioned drawbacks.
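The abstract does not specify how the hybrid model combines the two feedback sources. The following is a purely hypothetical illustration and is not the thesis's model. One simple way to blend the sources is to mix two transition distributions for each page: a link-based one and a browsing-based one estimated from observed visit counts. An assumed parameter `beta` weights the mix. Pages are then ranked by the stationary distribution of the resulting random walk with teleportation. The function `hybrid_rank`, its parameters, and the fallback rule for pages that lack one of the data sources are all assumptions made for this sketch.

```python
def hybrid_rank(links, transitions, beta=0.5, damping=0.85, tol=1e-10, max_iter=1000):
    """Hypothetical hybrid ranking: mix link and browsing transitions per page.

    links: page -> list of linked pages (hyperlink structure).
    transitions: page -> {next_page: observed visit count} (browsing data).
    beta: assumed weight on browsing data when a page has both sources.
    """
    # Collect every page mentioned by either data source.
    pages = set(links) | set(transitions)
    for targets in links.values():
        pages.update(targets)
    for nxt in transitions.values():
        pages.update(nxt)
    pages = sorted(pages)
    n = len(pages)

    # Build one outgoing probability distribution per page.
    out = {}
    for p in pages:
        targets = links.get(p, [])
        counts = transitions.get(p, {})
        total = sum(counts.values())
        # Fallback rule: use whichever source exists (sparsity / low coverage).
        if targets and total:
            w_browse, w_link = beta, 1.0 - beta
        elif total:
            w_browse, w_link = 1.0, 0.0
        elif targets:
            w_browse, w_link = 0.0, 1.0
        else:
            w_browse, w_link = 0.0, 0.0  # no data: treated as a dangling page
        dist = {}
        for t in targets:
            dist[t] = dist.get(t, 0.0) + w_link / len(targets)
        if total:
            for t, c in counts.items():
                dist[t] = dist.get(t, 0.0) + w_browse * c / total
        out[p] = dist

    # Power iteration with teleportation, as in PageRank.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not out[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, dist in out.items():
            for t, prob in dist.items():
                new[t] += damping * rank[p] * prob
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank


links = {"a": ["b"], "b": ["a"], "c": ["a"]}
visits = {"a": {"c": 9}}  # hypothetical browsing log: users go from a to c
print(hybrid_rank(links, visits, beta=0.5))
```

The fallback rule is one way to address sparsity and low coverage. Pages with no recorded visits still receive link-based transitions, while observed browsing shifts probability toward pages that users actually visit.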
Our experimental results indicate that the proposed hybrid model leads to better
estimation of page importance according to an evaluation metric that uses user
click information obtained from the query logs of the Yahoo! web search engine as
the ground-truth ranking. We conduct all of our experiments in a realistic setting,
using a very large-scale web page collection (around 6.5 billion web pages) and
web browsing data (around two billion web page visits) collected through the
Yahoo! toolbar.
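The abstract does not name the evaluation metric. As a stand-in only, the following sketch computes Kendall's tau between an estimated importance score per page and a hypothetical click-derived ground-truth score. It uses the tau-a variant, in which tied pairs count as neither concordant nor discordant. Higher values mean the estimated ranking agrees more closely with user clicks. The function name `kendall_tau` and its inputs are assumptions.

```python
from itertools import combinations


def kendall_tau(scores, ground_truth):
    """Kendall's tau-a between two score dicts over their shared pages.

    scores: page -> estimated importance (e.g., a PageRank value).
    ground_truth: page -> click-derived importance (hypothetical, e.g., clicks).
    Returns a value in [-1, 1]; 1 means the two orderings are identical.
    """
    pages = sorted(set(scores) & set(ground_truth))
    concordant = discordant = 0
    for p, q in combinations(pages, 2):
        # Same sign of differences: the pair is ordered the same way in both.
        s = (scores[p] - scores[q]) * (ground_truth[p] - ground_truth[q])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(pages) * (len(pages) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


estimated = {"a": 0.39, "b": 0.21, "c": 0.40}
clicks = {"a": 120, "b": 15, "c": 300}  # hypothetical click counts
print(kendall_tau(estimated, clicks))
```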