Personal PageRank is a vector over all the pages on the web that measures their importance to your interests. Computing personal PageRank in the obvious way, however, requires disclosing your interests to a third-party, the search engine. This paper shows how motivated users can compute personal PageRank on their home PCs.

Abstract

In this paper, we consider the problem of calculating fast and accurate approximations
to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques
to improve speed by limiting the amount of webgraph data we need to access.
PageRank scores are mainly used for ranking purposes, and generally only the scores
exceeding a given threshold are relevant. In practice, and relative to the size of the web,
only a small number of pages have a non-negligible personalized PageRank score. We
capitalize on this property of the PageRank score to reduce the amount of webgraph
data needed for computation. Our algorithms provide both the approximation to the
personalized PageRank score as well as guidance in using only the necessary information
— and therefore sensibly reduce not only the computational cost of the algorithm, but
also the memory and memory bandwidth requirements.

Our algorithms assume that we have random access to a sparse representation of
a webgraph (perhaps by crawling the web if necessary); some of them further require
knowledge of a hostgraph ([19]). All of them provide a fast approximate calculation of
the personalized PageRank score for pages where the score exceeds a given threshold.
We report experiments with these algorithms on webgraphs of up to 118 million
pages and prove theoretical approximation bound for all. We conclude by proposing
an application scenario of personalized Web Search that inspired and motivated our
work.

Available

Final publication
Internet Mathematics Website
Approximating Personalized PageRank with minimal usage of web graph data
(not yet available)
Project Euclid
Approximating Personalized PageRank with minimal usage of web graph data
Personal site
Approximating Personalized PageRank with minimal usage of web graph data
 
Early draft available
Personal site
Approximating Personalized PageRank with minimal usage of web graph data

Bibtex

@ARTICLE{gleich2007-approximate,
  author = {David Gleich and Marzia Polito},
  title = {Approximating Personalized {PageRank} with Minimal Use of Webgraph
Data}, journal = {Internet Mathematics}, year = {2007}, volume = {3}, pages = {257--294}, number = {3}, month = {December}, key = {GP2007}, keywords = {pagerank, hostrank}, owner = {David Gleich}, timestamp = {2006.11.27} }