Abstract
Influence-driven diffusion of information is a fundamental
process in social networks. Learning the latent variables of
such process, i.e., the influence strength along each link, is a
central question towards understanding the structure and
function of complex networks, modeling information cascades,
and developing applications such as viral marketing.
Motivated by modern microblogging platforms, such as
twitter, in this paper we study the problem of learning
influence probabilities in a data-stream scenario, in which
the network topology is relatively stable and the challenge
of a learning algorithm is to keep up with a continuous
stream of tweets using a small amount of time and memory.
Our contribution is a number of randomized approximation
algorithms, categorized according to the available
space (superlinear, linear, and sublinear in the number of
nodes n) and according to dierent models (landmark and
sliding window). Among several results, we show that we
can learn influence probabilities with one pass over the data,
using O(n log n) space, in both the landmark model and the
sliding-window model, and we further show that our algorithm
is within a logarithmic factor of optimal.
For truly large graphs, when one needs to operate with
sublinear space, we show that we can still learn influence
probabilities in one pass, assuming that we restrict our attention
to the most active users.
Our thorough experimental evaluation on large social
graph demonstrates that the empirical performance of our
algorithms agrees with that predicted by the theory.
process in social networks. Learning the latent variables of
such process, i.e., the influence strength along each link, is a
central question towards understanding the structure and
function of complex networks, modeling information cascades,
and developing applications such as viral marketing.
Motivated by modern microblogging platforms, such as
twitter, in this paper we study the problem of learning
influence probabilities in a data-stream scenario, in which
the network topology is relatively stable and the challenge
of a learning algorithm is to keep up with a continuous
stream of tweets using a small amount of time and memory.
Our contribution is a number of randomized approximation
algorithms, categorized according to the available
space (superlinear, linear, and sublinear in the number of
nodes n) and according to dierent models (landmark and
sliding window). Among several results, we show that we
can learn influence probabilities with one pass over the data,
using O(n log n) space, in both the landmark model and the
sliding-window model, and we further show that our algorithm
is within a logarithmic factor of optimal.
For truly large graphs, when one needs to operate with
sublinear space, we show that we can still learn influence
probabilities in one pass, assuming that we restrict our attention
to the most active users.
Our thorough experimental evaluation on large social
graph demonstrates that the empirical performance of our
algorithms agrees with that predicted by the theory.
Originalsprog | Engelsk |
---|---|
Titel | KDD '13 Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining |
Antal sider | 9 |
Forlag | Association for Computing Machinery |
Publikationsdato | 2013 |
Sider | 275-283 |
ISBN (Trykt) | 978-1-4503-2174-7 |
DOI | |
Status | Udgivet - 2013 |