Noise Corrected Sampling of Online Social Networks

Research output: Journal Article or Conference Article in Journal › Journal article › Research › peer-review

In this article, we propose a new method to perform topological network sampling. Topological network sampling is a process for extracting a subset of nodes and edges from a network, such that analyses on the sample provide results and conclusions comparable to the ones they would return if run on the whole structure. We need network sampling because the largest online network datasets are accessed through low-throughput application programming interface (API) systems, rendering the collection of the whole network infeasible. Our method is inspired by the literature on network backboning, specifically the noise-corrected backbone. We select the next node to explore by following the edge we identify as the one providing the largest information gain, given the topology of the sample explored so far. We evaluate our method against the most commonly used sampling methods. We do so in a realistic framework, considering a wide array of network topologies, network analyses, and features of API systems. No method can provide the best sample in all possible scenarios; thus, in our results section, we show the cases in which our method performs best and the cases in which it performs worst. Overall, the noise-corrected network sampling performs well: it has the best average rank among the tested methods across a wide range of applications.
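
The selection step described in the abstract (repeatedly follow the edge that scores highest given the sample explored so far) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `greedy_sample`, `lift_score`, and `fetch_neighbors` are hypothetical names, and `lift_score` is a simple placeholder (observed edge over its expectation under a degree-based null model), not the noise-corrected score defined in the article. The `budget` parameter stands in for a limit on API calls.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def lift_score(sample: Graph, u: Hashable, v: Hashable) -> float:
    """Placeholder score, NOT the article's formula.

    Ratio of the observed edge (count 1) to its expected count under a
    degree-based null model estimated on the current sample.
    """
    total_degree = sum(len(nbrs) for nbrs in sample.values())
    expected = len(sample[u]) * len(sample[v]) / total_degree
    return 1.0 / expected


def greedy_sample(
    fetch_neighbors: Callable[[Hashable], Iterable[Hashable]],
    seed: Hashable,
    budget: int,
    score: Callable[[Graph, Hashable, Hashable], float] = lift_score,
) -> Tuple[Graph, List[Hashable]]:
    """Explore at most `budget` nodes, one simulated API call per node."""
    sample: Graph = {}
    order: List[Hashable] = []   # exploration order
    explored: Set[Hashable] = set()

    def explore(node: Hashable) -> None:
        # One "API call": learn all neighbours of `node`.
        sample.setdefault(node, set())
        for nbr in fetch_neighbors(node):
            sample[node].add(nbr)
            sample.setdefault(nbr, set()).add(node)
        explored.add(node)
        order.append(node)

    explore(seed)
    while len(order) < budget:
        # Edges leading from an explored node to an unexplored one.
        frontier = [(u, v) for u in order for v in sample[u] if v not in explored]
        if not frontier:
            break  # nothing left to reach from the seed
        _, best = max(frontier, key=lambda edge: score(sample, *edge))
        explore(best)
    return sample, order


# Toy network standing in for a rate-limited API.
toy: Graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0}, 4: {2}}
sampled, visit_order = greedy_sample(lambda n: toy[n], seed=0, budget=3)
```

Passing the scoring function as a parameter keeps the exploration loop independent of the scoring rule, so a different edge score can be swapped in without touching the rest of the sketch.
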
Original language: English
Article number: 29
Journal: ACM Transactions on Knowledge Discovery from Data
Issue number: 2
Pages (from-to): 1-21
Number of pages: 21
Publication status: Published - 1 Mar 2021

    Research areas

  • network sampling, network backboning, social media, social networks

ID: 85885067