Noise Corrected Sampling of Online Social Networks
Research output: Journal Article or Conference Article in Journal › Journal article › Research › peer-review
In this article, we propose a new method to perform topological network sampling. Topological network sampling is a process for extracting a subset of nodes and edges from a network, such that analyses on the sample provide results and conclusions comparable to the ones they would return if run on whole structure. We need network sampling because the largest online network datasets are accessed through low-throughput application programming interface (API) systems, rendering the collection of the whole network infeasible. Our method is inspired by the literature on network backboning, specifically the noise-corrected backbone. We select the next node to explore by following the edge we identify as the one providing the largest information gain, given the topology of the sample explored so far. We evaluate our method against the most commonly used sampling methods. We do so in a realistic framework, considering a wide array of network topologies, network analysis, and features of API systems. There is no method that can provide the best sample in all possible scenarios, thus in our results section, we show the cases in which our method performs best and the cases in which it performs worst. Overall, the noise-corrected network sampling performs well: it has the best rank average among the tested methods across a wide range of applications.