ITU

Benchmarking API Costs of Network Sampling Strategies

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Standard

Benchmarking API Costs of Network Sampling Strategies. / Coscia, Michele; Rossi, Luca.

2018 IEEE International Conference on Big Data, BigData 2018. IEEE, 2018.

Harvard

Coscia, M & Rossi, L 2018, Benchmarking API Costs of Network Sampling Strategies. in 2018 IEEE International Conference on Big Data, BigData 2018. IEEE, 2018 IEEE International Conference on Big Data (Big Data), Seattle, United States, 10/12/2018. https://doi.org/10.1109/BigData.2018.8622486

APA

Coscia, M., & Rossi, L. (2018). Benchmarking API Costs of Network Sampling Strategies. In 2018 IEEE International Conference on Big Data, BigData 2018. IEEE. https://doi.org/10.1109/BigData.2018.8622486

Vancouver

Coscia M, Rossi L. Benchmarking API Costs of Network Sampling Strategies. In 2018 IEEE International Conference on Big Data, BigData 2018. IEEE. 2018 https://doi.org/10.1109/BigData.2018.8622486

Author

Coscia, Michele ; Rossi, Luca. / Benchmarking API Costs of Network Sampling Strategies. 2018 IEEE International Conference on Big Data, BigData 2018. IEEE, 2018.

Bibtex

@inproceedings{3c8b5a9425be4acbb274db7f30399704,
title = "Benchmarking API Costs of Network Sampling Strategies",
abstract = "Online social media contain valuable quantitative and qualitative data, necessary to advance complex social systems studies. However, these data vaults are often behind a wall: the owners of the media sites dictate what, when, and how much data can be collected via a mandatory interface (called Application Program Interface: API). To work with such restrictions, network scientists have designed sampling methods, which do not require a full crawl of the data to obtain a representative picture of the underlying social network. However, such sampling methods are usually evaluated only on one dimension: what strategy allows for the extraction of a sample whose statistical properties are closest to the original network? In this paper we go beyond this view, by creating a benchmark that tests the performance of a method in a multifaceted way. When evaluating a network sampling algorithm, we take into account the API policies and the budget a researcher has to explore the network. By doing so, we show that some methods which are considered to perform poorly actually can perform well with tighter budgets, or with different API policies. Our results show that the decision of which sampling algorithm to use is not monodimensional. It is not enough to ask which method returns the most accurate sample, one has also to consider through which API constraints it has to go, and how much it can spend on the crawl.",
keywords = "network analysis, Social media, network sampling",
author = "Michele Coscia and Luca Rossi",
year = "2018",
month = dec,
day = "14",
doi = "10.1109/BigData.2018.8622486",
language = "English",
isbn = "978-1-5386-5034-9",
booktitle = "2018 IEEE International Conference on Big Data, BigData 2018",
publisher = "IEEE",
address = "United States",
note = "2018 IEEE International Conference on Big Data (Big Data), BigData ; Conference date: 10-12-2018 Through 13-12-2018",
url = "http://cci.drexel.edu/bigdata/bigdata2018/",

}

RIS

TY - GEN

T1 - Benchmarking API Costs of Network Sampling Strategies

AU - Coscia, Michele

AU - Rossi, Luca

N1 - Conference code: 6

PY - 2018/12/14

Y1 - 2018/12/14

N2 - Online social media contain valuable quantitative and qualitative data, necessary to advance complex social systems studies. However, these data vaults are often behind a wall: the owners of the media sites dictate what, when, and how much data can be collected via a mandatory interface (called Application Program Interface: API). To work with such restrictions, network scientists have designed sampling methods, which do not require a full crawl of the data to obtain a representative picture of the underlying social network. However, such sampling methods are usually evaluated only on one dimension: what strategy allows for the extraction of a sample whose statistical properties are closest to the original network? In this paper we go beyond this view, by creating a benchmark that tests the performance of a method in a multifaceted way. When evaluating a network sampling algorithm, we take into account the API policies and the budget a researcher has to explore the network. By doing so, we show that some methods which are considered to perform poorly actually can perform well with tighter budgets, or with different API policies. Our results show that the decision of which sampling algorithm to use is not monodimensional. It is not enough to ask which method returns the most accurate sample, one has also to consider through which API constraints it has to go, and how much it can spend on the crawl.

AB - Online social media contain valuable quantitative and qualitative data, necessary to advance complex social systems studies. However, these data vaults are often behind a wall: the owners of the media sites dictate what, when, and how much data can be collected via a mandatory interface (called Application Program Interface: API). To work with such restrictions, network scientists have designed sampling methods, which do not require a full crawl of the data to obtain a representative picture of the underlying social network. However, such sampling methods are usually evaluated only on one dimension: what strategy allows for the extraction of a sample whose statistical properties are closest to the original network? In this paper we go beyond this view, by creating a benchmark that tests the performance of a method in a multifaceted way. When evaluating a network sampling algorithm, we take into account the API policies and the budget a researcher has to explore the network. By doing so, we show that some methods which are considered to perform poorly actually can perform well with tighter budgets, or with different API policies. Our results show that the decision of which sampling algorithm to use is not monodimensional. It is not enough to ask which method returns the most accurate sample, one has also to consider through which API constraints it has to go, and how much it can spend on the crawl.

KW - network analysis

KW - Social media

KW - network sampling

U2 - 10.1109/BigData.2018.8622486

DO - 10.1109/BigData.2018.8622486

M3 - Article in proceedings

SN - 978-1-5386-5034-9

BT - 2018 IEEE International Conference on Big Data, BigData 2018

PB - IEEE

T2 - 2018 IEEE International Conference on Big Data (Big Data)

Y2 - 10 December 2018 through 13 December 2018

ER -

ID: 83565382