
Benchmarking API Costs of Network Sampling Strategies

    Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › peer-reviewed

    Abstract

    Online social media contain valuable quantitative and qualitative data, necessary to advance the study of complex social systems. However, these data vaults are often behind a wall: the owners of the media sites dictate what, when, and how much data can be collected via a mandatory interface (the Application Programming Interface, or API). To work within such restrictions, network scientists have designed sampling methods, which do not require a full crawl of the data to obtain a representative picture of the underlying social network. However, such sampling methods are usually evaluated on only one dimension: which strategy extracts a sample whose statistical properties are closest to those of the original network? In this paper we go beyond this view by creating a benchmark that tests the performance of a method in a multifaceted way. When evaluating a network sampling algorithm, we take into account the API policies and the budget a researcher has to explore the network. By doing so, we show that some methods that are considered to perform poorly can actually perform well with tighter budgets or with different API policies. Our results show that the choice of sampling algorithm is not one-dimensional. It is not enough to ask which method returns the most accurate sample; one must also consider which API constraints the crawl is subject to and how much can be spent on it.
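
As a rough illustration of the two extra dimensions named in the abstract (API policy and crawl budget), the Python sketch below crawls a small graph with two common sampling strategies under a fixed budget of API calls. Everything in it is an assumption made for illustration and is not taken from the paper: the cost model (one call per page of a node's neighbor list), the toy graph `GRAPH`, and the function names `api_calls_needed`, `bfs_sample`, and `random_walk_sample`.

```python
import math
import random
from collections import deque

# Toy undirected graph as adjacency lists (hypothetical data).
GRAPH = {
    0: [1, 2, 3, 4, 5, 6],
    1: [0, 7],
    2: [0], 3: [0], 4: [0], 5: [0], 6: [0],
    7: [1],
}


def api_calls_needed(degree, page_size):
    """Assumed cost model: one call per page of neighbors, at least one call."""
    return max(1, math.ceil(degree / page_size))


def bfs_sample(graph, seed, budget, page_size):
    """Breadth-first crawl that stops before the call budget is exceeded."""
    explored, spent = [], 0
    queue, seen = deque([seed]), {seed}
    while queue:
        node = queue.popleft()
        cost = api_calls_needed(len(graph[node]), page_size)
        if spent + cost > budget:
            break
        spent += cost
        explored.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return explored, spent


def random_walk_sample(graph, seed, budget, page_size, rng, max_steps=10_000):
    """Random walk that pays only the first time a node's neighbors are fetched."""
    explored, spent = [], 0
    fetched = set()
    node = seed
    for _ in range(max_steps):  # guard against walking forever on a small graph
        if node not in fetched:
            cost = api_calls_needed(len(graph[node]), page_size)
            if spent + cost > budget:
                break
            spent += cost
            fetched.add(node)
            explored.append(node)
        if not graph[node]:
            break
        node = rng.choice(graph[node])
    return explored, spent


# Same budget, two hypothetical API policies (neighbors returned per call).
for page_size in (2, 6):
    bfs_nodes, bfs_cost = bfs_sample(GRAPH, 0, budget=5, page_size=page_size)
    rw_nodes, rw_cost = random_walk_sample(
        GRAPH, 0, budget=5, page_size=page_size, rng=random.Random(0)
    )
    print(page_size, "BFS:", bfs_nodes, bfs_cost, "RW:", rw_nodes, rw_cost)
```

With the same budget, a larger page size lets the crawl cover more nodes, which is the kind of interaction between strategy, policy, and budget that a benchmark of this sort would measure.
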
    Original language: English
    Title: 2018 IEEE International Conference on Big Data, BigData 2018
    Publisher: IEEE
    Publication date: 14 Dec. 2018
    ISBN (Print): 978-1-5386-5034-9
    ISBN (Electronic): 978-1-5386-5035-6
    DOI
    Status: Published - 14 Dec. 2018
    Event: 2018 IEEE International Conference on Big Data (Big Data) - Westin Seattle, 1900 5th Avenue, Seattle, USA
    Duration: 10 Dec. 2018 - 13 Dec. 2018
    Conference number: 6
    http://cci.drexel.edu/bigdata/bigdata2018/

    Conference

    Conference: 2018 IEEE International Conference on Big Data (Big Data)
    Number: 6
    Location: Westin Seattle, 1900 5th Avenue
    Country/Territory: USA
    City: Seattle
    Period: 10/12/2018 - 13/12/2018
    Internet address

    Keywords

    • network analysis
    • social media
    • network sampling
