ITU

Identifying Critical Projects via PageRank and Truck Factor

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Standard

Identifying Critical Projects via PageRank and Truck Factor. / Pfeiffer, Helge.

2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 2021. p. 41-45.

Harvard

Pfeiffer, H 2021, Identifying Critical Projects via PageRank and Truck Factor. in 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, pp. 41-45. https://doi.org/10.1109/MSR52588.2021.00017

APA

Pfeiffer, H. (2021). Identifying Critical Projects via PageRank and Truck Factor. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (pp. 41-45). IEEE. https://doi.org/10.1109/MSR52588.2021.00017

Vancouver

Pfeiffer H. Identifying Critical Projects via PageRank and Truck Factor. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE. 2021. p. 41-45 https://doi.org/10.1109/MSR52588.2021.00017

Author

Pfeiffer, Helge. / Identifying Critical Projects via PageRank and Truck Factor. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 2021. pp. 41-45

Bibtex

@inproceedings{2054fcf7213e4d4dbe14e38e1e508e0f,
title = "Identifying Critical Projects via PageRank and Truck Factor",
abstract = "Recently, Google{\textquoteright}s Open Source team presented the criticality score, a metric to assess the “influence and importance” of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community{\textquoteright}s doubts and hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google{\textquoteright}s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python); we compute the TFs of the projects with the highest PR in the respective ecosystems; and we compare these to the scores provided by the Google project. Unlike Google{\textquoteright}s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, and tap from NPM as critical projects with a high degree of transitive dependents (highest PR) and a low number of core developers (each with a TF of one).",
author = "Helge Pfeiffer",
year = "2021",
month = may,
day = "19",
doi = "10.1109/MSR52588.2021.00017",
language = "English",
isbn = "978-1-6654-2985-6",
pages = "41--45",
booktitle = "2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)",
publisher = "IEEE",
address = "United States",

}

RIS

TY - GEN

T1 - Identifying Critical Projects via PageRank and Truck Factor

AU - Pfeiffer, Helge

PY - 2021/5/19

Y1 - 2021/5/19

N2 - Recently, Google’s Open Source team presented the criticality score, a metric to assess the “influence and importance” of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community’s doubts and hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google’s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python); we compute the TFs of the projects with the highest PR in the respective ecosystems; and we compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, and tap from NPM as critical projects with a high degree of transitive dependents (highest PR) and a low number of core developers (each with a TF of one).

AB - Recently, Google’s Open Source team presented the criticality score, a metric to assess the “influence and importance” of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community’s doubts and hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google’s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python); we compute the TFs of the projects with the highest PR in the respective ecosystems; and we compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, and tap from NPM as critical projects with a high degree of transitive dependents (highest PR) and a low number of core developers (each with a TF of one).

U2 - 10.1109/MSR52588.2021.00017

DO - 10.1109/MSR52588.2021.00017

M3 - Article in proceedings

SN - 978-1-6654-2985-6

SP - 41

EP - 45

BT - 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)

PB - IEEE

ER -

ID: 86173863
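The approach summarized in the abstract (rank projects by PageRank over a dependency graph, then compute the Truck Factor of the top-ranked ones) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The toy dependency graph, the per-file authorship sets, and the parameters `top_k` and `max_tf` are hypothetical. PageRank here is a plain power iteration with the commonly used damping factor 0.85. The truck factor is a simplified greedy heuristic: repeatedly remove the developer who authors the most files until more than half of the files have no remaining author. It assumes authorship is already known per file, which may differ from the authorship model used in the study.

```python
from typing import Dict, List, Set

# Hypothetical data: an edge "A -> B" means project A depends on project B.
DEPS: Dict[str, List[str]] = {
    "app-a": ["lib-x", "lib-y"],
    "app-b": ["lib-x"],
    "app-c": ["lib-y"],
    "lib-y": ["lib-x"],
}

# Hypothetical authorship: for each project, the developers who author each file.
AUTHORSHIP: Dict[str, Dict[str, Set[str]]] = {
    "lib-x": {"core.py": {"alice"}, "util.py": {"alice"}, "cli.py": {"alice", "bob"}},
    "lib-y": {"a.py": {"carol", "dave"}, "b.py": {"dave", "erin"}, "c.py": {"carol", "erin"}},
}


def pagerank(deps: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; rank flows from dependents to dependencies."""
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank of projects without dependencies (dangling nodes) is spread
        # uniformly so that the scores keep summing to 1.
        dangling = sum(rank[node] for node in nodes if not deps.get(node))
        new_rank = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for source, targets in deps.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def truck_factor(authors_per_file: Dict[str, Set[str]], threshold: float = 0.5) -> int:
    """Greedy estimate: count how many top authors must leave before more
    than `threshold` of the files have no remaining author."""
    files = {name: set(authors) for name, authors in authors_per_file.items()}
    removed = 0
    while True:
        orphaned = sum(1 for authors in files.values() if not authors)
        if orphaned > threshold * len(files):
            return removed
        counts: Dict[str, int] = {}
        for authors in files.values():
            for dev in authors:
                counts[dev] = counts.get(dev, 0) + 1
        if not counts:  # no files at all, or nobody left to remove
            return removed
        # Remove the developer authoring the most files (ties broken by name).
        top = max(sorted(counts), key=lambda dev: counts[dev])
        for authors in files.values():
            authors.discard(top)
        removed += 1


def critical_projects(deps: Dict[str, List[str]],
                      authorship: Dict[str, Dict[str, Set[str]]],
                      top_k: int = 2, max_tf: int = 1) -> List[str]:
    """Among the top_k projects by PageRank, keep those with a low truck factor."""
    pr = pagerank(deps)
    ranked = sorted(pr, key=lambda p: pr[p], reverse=True)[:top_k]
    return [p for p in ranked
            if p in authorship and truck_factor(authorship[p]) <= max_tf]


print(critical_projects(DEPS, AUTHORSHIP))  # ['lib-x']
```

In this toy graph, every other project depends on lib-x directly or transitively, so it receives the highest PageRank. Because a single developer authors most of its files, its truck factor is one, and it is the only project flagged. lib-y ranks second but has a truck factor of three, so it is not flagged.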