Identifying Critical Projects via PageRank and Truck Factor

Helge Pfeiffer

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; peer-reviewed

Abstract

Recently, Google’s Open Source team presented the criticality score, a metric to assess the “influence and importance” of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community’s doubts and hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google’s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python); we compute the TFs of the projects with the highest PR in the respective ecosystems; and we compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, or tap from NPM as critical projects with a high degree of transitive dependents (highest PR) and a low number of core developers (each of them possessing a TF of one).
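
The idea in the abstract (rank projects by PageRank on the dependency graph, then look at the Truck Factor of the top-ranked ones) can be illustrated with a minimal sketch. This is not the paper's implementation: the package names, the dependency edges, the Truck Factor values, the damping factor of 0.85, and the cutoff of two top-ranked packages are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical toy ecosystem: an edge (a, b) means "package a depends on package b",
# so rank flows from dependents to their dependencies.
toy_dependencies = [
    ("app_one", "http_lib"),
    ("app_two", "http_lib"),
    ("http_lib", "tiny_util"),
    ("app_three", "tiny_util"),
]

# Hypothetical Truck Factors; in practice these come from a separate authorship analysis.
toy_truck_factor = {
    "app_one": 3,
    "app_two": 2,
    "app_three": 2,
    "http_lib": 4,
    "tiny_util": 1,
}

ranks = pagerank(toy_dependencies)
top_ranked = sorted(ranks, key=ranks.get, reverse=True)[:2]

# Flag packages that are both highly ranked and maintained by a single core developer.
critical_candidates = [name for name in top_ranked if toy_truck_factor[name] == 1]
print(critical_candidates)
```

In this toy graph, tiny_util collects rank both directly and transitively through http_lib, which mirrors the abstract's point that small, widely depended-upon packages with a Truck Factor of one are the ones worth flagging.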
Original language: English
Title of host publication: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Number of pages: 5
Publisher: IEEE
Publication date: 19 May 2021
Pages: 41-45
ISBN (Print): 978-1-6654-2985-6
ISBN (Electronic): 978-1-7281-8710-5
DOI
Status: Published - 19 May 2021
