Identifying Critical Projects via PageRank and Truck Factor

Publication: Conference article in proceedings or book/report chapter › Conference contribution in proceedings › Research › Peer-reviewed


Recently, Google’s Open Source team presented the Criticality Score (CS), a metric that assesses the “influence and importance” of a project in an ecosystem from project-specific signals such as the number of dependents and commit frequency. The community reacted with mixed feelings, doubting whether the score can accurately identify critical projects. We share the community’s doubts, and we hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google’s current CS. To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python), compute the TFs of the projects with the highest PR in the respective ecosystems, and compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, and tap from NPM as critical: they have a high degree of transitive dependents (highest PR) and a low number of core developers (each has a TF of one).
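The combination of PR and TF described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the dependency graph, package names, and developer names are a hypothetical toy example; `pagerank`, `truck_factor`, and `flag_critical` are illustrative helper names; the damping factor of 0.85 is a conventional default; the truck factor uses a simple greedy heuristic (repeatedly remove the developer who authors the most files until more than half of the files are orphaned) with file authorship assumed to be given; and "critical" is taken to mean a top-ranked package whose TF is exactly one. Edges point from a dependent to its dependency, so rank accumulates in packages that are depended upon directly or transitively.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): each package maps to the
# packages it depends on.
TOY_DEPS = {
    "app1": ["util", "net"],
    "app2": ["util"],
    "app3": ["net"],
    "net": ["codec"],
    "util": [],
    "codec": [],
}


def pagerank(deps, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank on a dependency graph.

    An edge dependent -> dependency passes rank to the dependency, so
    packages with many direct or transitive dependents score highly.
    """
    nodes = sorted(set(deps) | {d for targets in deps.values() for d in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Packages without dependencies (dangling nodes) spread rank uniformly.
        dangling = sum(rank[v] for v in nodes if not deps.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in deps.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def truck_factor(file_authors, threshold=0.5):
    """Greedy truck-factor estimate (a simplified heuristic assumed here).

    file_authors maps each file to the set of developers treated as its
    authors. The developer who authors the most files is removed until
    more than `threshold` of all files have no author left; the number of
    removed developers is the estimate.
    """
    files = {f: set(authors) for f, authors in file_authors.items()}
    removed = 0
    while True:
        orphaned = sum(1 for authors in files.values() if not authors)
        if orphaned > threshold * len(files):
            return removed
        counts = defaultdict(int)
        for authors in files.values():
            for dev in authors:
                counts[dev] += 1
        if not counts:  # only reachable when there are no files at all
            return removed
        # Ties are broken alphabetically to keep the result deterministic.
        top = max(sorted(counts), key=counts.get)
        for authors in files.values():
            authors.discard(top)
        removed += 1


def flag_critical(ranks, truck_factors, top_k=3):
    """Return the top-k packages by PageRank whose truck factor is one."""
    top = sorted(ranks, key=ranks.get, reverse=True)[:top_k]
    return [pkg for pkg in top if truck_factors.get(pkg) == 1]


ranks = pagerank(TOY_DEPS)
tfs = {
    "codec": truck_factor({"a.c": {"alice"}, "b.c": {"alice"}, "c.c": {"alice", "bob"}}),
    "util": truck_factor({"x.c": {"alice"}, "y.c": {"bob"}, "z.c": {"carol"}, "w.c": {"dave"}}),
}
print(flag_critical(ranks, tfs))  # ['codec']
```

In this toy graph `codec` ranks highest even though it has only one direct dependent, because rank flows to it transitively through `net`; combined with a TF of one, it is the only package flagged.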
Title: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Number of pages: 5
Publication date: 19 May 2021
ISBN (Print): 978-1-6654-2985-6
ISBN (Electronic): 978-1-7281-8710-5
Status: Published, 19 May 2021

