Identifying Critical Projects via PageRank and Truck Factor

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Recently, Google’s Open Source team presented the Criticality Score, a metric to assess the “influence and importance” of a project in an ecosystem from project-specific signals, e.g., number of dependents, commit frequency, etc. The community showed mixed reactions towards the score, doubting whether it can accurately identify critical projects. We share the community’s doubts and hypothesize that a combination of PageRank (PR) and Truck Factor (TF) can identify critical projects more accurately than Google’s current Criticality Score (CS). To verify our hypothesis, we conduct an experiment in which we compute the PR of thousands of projects from various ecosystems, such as Maven (Java), NPM (JavaScript), and PyPI (Python); we compute the TFs of the projects with the highest PR in the respective ecosystems; and we compare these to the scores provided by the Google project. Unlike Google’s CS, our approach identifies projects such as six and idna from PyPI, com.typesafe:config from Maven, or tap from NPM as critical projects with a high degree of transitive dependents (highest PR) and a low number of core developers (each of them possessing a TF of one).
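To make the two measures in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a toy dependency graph with edges pointing from a dependent to its dependency, a plain power-iteration PageRank with a damping factor of 0.85, and a simplified greedy Truck Factor in which a file counts as abandoned once all of its listed authors have been removed. The package names, file names, and author names are hypothetical.

```python
def pagerank(deps, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dependency graph.

    deps maps each package to the packages it depends on, so rank
    flows from dependents to their dependencies.
    """
    nodes = sorted(set(deps) | {d for targets in deps.values() for d in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Packages without dependencies (dangling nodes) spread their rank evenly.
        dangling = sum(rank[node] for node in nodes if not deps.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = deps.get(node, [])
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def truck_factor(file_authors):
    """Simplified greedy Truck Factor (an assumption of this sketch).

    Repeatedly removes the author covering the most files until more
    than half of the files have no remaining author.
    """
    remaining = {path: set(authors) for path, authors in file_authors.items()}
    total = len(remaining)
    removed = 0
    while total and 2 * sum(1 for a in remaining.values() if not a) <= total:
        counts = {}
        for authors in remaining.values():
            for author in authors:
                counts[author] = counts.get(author, 0) + 1
        if not counts:
            break
        # Ties are broken alphabetically to keep the result deterministic.
        top = max(sorted(counts), key=counts.get)
        for authors in remaining.values():
            authors.discard(top)
        removed += 1
    return removed


# Hypothetical ecosystem: three packages depend, directly or transitively, on lib_x.
deps = {
    "app_a": ["lib_x"],
    "app_b": ["lib_x", "lib_y"],
    "lib_y": ["lib_x"],
    "lib_x": [],
}
ranks = pagerank(deps)
most_central = max(ranks, key=ranks.get)

# Hypothetical authorship data for the most central package.
lib_x_files = {"core.py": {"alice"}, "util.py": {"alice"}, "cli.py": {"alice", "bob"}}
tf = truck_factor(lib_x_files)

print(most_central, tf)
```

In this toy setting the package that every other package reaches through its dependencies receives the highest rank, and a single dominant author gives it a Truck Factor of one, which is the combination the abstract flags as critical.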
Original language: English
Title of host publication: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Number of pages: 5
Publisher: IEEE
Publication date: 19 May 2021
Pages: 41-45
ISBN (Print): 978-1-6654-2985-6
ISBN (Electronic): 978-1-7281-8710-5
Publication status: Published - 19 May 2021

Keywords

  • Criticality Score
  • PageRank
  • Truck Factor
  • Ecosystem Analysis
  • Open Source Metrics
