What constitutes Software? An Empirical, Descriptive Study of Artifacts

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

The term software is ubiquitous, yet we as a community do not seem to have a clear understanding of what software actually is. Imprecise definitions of software do not help other professions, in particular those acquiring and sourcing software from third parties, when they must decide precisely what the potential deliverables are. In this paper we investigate which artifacts constitute software by analyzing 23,715 repositories from GitHub. We categorize the artifacts found into high-level categories, such as code, data, and documentation (and into 19 more concrete categories), and we confirm the notion of others that software is more than just source code or programs, for which the term is often used synonymously.
With this work we provide an empirical study of more than 13 million artifacts and a taxonomy of artifact categories. We conclude that software most often consists of variously distributed amounts of code in different forms (source code, binary code, scripts, etc.), data (configuration files, images, databases, etc.), and documentation (user documentation, licenses, etc.).
Original language: English
Title of host publication: Proceedings of the 17th Working Conference on Mining Software Repositories
Publication date: 2020
Publication status: Published - 2020

ID: 85191446