
What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets

  • University of Turin

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

Data filtering strategies are a crucial component in developing safe Large Language Models (LLMs), since they support the removal of harmful content from pretraining datasets. However, there is little research on the actual impact of these strategies on groups vulnerable to discrimination, and their effectiveness has not yet been systematically assessed. In this paper we present a benchmark study of data filtering strategies for harm reduction, aimed at providing a systematic evaluation of these approaches. We review 55 technical reports of English LMs and LLMs to identify the filtering strategies used in the literature, and we implement an experimental setting to test their impact on vulnerable groups. Our results show that, while these strategies reduce harmful content in documents, they have the side effect of increasing the underrepresentation of groups vulnerable to discrimination in the datasets.
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence: AAAI Special Track on AI for Social Impact II
Number of pages: 11
Volume: 40
Publisher: AAAI Press
Publication date: 2026
Edition: 46
Pages: 39303-39313
ISBN (Electronic): 978-1-57735-906-7
Publication status: Published - 2026
Event: AAAI Conference on Artificial Intelligence - Singapore EXPO, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
Conference number: 40

Conference

Conference: AAAI Conference on Artificial Intelligence
Number: 40
Location: Singapore EXPO
Country/Territory: Singapore
Period: 20/01/2026 to 27/01/2026

Keywords

  • Data filtering strategies
  • Large Language Models
  • Harmful content filtering
  • Discrimination and bias in datasets
  • Underrepresentation of vulnerable groups
