PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

  • Oskar van der Wal
  • Pietro Lesci
  • Max Müller-Eberstein
  • Naomi Saphra
  • Hailey Schoelkopf
  • Willem Zuidema
  • Stella Biderman

Research output: Article in proceedings · Research · peer-review

Abstract

The stability of language model pre-training and its effects on downstream performance are still understudied. Prior work shows that the training process can yield significantly different results in response to slight variations in initial conditions, e.g., the random seed. Crucially, the research community still lacks sufficient resources and tools to systematically investigate pre-training stability, particularly for decoder-only language models. We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. Using these new 45 training runs, in addition to the 5 already available, we study the effects of different initial conditions determined by the seed—i.e., parameters' initialisation and data order—on (i) downstream performance, (ii) learned linguistic representations, and (iii) emergence of training phases. In addition to common scaling behaviours, our analyses generally reveal highly consistent training dynamics across both model sizes and initial conditions. Further, the new seeds for each model allow us to identify outlier training runs and delineate their characteristics. Our findings show the potential of using these methods to predict training stability.
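The run grid described above (9 new seeds across 5 model sizes, plus the 5 original Pythia seeds) can be sketched as follows. This is a minimal illustration of the counts stated in the abstract; the specific intermediate model sizes (31M, 70M, 160M) are an assumption, since the abstract only states five sizes ranging from 14M to 410M.

```python
# Enumerate the PolyPythias run grid described in the abstract.
# Intermediate sizes between 14M and 410M are assumed, not stated.
model_sizes = ["14M", "31M", "70M", "160M", "410M"]  # 5 model sizes
new_seeds = list(range(1, 10))                       # 9 new seeds per size

# One training run per (size, seed) combination.
new_runs = [(size, seed) for size in model_sizes for seed in new_seeds]

print(len(new_runs))       # 45 new training runs
print(len(new_runs) + 5)   # 50 runs in total, counting the 5 original seeds
```

Each run corresponds to a distinct initial condition (parameter initialisation and data order), which is what allows the paper to compare training dynamics and identify outlier runs across seeds.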
Original language: English
Title of host publication: Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025)
Number of pages: 25
Publication date: May 2025
Pages: 1-25
Publication status: Published - May 2025
Event: International Conference on Learning Representations - Singapore, Singapore
Duration: 24 Apr 2025 - 28 Apr 2025
Conference number: 13
https://www.iclr.cc/Conferences/2025

Conference

Conference: International Conference on Learning Representations
Number: 13
Location: Singapore
Country/Territory: Singapore
City: Singapore
Period: 24/04/2025 - 28/04/2025
Internet address: https://www.iclr.cc/Conferences/2025

Keywords

  • pre-training stability
  • seed variability
  • decoder-only language models
  • training dynamics
  • linguistic representations

