
Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can help in interpreting model performance on linguistic tasks. For curriculum learning, we explore developmental and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvements come from using the subset of syntactically categorizable data rather than the full noisy corpus.
Original language: English
Title of host publication: Proceedings of the First BabyLM Workshop
Editors: Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Y. Hu, Jing Liu, Jaap Jumelet, Tal Linzen, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox, Adina Williams
Number of pages: 13
Place of Publication: Suzhou, China
Publisher: Association for Computational Linguistics
Publication date: 1 Nov 2025
Pages: 288-300
ISBN (Print): TODO
Publication status: Published - 1 Nov 2025
Event: Workshop on BabyLM: Accelerating Language Modeling Research with Cognitively Plausible Data - Suzhou, China
Duration: 5 Nov 2025 to 9 Nov 2025
Conference number: 1
https://aclanthology.org/events/babylm-2025/?utm

Conference

Conference: Workshop on BabyLM
Number: 1
Country/Territory: China
City: Suzhou
Period: 05/11/2025 to 09/11/2025

Keywords

  • Syntactic development
  • CHILDES corpus
  • BabyLM corpus
  • Curriculum learning
  • Language model evaluation
