Abstract
We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. Although we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that knowing the syntactic properties of the training data can help interpret model performance on linguistic tasks. For curriculum learning, we explore a developmental curriculum and several alternative cognitively inspired curricula. We find that some curricula help with reading tasks, but the main performance improvement comes from training on the subset of syntactically categorizable data rather than on the full, noisy corpus.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the First BabyLM Workshop |
| Editors | Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Y. Hu, Jing Liu, Jaap Jumelet, Tal Linzen, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox, Adina Williams |
| Number of pages | 13 |
| Place of Publication | Suzhou, China |
| Publisher | Association for Computational Linguistics |
| Publication date | 1 Nov 2025 |
| Pages | 288-300 |
| ISBN (Print) | TODO |
| DOIs | |
| Publication status | Published - 1 Nov 2025 |
| Event | Workshop on BabyLM: Accelerating Language Modeling Research with Cognitively Plausible Data, Suzhou, China. Duration: 5 Nov 2025 → 9 Nov 2025. Conference number: 1. https://aclanthology.org/events/babylm-2025/?utm |
Conference
| Conference | Workshop on BabyLM |
|---|---|
| Number | 1 |
| Country/Territory | China |
| City | Suzhou |
| Period | 05/11/2025 → 09/11/2025 |
| Internet address | https://aclanthology.org/events/babylm-2025/?utm |
Keywords
- Syntactic development
- CHILDES corpus
- BabyLM corpus
- Curriculum learning
- Language model evaluation
Fingerprint
Dive into the research topics of 'Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?'. Together they form a unique fingerprint.