Do Syntactic Categories Help in Developmentally Motivated Curriculum Learning for Language Models?

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; peer-reviewed

Abstract

We examine the syntactic properties of the BabyLM corpus and of age groups within CHILDES. While we find that CHILDES does not exhibit strong syntactic differentiation by age, we show that syntactic knowledge about the training data can be helpful in interpreting model performance on linguistic tasks. For curriculum learning, we explore developmental and several alternative cognitively inspired curriculum approaches. We find that some curricula help with reading tasks, but the main performance improvement comes from using the subset of syntactically categorizable data rather than the full noisy corpus.
Original language: English
Title of host publication: Proceedings of the First BabyLM Workshop
Editors: Lucas Charpentier, Leshem Choshen, Ryan Cotterell, Mustafa Omer Gul, Michael Y. Hu, Jing Liu, Jaap Jumelet, Tal Linzen, Aaron Mueller, Candace Ross, Raj Sanjay Shah, Alex Warstadt, Ethan Gotlieb Wilcox, Adina Williams
Number of pages: 13
Place of publication: Suzhou, China
Publisher: Association for Computational Linguistics
Publication date: 1 Nov 2025
Pages: 288-300
ISBN (Print): TODO
DOI
Status: Published - 1 Nov 2025
