MaChAmp at SemEval-2023 Tasks 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12: On the Effectiveness of Intermediate Training on an Uncurated Collection of Datasets

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Abstract

To improve the ability of language models to handle Natural Language Processing (NLP) tasks, an intermediate step of pre-training has recently been introduced. In this setup, one takes a pre-trained language model, trains it on a (set of) NLP dataset(s), and then finetunes it for a target task. It is known that the selection of relevant transfer tasks is important, but recently some work has shown substantial performance gains by doing intermediate training on a very large set of datasets. Most previous work uses generative language models or only focuses on one or a couple of tasks and uses a carefully curated setup. We compare intermediate training with one or many tasks in a setup where the choice of datasets is more arbitrary; we use all SemEval 2023 text-based tasks. We reach performance improvements for most tasks when using intermediate training. Gains are higher when doing intermediate training on single tasks than all tasks if the right transfer task is identified. Dataset smoothing and heterogeneous batching did not lead to robust gains in our setup.
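The abstract describes a two-stage recipe (intermediate training on one or many SemEval datasets, then finetuning on the target task) and mentions dataset smoothing and heterogeneous batching as multi-task training choices. The sketch below illustrates one common form of dataset smoothing, multinomial sampling over datasets with probabilities proportional to N_i^alpha, and a simple batch schedule drawn from those probabilities. The dataset names, sizes, and alpha value are illustrative assumptions, not values from the paper, and the paper's actual experiments were run with the MaChAmp toolkit rather than this standalone code.

```python
import random

# Hypothetical dataset sizes for an intermediate-training mixture
# (illustrative numbers, not taken from the paper). With plain proportional
# sampling, large datasets dominate training; "dataset smoothing" dampens
# this by raising sizes to a power alpha < 1 before normalizing.
dataset_sizes = {"task_a": 500_000, "task_b": 20_000, "task_c": 2_000}


def smoothed_sampling_probs(sizes, alpha=0.5):
    """Multinomial sampling weights p_i proportional to N_i ** alpha.

    alpha = 1.0 reproduces proportional sampling; alpha = 0.0 samples
    datasets uniformly regardless of their size.
    """
    weights = {name: n ** alpha for name, n in sizes.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def batch_schedule(sizes, num_batches, alpha=0.5, seed=0):
    """Draw which dataset each training batch comes from.

    Here every batch is homogeneous (one source dataset per batch);
    mixing instances from several datasets inside a single batch would be
    a heterogeneous-batching variant of the same schedule.
    """
    probs = smoothed_sampling_probs(sizes, alpha)
    names, weights = zip(*probs.items())
    rng = random.Random(seed)
    return [rng.choices(names, weights=weights)[0] for _ in range(num_batches)]


if __name__ == "__main__":
    print(smoothed_sampling_probs(dataset_sizes, alpha=0.5))
    print(batch_schedule(dataset_sizes, num_batches=10))
```

In a full pipeline, this schedule would drive the intermediate-training stage over the dataset collection, after which the resulting checkpoint is finetuned on the single target task.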
Original language: English
Title of host publication: Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Publisher: Association for Computational Linguistics
Publication date: Jul 2023
Pages: 230-245
Publication status: Published - Jul 2023

Keywords

  • Natural Language Processing (NLP)
  • Intermediate Training
  • Transfer Learning
  • Language Models
  • SemEval 2023

