
NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets

Research output: Article in proceedings › Research › peer-review

Abstract

With the COVID-19 pandemic raging world-wide since the beginning of 2020, the need for monitoring systems to track relevant information on social media is vitally important. This paper describes our submission to WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets. We investigate the effectiveness of a variety of classification models and find that domain-specific pre-trained BERT models lead to the best performance. On top of this, we attempt a variety of ensembling strategies, but these attempts did not lead to further improvements. Our final best model, the standalone CT-BERT model, proved to be highly competitive, leading to a shared first place in the shared task. Our results emphasize the importance of domain- and task-related pre-training.
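The abstract's ensembling strategies are not detailed on this page; as one illustration of the general idea, a minimal majority-vote ensemble over binary per-model predictions (0 = uninformative, 1 = informative) might look like the following sketch. The function names and the tie-breaking choice are assumptions for illustration, not the authors' actual method.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model binary labels for one tweet by majority vote.
    Ties default to 0 (uninformative) -- an arbitrary illustrative choice."""
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0

def ensemble(model_outputs):
    """model_outputs: list of per-model label lists, one label per tweet.
    zip(*...) regroups the labels tweet-by-tweet before voting."""
    return [majority_vote(labels) for labels in zip(*model_outputs)]

# Three hypothetical models' labels for four tweets:
print(ensemble([[1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 1, 1]]))  # → [1, 0, 1, 0]
```

As the paper reports, such combinations did not beat the standalone CT-BERT model in this task.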
Original language: English
Title of host publication: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
Publisher: Association for Computational Linguistics
Publication date: Nov 2020
Pages: 331-336
Publication status: Published - Nov 2020
Event: The Sixth Workshop on Noisy User-generated Text - Online
Duration: 19 Nov 2020 → 19 Nov 2020
Conference number: 6th

Conference

Conference: The Sixth Workshop on Noisy User-generated Text
Number: 6th
Location: Online
Period: 19/11/2020 → 19/11/2020

Keywords

  • COVID-19 Pandemic
  • Social Media Monitoring
  • Informative Tweets
  • Pre-trained BERT Models
  • Domain-Specific Classification

