Abstract
With the COVID-19 pandemic raging worldwide since the beginning of the 2020 decade, the need for monitoring systems to track relevant information on social media is vitally important. This paper describes our submission to the WNUT-2020 Task 2: Identification of informative COVID-19 English Tweets. We investigate the effectiveness of a variety of classification models, and find that domain-specific pre-trained BERT models lead to the best performance. On top of this, we attempt a variety of ensembling strategies, but these attempts did not lead to further improvements. Our final best model, the standalone CT-BERT model, proved to be highly competitive, leading to a shared first place in the shared task. Our results emphasize the importance of domain and task-related pre-training.
Original language | English |
---|---|
Title of host publication | Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020) |
Publisher | Association for Computational Linguistics |
Publication date | Nov 2020 |
Pages | 331-336 |
Publication status | Published - Nov 2020 |
Keywords
- COVID-19 Pandemic
- Social Media Monitoring
- Informative Tweets
- Pre-trained BERT Models
- Domain-Specific Classification
Fingerprint
Dive into the research topics of 'NLP North at WNUT-2020 Task 2: Pre-training versus Ensembling for Detection of Informative COVID-19 English Tweets'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Multi-Task Sequence Labeling Under Adverse Conditions
Plank, B. (PI) & van der Goot, R. (CoI)
01/04/2019 → 31/08/2020
Project: Other