Abstract
Finding informative COVID-19 posts in a stream of tweets is useful for monitoring health-related updates. Prior work focused on a balanced data setup and on English, but informative tweets are rare, and English is only one of the many languages spoken in the world. In this work, we introduce a new dataset of 5,000 Danish tweets for finding informative COVID-19 tweets. In contrast to prior work, which balances the label distribution, we model the problem while keeping its natural distribution. We examine how well a simple probabilistic model and a convolutional neural network (CNN) perform on this task. We find that a weighted CNN works well, but it is sensitive to the choice of embeddings and hyperparameters. We hope the contributed dataset is a starting point for further work in this direction.
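The abstract does not specify how the CNN is weighted. One common way to train on a skewed, natural label distribution, assumed here purely for illustration, is to weight each class by its inverse frequency in the loss. The minimal Python sketch below computes such weights and a class-weighted cross-entropy; the function names and the label strings are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (e.g. informative tweets) receive larger weights.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted negative log-likelihood, normalized by the total weight.

    probs: list of dicts mapping each class to its predicted probability.
    labels: list of gold classes, aligned with probs.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy example: one informative tweet for every nine uninformative ones.
labels = ["uninformative"] * 9 + ["informative"]
weights = inverse_frequency_weights(labels)
print(weights)
```

With this toy split, the rare class gets weight 5.0 and the common class about 0.56, so misclassifying an informative tweet costs roughly nine times as much as misclassifying an uninformative one.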
| Original language | English |
|---|---|
| Title | Proceedings of the 2021 EMNLP Workshop W-NUT: The Seventh Workshop on Noisy User-generated Text |
| Publisher | Association for Computational Linguistics |
| Publication date | 2021 |
| Pages | 11–19 |
| Status | Published - 2021 |
| Event | The Seventh Workshop on Noisy User-generated Text, VIRTUAL. Duration: 11 Nov 2021 → 11 Nov 2021. Conference number: 7 |
Conference
| Conference | The Seventh Workshop on Noisy User-generated Text |
|---|---|
| Number | 7 |
| City | VIRTUAL |
| Period | 11/11/2021 → 11/11/2021 |
Keywords
- Informative Tweets
- COVID-19
- Danish Language
- Natural Distribution
- Convolutional Neural Network (CNN)
Fingerprint
Dive into the research topics of 'Finding the needle in a haystack: Extraction of Informative COVID-19 Danish Tweets'. Together they form a unique fingerprint.