Abstract
We present Personal-ITY, a novel corpus for personality prediction in Italian,
containing a larger number of authors and a different genre than previously
available resources. The corpus is built using distant supervision, assigning
Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and lends itself to
a variety of experiments. We report preliminary experiments on Personal-ITY,
which can serve as a baseline for future work, showing that some types are easier
to predict than others, and discussing the perks of cross-dataset prediction.
Original language | English |
---|---|
Title | Seventh Italian Conference on Computational Linguistics |
Publisher | Association for Computational Linguistics |
Publication date | 2020 |
Status | Published - 2020 |
Published externally | Yes |
Event | Seventh Italian Conference on Computational Linguistics - Bologna, Italy. Duration: 1 Mar 2021 → 3 Mar 2021 http://clic2020.ilc.cnr.it/en/home-2/?ujci4tgnr78ybub/n28j8c0awe |
Conference
Conference | Seventh Italian Conference on Computational Linguistics |
---|---|
Country/Territory | Italy |
City | Bologna |
Period | 01/03/2021 → 03/03/2021 |
Internet address |
Keywords
- personality prediction
- Italian corpus
- Myers-Briggs Type Indicator
- YouTube comments
- distant supervision