Abstract
Readers’ eye movements used as part of the training signal have been shown to improve performance in a wide range of Natural Language Processing (NLP) tasks. Previous work uses gaze data either at the type level or at the token level, and mostly from a single eye-tracking corpus. In this paper, we analyze type- vs. token-level integration options with eye-tracking data from two corpora to inform two syntactic sequence labeling problems: binary phrase chunking and part-of-speech tagging. We show that using globally aggregated measures that capture the central tendency or variability of gaze data is more beneficial than the proposed local views which retain individual participant information. While gaze data is informative for supervised POS tagging, which complements previous findings on unsupervised POS induction, almost no improvement is obtained for binary phrase chunking, except for a single specific setup. Hence, caution is warranted when using gaze data as a signal for NLP, as no single view is robust across tasks, modeling choices, and gaze corpora.
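The contrast between type-level and token-level integration can be illustrated with a minimal sketch (the gaze values, record layout, and function names below are hypothetical and not taken from the paper): type-level integration aggregates a gaze measure such as first-fixation duration over all occurrences and readers of a word type, yielding central-tendency and variability features, while a token-level view keeps one "local" value per occurrence.

```python
from statistics import mean, stdev
from collections import defaultdict

# Hypothetical per-token gaze records: (word, first_fixation_duration_ms),
# pooled across participants and occurrences. Illustrative values only.
records = [
    ("the", 180.0), ("the", 165.0), ("cat", 240.0),
    ("cat", 255.0), ("sat", 230.0), ("the", 172.0),
]

def type_level_features(recs):
    """Aggregate gaze measures per word type: central tendency and variability."""
    by_type = defaultdict(list)
    for word, ffd in recs:
        by_type[word].append(ffd)
    return {
        w: {"ffd_mean": mean(v), "ffd_std": stdev(v) if len(v) > 1 else 0.0}
        for w, v in by_type.items()
    }

def token_level_features(recs):
    """Keep one gaze value per token occurrence (a 'local' view)."""
    return [{"word": w, "ffd": ffd} for w, ffd in recs]

feats = type_level_features(records)
print(feats["the"])  # {'ffd_mean': 172.33..., 'ffd_std': 7.50...}
```

One practical property of the type-level view, consistent with the abstract's finding, is that aggregated features can be looked up for any vocabulary word at prediction time, even in sentences no participant ever read, whereas per-participant token-level values do not transfer as readily.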
Original language | English |
---|---|
Title | The First Workshop Beyond Vision and LANguage: inTEgrating Real-World kNowledge: EMNLP-IJCNLP Workshop |
Place of publication | Hong Kong |
Publisher | Association for Computational Linguistics |
Publication date | 2019 |
Pages | 51–61 |
DOI | |
Status | Published - 2019 |