Abstract
Readers’ eye movements used as part of the training signal have been shown to improve performance in a wide range of Natural Language Processing (NLP) tasks. Previous work uses gaze data either at the type level or at the token level, and mostly from a single eye-tracking corpus. In this paper, we analyze type- vs. token-level integration options with eye-tracking data from two corpora to inform two syntactic sequence labeling problems: binary phrase chunking and part-of-speech tagging. We show that using globally aggregated measures that capture the central tendency or variability of gaze data is more beneficial than the proposed local views, which retain individual participant information. While gaze data is informative for supervised POS tagging, which complements previous findings on unsupervised POS induction, almost no improvement is obtained for binary phrase chunking, except for a single specific setup. Hence, caution is warranted when using gaze data as a signal for NLP, as no single view is robust across tasks, modeling choices, and gaze corpora.
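To make the distinction between the two integration options concrete, the sketch below (not from the paper; the record layout and toy data are hypothetical) contrasts a type-level view, where a gaze measure such as fixation duration is aggregated per word type across participants into central-tendency and variability features, with a token-level view that keeps one value per token occurrence per participant.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy gaze records: (participant, sentence_id, token_position, word, fixation_ms).
# Hypothetical data for illustration only; real eye-tracking corpora differ.
records = [
    ("p1", 0, 0, "the", 110), ("p2", 0, 0, "the", 95),
    ("p1", 0, 1, "cat", 240), ("p2", 0, 1, "cat", 310),
    ("p1", 1, 0, "the", 100), ("p2", 1, 0, "the", 120),
]

# Type-level view: pool fixation durations per word type across all
# participants and occurrences, keeping only aggregate statistics.
by_type = defaultdict(list)
for _, _, _, word, dur in records:
    by_type[word].append(dur)
type_features = {
    w: {"mean_fix": mean(ds), "sd_fix": stdev(ds) if len(ds) > 1 else 0.0}
    for w, ds in by_type.items()
}

# Token-level (local) view: one value per token occurrence per participant,
# retaining individual reader information instead of collapsing it.
token_features = {
    (pid, sid, pos): {"fix": dur}
    for pid, sid, pos, _, dur in records
}

print(type_features["the"])          # aggregate statistics for the type "the"
print(token_features[("p1", 0, 1)])  # {'fix': 240} for one reader's token
```

Either feature dictionary could then be concatenated to the word representations fed to a sequence labeler; the abstract's finding is that the aggregated (type-level) statistics transfer more reliably than the local per-participant values.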
| Original language | English |
| --- | --- |
| Title of host publication | The First Workshop Beyond Vision and LANguage: inTEgrating Real-World kNowledge: EMNLP-IJCNLP Workshop |
| Place of Publication | Hong Kong |
| Publisher | Association for Computational Linguistics |
| Publication date | 2019 |
| Pages | 51–61 |
| Publication status | Published - 2019 |
Keywords
- Eye Movements
- Natural Language Processing
- Syntactic Sequence Labeling
- Gaze Data Integration
- Part-of-Speech Tagging