Abstract
The recent explosion in question answering research has produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. To that end, we present QuAIL, the first reading comprehension dataset (a) to combine text-based, world-knowledge, and unanswerable questions, and (b) to provide annotation that would enable precise diagnostics of the reasoning strategies used by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains (fiction, blogs, political news, and user story texts). Crucially, to solve QuAIL a system would need to handle both general and text-specific questions that are impossible to answer from pretraining data. We show that the new benchmark poses substantial challenges to current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.
Original language | English |
---|---|
Title | Proceedings of the AAAI Conference on Artificial Intelligence |
Number of pages | 1 |
Publication date | 2020 |
Pages | 11 |
Status | Published - 2020 |
Keywords
- Reading comprehension dataset
- Question answering (QA)
- Commonsense reasoning
- Diagnostic annotation
- Multi-choice questions