TY - GEN
T1 - Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
AU - Rogers, Anna
AU - Kovaleva, Olga
AU - Downey, Matthew
AU - Rumshisky, Anna
PY - 2020
Y1 - 2020
N2 - The recent explosion in question answering research produced a wealth of both factoid RC and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. To that end, we present QuAIL, the first reading comprehension dataset (a) to combine text-based, world knowledge, and unanswerable questions, and (b) to provide annotation that would enable precise diagnostics of the reasoning strategies used by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains (fiction, blogs, political news, and user story texts). Crucially, to solve QuAIL a system would need to handle both general and text-specific questions, impossible to answer from pretraining data. We show that the new benchmark poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.
AB - The recent explosion in question answering research produced a wealth of both factoid RC and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. To that end, we present QuAIL, the first reading comprehension dataset (a) to combine text-based, world knowledge, and unanswerable questions, and (b) to provide annotation that would enable precise diagnostics of the reasoning strategies used by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains (fiction, blogs, political news, and user story texts). Crucially, to solve QuAIL a system would need to handle both general and text-specific questions, impossible to answer from pretraining data. We show that the new benchmark poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.
KW - Reading comprehension dataset
KW - Question answering (QA)
KW - Commonsense reasoning
KW - Diagnostic annotation
KW - Multi-choice questions
M3 - Article in proceedings
SP - 11
BT - Proceedings of the AAAI Conference on Artificial Intelligence
ER -