Abstract
We present two methods for the automatic detection of complex words in context as perceived by non-native English readers, for the SemEval 2016 Task 11 on Complex Word Identification (Paetzold and Specia, 2016). The submitted systems exploit the same set of features, but differ substantially in (i) their learning algorithm and (ii) their angle on the learning objective. The latter in particular is an effort to account for the sparsity of positive instances in the data, as well as the large disparity between the distributions of positive instances in the training and test data.
We further present insights gained during extensive post-task experiments. These revealed that, despite poor results in the task, our neural network approach is competitive with the systems achieving the best results. The central contribution of this paper is therefore a demonstration of the aptitude of deep neural networks for the task of identifying complex words.
Original language | English |
---|---|
Title | Proceedings of SemEval-2016 |
Number of pages | 5 |
Publisher | Association for Computational Linguistics |
Publication date | 2016 |
Pages | 1028–1033 |
ISBN (Print) | 978-1-941643-95-2 |
Status | Published - 2016 |
Published externally | Yes |