Abstract
We propose an automated approach for semantic class identification of compounds in Sanskrit. Extracting the semantic information hidden in compounds is essential for improving downstream Natural Language Processing (NLP) applications such as information extraction, question answering, and machine translation. In this work, we systematically investigate the following research question: Can recent advances in neural networks outperform traditional hand-engineered, feature-based methods on the semantic-level multi-class compound classification task for Sanskrit? In contrast to previous methods, our method does not require feature engineering. For a well-organized analysis, we categorize neural systems by architecture, namely Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), and feed input to each system at one of three levels: word level, sub-word level, or character level. Our best system, an LSTM architecture with FastText embeddings and end-to-end training, shows promising results in terms of F-score (0.73) compared to the state-of-the-art method based on feature engineering (0.74) and outperforms it in terms of accuracy (77.68%).
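The three input levels named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' preprocessing: the function names are hypothetical, the compound is assumed to arrive with its components separated by a hyphen, and the sub-word view uses FastText-style character n-grams with `<` and `>` boundary markers.

```python
# Illustrative only: three ways to present one compound to a neural classifier.
# The hyphen-separated input format and all function names are assumptions.

def word_level(compound: str) -> list[str]:
    """Word level: one token per component of the compound."""
    return compound.split("-")


def subword_level(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Sub-word level: character n-grams of one component, FastText-style.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def char_level(compound: str) -> list[str]:
    """Character level: one token per character, component boundary dropped."""
    return [ch for ch in compound if ch != "-"]


if __name__ == "__main__":
    example = "rAja-puruzaH"  # illustrative transliterated compound
    print(word_level(example))
    print(subword_level("rAja", n_min=3, n_max=3))
    print(char_level(example))
```

Each view would then be mapped to embeddings and passed to an MLP, CNN, or LSTM classifier; the sketch covers only the tokenization step.
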
Original language | English |
---|---|
Title of host publication | Proceedings of the 6th International Sanskrit Computational Linguistics Symposium : Association for Computational Linguistics |
Publisher | Association for Computational Linguistics |
Publication date | 23 Oct 2019 |
Pages | 28-44 |
Publication status | Published - 23 Oct 2019 |
Externally published | Yes |
Keywords
- Sanskrit
- Compound words