ITU

Revisiting the Role of Feature Engineering for Compound Type Identification in Sanskrit

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review


We propose an automated approach for semantic class identification of compounds in Sanskrit. Extracting the semantic information hidden in compounds is essential for improving downstream Natural Language Processing (NLP) applications such as information extraction, question answering, and machine translation. In this work, we systematically investigate the following research question: can recent advances in neural networks outperform traditional methods based on hand-engineered features on the semantic-level, multi-class compound classification task for Sanskrit? Unlike previous methods, our method does not require feature engineering. For a well-organized analysis, we categorize neural systems by architecture, namely Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), and feed input to each system at one of three levels: word level, sub-word level, or character level. Our best system, an LSTM with FastText embeddings trained end-to-end, achieves an F-score of 0.73, close to the 0.74 of the state-of-the-art feature-engineering method, and outperforms that method in accuracy (77.68%).
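To make the three input levels concrete, here is a minimal Python sketch of how a single compound might be presented to a model at each level. It assumes the compound is already split into its components; the example rājapuruṣa ("king's man"), split into rāja and puruṣa, is chosen only for illustration. The sub-word view imitates FastText's scheme of character n-grams inside < and > boundary markers, with lengths 3 to 6 (FastText's defaults for its unsupervised embedding models). The function names and the "+" separator at the character level are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def word_level(components: List[str]) -> List[str]:
    """Word level: each compound component is a single token."""
    return list(components)


def subword_level(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Sub-word level: FastText-style character n-grams.

    The word is wrapped in '<' and '>' boundary markers, every n-gram with
    min_n <= n <= max_n is extracted, and the full marked word is appended
    as an extra token (FastText also keeps the word itself).
    """
    marked = f"<{word}>"
    grams = [
        marked[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(marked) - n + 1)
    ]
    grams.append(marked)
    return grams


def char_level(components: List[str], sep: str = "+") -> List[str]:
    """Character level: one symbol per character, with a hypothetical
    separator symbol marking each component boundary."""
    chars: List[str] = []
    for k, comp in enumerate(components):
        if k > 0:
            chars.append(sep)
        chars.extend(comp)
    return chars


def input_views(components: List[str]) -> Dict[str, List]:
    """Return the word, sub-word, and character views of one compound."""
    return {
        "word": word_level(components),
        "subword": [subword_level(c) for c in components],
        "char": char_level(components),
    }


views = input_views(["rāja", "puruṣa"])
print(views["word"])            # ['rāja', 'puruṣa']
print(views["subword"][0][:4])  # ['<rā', 'rāj', 'āja', 'ja>']
print(len(views["char"]))       # 11
```

In an end-to-end system such as the LSTM described above, each token in the chosen view would then be mapped to an embedding before being fed to the network. For the sub-word view, FastText represents a word as the sum of the vectors of its n-grams, which is what lets it produce vectors for rare or unseen compound components.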

Original language: English
Title of host publication: Proceedings of the 6th International Sanskrit Computational Linguistics Symposium: Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Publication date: 23 Oct 2019
Pages: 28-44
Publication status: Published - 23 Oct 2019
Externally published: Yes

    Research areas

  • Sanskrit, Compound words

ID: 84778506