Revisiting the Role of Feature Engineering for Compound Type Identification in Sanskrit

Jivnesh Sandhan, Amrith Krishna, Pawan Goyal, Laxmidhar Behera

Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

We propose an automated approach for semantic class identification of compounds in Sanskrit. Extracting the semantic information hidden in compounds is essential for improving downstream Natural Language Processing (NLP) applications such as information extraction, question answering, and machine translation, among others. In this work, we systematically investigate the following research question: Can recent advances in neural networks outperform traditional hand-engineered, feature-based methods on the semantic-level multi-class compound classification task for Sanskrit? Contrary to previous methods, our method does not require feature engineering. For a well-organized analysis, we categorize neural systems based on Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) architectures and feed input to the system at one of three possible levels: word level, sub-word level, and character level. Our best system, with an LSTM architecture and FastText embeddings trained end to end, shows promising results in terms of F-score (0.73) compared to the state-of-the-art method based on feature engineering (0.74) and outperforms it in terms of accuracy (77.68%).
Original language: English
Title: Proceedings of the 6th International Sanskrit Computational Linguistics Symposium : Association for Computational Linguistics
Publisher: Association for Computational Linguistics
Publication date: 23 Oct 2019
Pages: 28-44
Status: Published - 23 Oct 2019
Published externally: Yes

Keywords

  • Sanskrit
  • Compound words
