Character-level Supervision for Low-resource POS Tagging

Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard

Research output: Conference article in proceeding or book/report chapter › Article in proceedings › Research › peer-review


Neural part-of-speech (POS) taggers are known to not perform well with little training data. As a step towards overcoming this problem, we present an architecture for learning more robust neural POS taggers by jointly training a hierarchical, recurrent model and a recurrent character-based sequence-to-sequence network supervised using an auxiliary objective. This way, we introduce stronger character-level supervision into the model, which enables better generalization to unseen words and provides regularization, making our encoding less prone to overfitting. We experiment with three auxiliary tasks: lemmatization, character-based word autoencoding, and character-based random string autoencoding. Experiments with minimal amounts of labeled data on 34 languages show that our new architecture outperforms a single-task baseline and, surprisingly, that, on average, raw text autoencoding can be as beneficial for low-resource POS tagging as using lemma information. Our neural POS tagger closes the gap to a state-of-the-art POS tagger (MarMoT) for low-resource scenarios by 43%, even outperforming it on languages with templatic morphology, e.g., Arabic, Hebrew, and Turkish, by some margin.
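The core idea above is multi-task learning: a shared character-level word encoding feeds both the POS classifier (main task) and a character decoder that reconstructs the word (auxiliary autoencoding task), and the two losses are summed. The following is a minimal, hedged Python sketch of that joint objective only — it is not the paper's implementation; the mean-of-embeddings encoder, the random linear heads, and the loss weighting are stand-in assumptions used purely to show the shape of the computation.

```python
import math
import random

random.seed(0)

CHARS = "abcdefghijklmnopqrstuvwxyz"
TAGS = ["NOUN", "VERB", "ADJ"]  # toy tagset, not the paper's
DIM = 8

# Random character embeddings: a stand-in for a learned lookup table.
char_emb = {c: [random.gauss(0, 0.1) for _ in range(DIM)] for c in CHARS}

def encode(word):
    """Mean of character embeddings -- a stand-in for the paper's
    recurrent character-based encoder."""
    vecs = [char_emb[c] for c in word]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, gold_index):
    return -math.log(softmax(logits)[gold_index])

# Random linear output heads: stand-ins for trained layers.
W_pos = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in TAGS]
W_chr = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in CHARS]

def joint_loss(word, gold_tag, aux_weight=1.0):
    """Main POS loss plus an auxiliary character-autoencoding loss,
    both computed from the same shared encoding of the word."""
    h = encode(word)
    # Main task: classify the POS tag from the shared encoding.
    pos_logits = [sum(w * x for w, x in zip(row, h)) for row in W_pos]
    l_pos = cross_entropy(pos_logits, TAGS.index(gold_tag))
    # Auxiliary task: reconstruct each character of the word from h,
    # which injects character-level supervision into the encoder.
    l_aux = 0.0
    for c in word:
        chr_logits = [sum(w * x for w, x in zip(row, h)) for row in W_chr]
        l_aux += cross_entropy(chr_logits, CHARS.index(c))
    l_aux /= len(word)
    return l_pos + aux_weight * l_aux

loss = joint_loss("walks", "VERB")
print(loss > 0.0)
```

In a real trainer, gradients of this summed loss would update the shared encoder, so the auxiliary character objective acts as the regularizer described in the abstract; swapping the reconstruction target to a lemma would give the lemmatization variant.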
Original language: English
Title of host publication: Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP
Place of publication: Melbourne
Publisher: Association for Computational Linguistics
Publication date: Aug 2018
ISBN (Print): 978-1-948087-47-6
Publication status: Published - Aug 2018


  • Neural POS taggers
  • Minimal training data
  • Hierarchical recurrent model
  • Recurrent character-based network
  • Auxiliary tasks

