Subcharacter Information in Japanese Embeddings: When Is It Worth It?

Marzena Karpinska, Bofang Li, Anna Rogers, Aleksandr Drozd

    Research output: Conference Article in Proceeding or Book/Report chapter; Article in proceedings; Research; peer-review

    Abstract

    Languages with logographic writing systems present a difficulty for traditional character-level models. Leveraging subcharacter information was recently shown to be beneficial for a number of intrinsic and extrinsic tasks in Chinese. We examine whether the same strategies can be applied to Japanese, and contribute a new analogy dataset for this language.
    Original language: English
    Title of host publication: Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP
    Number of pages: 10
    Place of Publication: Melbourne, Australia
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 28-37
    Publication status: Published - 2018

    Keywords

    • Logographic writing systems
    • Character-level models
    • Subcharacter information
    • Chinese language processing
    • Japanese language analogies
