What's in Your Embedding, And How It Predicts Task Performance

Anna Rogers, Shashwath Hosur Ananthakrishna, Anna Rumshisky

    Research output: Article in proceedings · Research · Peer-reviewed

    Abstract

    Attempts to find a single technique for general-purpose intrinsic evaluation of word embeddings have so far not been successful. We present a new approach based on scaled-up qualitative analysis of word vector neighborhoods that quantifies interpretable characteristics of a given model (e.g. its preference for synonyms or shared morphological forms as nearest neighbors). We analyze 21 such factors and show how they correlate with performance on 14 extrinsic and intrinsic task datasets (and also explain the lack of correlation between some of them). Our approach enables multi-faceted evaluation, parameter search, and, more broadly, a principled, hypothesis-driven approach to the development of distributional semantic representations.
    Original language: English
    Title of host publication: Proceedings of the 27th International Conference on Computational Linguistics
    Number of pages: 14
    Place of publication: Santa Fe, New Mexico, USA, August 20-26, 2018
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 2690-2703
    Publication status: Published - 2018
