What's in Your Embedding, And How It Predicts Task Performance

Anna Rogers, Shashwath Hosur Ananthakrishna, Anna Rumshisky

    Publication: Conference article in proceedings or book/report chapter; Conference contribution in proceedings; Research; Peer-reviewed

    Abstract

    Attempts to find a single technique for general-purpose intrinsic evaluation of word embeddings have so far not been successful. We present a new approach based on scaled-up qualitative analysis of word vector neighborhoods that quantifies interpretable characteristics of a given model (e.g. its preference for synonyms or shared morphological forms as nearest neighbors). We analyze 21 such factors and show how they correlate with performance on 14 extrinsic and intrinsic task datasets (and also explain the lack of correlation between some of them). Our approach enables multi-faceted evaluation, parameter search, and, more generally, a more principled, hypothesis-driven approach to the development of distributional semantic representations.
    Original language: English
    Proceedings title: Proceedings of the 27th International Conference on Computational Linguistics
    Number of pages: 14
    Place of publication: Santa Fe, New Mexico, USA, August 20-26, 2018
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 2690-2703
    Status: Published - 2018
