RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian

Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, Alex Gribov

    Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

    Abstract

    This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with Fleiss’ kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts.
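
The agreement figure quoted above is a Fleiss' kappa over 3 annotations per post. As a rough illustration of how that statistic is computed, here is a minimal sketch in Python. It is not the authors' code: the function name `fleiss_kappa`, the input layout (one list of labels per post), and the toy labels are assumptions made for this example only.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that each received the same number of labels.

    `ratings` is a list of items; each item is the list of labels assigned
    by its raters (hypothetical layout, chosen for this sketch).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Only one category was ever used; kappa is undefined in that case.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data with made-up labels, 3 annotations per post.
toy = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["positive", "neutral", "negative"],
]
print(round(fleiss_kappa(toy), 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so 0.58 falls between the two.
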
    Original language: English
    Title of host publication: Proceedings of the 27th International Conference on Computational Linguistics
    Number of pages: 9
    Place of Publication: Santa Fe, New Mexico, USA
    Publisher: Association for Computational Linguistics
    Publication date: 2018
    Pages: 755-763
    Publication status: Published - 2018

    Keywords

    • RuSentiment dataset
    • sentiment analysis
    • social media
    • Russian language
    • annotation guidelines
