ITU

One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations

Research output: Conference Article in Proceeding or Book/Report chapter › Article in proceedings › Research › peer-review

Standard

One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. / Brink Andersen, Jesper; Bak Bertelsen, Mikkel; Hørby Schou, Mikkel; Ciosici, Manuel Rafael; Assent, Ira.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2020.

Harvard

Brink Andersen, J, Bak Bertelsen, M, Hørby Schou, M, Ciosici, MR & Assent, I 2020, One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics.

APA

Brink Andersen, J., Bak Bertelsen, M., Hørby Schou, M., Ciosici, M. R., & Assent, I. (2020). One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics.

Vancouver

Brink Andersen J, Bak Bertelsen M, Hørby Schou M, Ciosici MR, Assent I. One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics. 2020

Author

Brink Andersen, Jesper ; Bak Bertelsen, Mikkel ; Hørby Schou, Mikkel ; Ciosici, Manuel Rafael ; Assent, Ira. / One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, 2020.

Bibtex

@inproceedings{5c327a23f54e47dca3cc114eb29b4639,
title = "One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations",
abstract = "Word embeddings are an active topic in the NLP research community. State-of-the-art neural models achieve high performance on downstream tasks, albeit at the cost of computationally expensive training. Cost aware solutions require cheaper models that still achieve good performance. We present several reproduction studies of intrinsic evaluation tasks that evaluate non-contextual word representations in multiple languages. Furthermore, we present 50-8-8, a new data set for the outlier identification task, which avoids limitations of the original data set, such as ambiguous words, infrequent words, and multi-word tokens, while increasing the number of test cases. The data set is expanded to contain semantic and syntactic tests and is multilingual (English, German, and Italian). We provide an in-depth analysis of word embedding models with a range of hyper-parameters. Our analysis shows the suitability of different models and hyper-parameters for different tasks and the greater difficulty of representing German and Italian languages.",
author = "{Brink Andersen}, Jesper and {Bak Bertelsen}, Mikkel and {H{\o}rby Schou}, Mikkel and Ciosici, {Manuel Rafael} and Ira Assent",
year = "2020",
month = nov,
language = "English",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
publisher = "Association for Computational Linguistics",
address = "United States",

}

RIS

TY - GEN

T1 - One of these words is not like the other: a reproduction of outlier identification using non-contextual word representations

AU - Brink Andersen, Jesper

AU - Bak Bertelsen, Mikkel

AU - Hørby Schou, Mikkel

AU - Ciosici, Manuel Rafael

AU - Assent, Ira

PY - 2020/11

Y1 - 2020/11

N2 - Word embeddings are an active topic in the NLP research community. State-of-the-art neural models achieve high performance on downstream tasks, albeit at the cost of computationally expensive training. Cost aware solutions require cheaper models that still achieve good performance. We present several reproduction studies of intrinsic evaluation tasks that evaluate non-contextual word representations in multiple languages. Furthermore, we present 50-8-8, a new data set for the outlier identification task, which avoids limitations of the original data set, such as ambiguous words, infrequent words, and multi-word tokens, while increasing the number of test cases. The data set is expanded to contain semantic and syntactic tests and is multilingual (English, German, and Italian). We provide an in-depth analysis of word embedding models with a range of hyper-parameters. Our analysis shows the suitability of different models and hyper-parameters for different tasks and the greater difficulty of representing German and Italian languages.

AB - Word embeddings are an active topic in the NLP research community. State-of-the-art neural models achieve high performance on downstream tasks, albeit at the cost of computationally expensive training. Cost aware solutions require cheaper models that still achieve good performance. We present several reproduction studies of intrinsic evaluation tasks that evaluate non-contextual word representations in multiple languages. Furthermore, we present 50-8-8, a new data set for the outlier identification task, which avoids limitations of the original data set, such as ambiguous words, infrequent words, and multi-word tokens, while increasing the number of test cases. The data set is expanded to contain semantic and syntactic tests and is multilingual (English, German, and Italian). We provide an in-depth analysis of word embedding models with a range of hyper-parameters. Our analysis shows the suitability of different models and hyper-parameters for different tasks and the greater difficulty of representing German and Italian languages.

M3 - Article in proceedings

BT - Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing and the 10th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

PB - Association for Computational Linguistics

ER -

ID: 85418519