An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media

Publication: Conference article in proceedings or book/report chapter › Contribution to conference proceedings › Research › Peer-reviewed

Abstract

Existing natural language processing systems have often been designed with standard texts in mind. However, when these tools are applied to the substantially different texts found on social media, their performance drops dramatically. One solution is to translate social media data into standard language before processing; this is called normalization. It is well known that normalization improves performance for many natural language processing tasks on social media data. However, little is known about which types of normalization replacements have the most effect. Furthermore, the weaknesses of existing lexical normalization systems in an extrinsic setting are unknown. In this paper, we analyze the effect of both manual and automatic lexical normalization on dependency parsing. We conclude that, for most categories, automatic normalization scores close to manually annotated normalization, and that small annotation differences are important to take into consideration when exploiting normalization in a pipeline setup.
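The pre-processing step described in the abstract (replacing non-standard tokens with their standard form before a downstream tool such as a dependency parser sees them) can be illustrated with a minimal sketch. It assumes a fixed, hypothetical 1-to-1 replacement lexicon; `NORM_LEXICON`, `normalize`, and `align` are illustrative names, and this is not the normalization system evaluated in the paper, which would generate and rank candidates rather than use a static lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical 1-to-1 replacement lexicon, for illustration only.
NORM_LEXICON: Dict[str, str] = {
    "u": "you",
    "r": "are",
    "2morrow": "tomorrow",
    "pls": "please",
}


def normalize(tokens: List[str], lexicon: Dict[str, str] = NORM_LEXICON) -> List[str]:
    """Replace each token with its standard form; unknown tokens stay unchanged.

    Lookup is case-insensitive. Because every replacement is 1-to-1, the token
    count is preserved, so an analysis of the normalized sentence (for example
    a dependency tree) can be mapped back onto the original tokens by position.
    """
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


def align(original: List[str], normalized: List[str]) -> List[Tuple[str, str]]:
    """Pair each original token with its normalized counterpart by position."""
    if len(original) != len(normalized):
        raise ValueError("1-to-1 normalization must preserve the token count")
    return list(zip(original, normalized))


if __name__ == "__main__":
    tweet = ["r", "u", "coming", "2morrow", "?"]
    normed = normalize(tweet)
    print(normed)  # ['are', 'you', 'coming', 'tomorrow', '?']
    print(align(tweet, normed))
```

Restricting the sketch to 1-to-1 replacements keeps the original and normalized sentences aligned token by token. Replacements that split or merge words (for example "gonna" to "going to") would break this positional alignment, which is one way small annotation differences can matter when normalization is used in a pipeline.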
Original language: English
Title of proceedings: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)
Number of pages: 5
Place of publication: Hong Kong, China
Publisher: Association for Computational Linguistics
Publication date: Oct. 2019
Pages: 115–120
DOI
Status: Published, Oct. 2019
Event: The 5th Workshop on Noisy User-generated Text, Asia World Expo, Hong Kong, Hong Kong
Duration: 4 Nov. 2019 to 4 Nov. 2019
Conference number: 5
http://noisy-text.github.io/2019/
