Cross-Lingual Cross-Domain Nested Named Entity Evaluation on English Web Texts

Research output: Conference Article in Proceeding or Book/Report chapter; Article in proceedings; Research; peer-review

Abstract

Named Entity Recognition (NER) is a key Natural Language Processing task. However, most existing work on NER targets flat named entities (NEs) and ignores the recognition of nested structures, where entities can be enclosed within other NEs. Moreover, evaluation of Nested Named Entity Recognition (NNER) across domains remains challenging, mainly due to the limited availability of datasets. To address these gaps, we present EWT-NNER, a dataset covering five web domains annotated for nested named entities on top of the English Web Treebank (EWT). We present the corpus and an empirical evaluation, including transfer results from German and Danish. EWT-NNER is annotated for four major entity types, including suffixes for derivational entity markers and partial named entities, spanning a total of 12 classes. We envision the public release of EWT-NNER to encourage further research on nested NER, particularly on cross-lingual cross-domain evaluation.

Original language: English
Title of host publication: Findings of ACL 2021
Number of pages: 1815
Publisher: Association for Computational Linguistics
Publication date: 2021
Pages: 1808
Publication status: Published - 2021
