Abstract
Biomedical event extraction is crucial for automatically extracting information from the rapidly growing body of biomedical literature. Despite recent methodological advances, most event extraction systems are still evaluated in-domain and on complete event structures only. This makes it hard to determine the performance of intermediate stages of the task, such as edge detection, across different corpora. Motivated by these limitations, we present the first cross-domain study of edge detection for biomedical event extraction. We analyze differences between five existing gold standard corpora, create a standardized benchmark corpus, and provide a strong baseline model for edge detection. Experiments show a large drop in performance when the baseline is applied to out-of-domain data, confirming the need for domain adaptation methods for the task. To encourage research efforts in this direction, we make both the data and the baseline available to the research community: https://www.cosbi.eu/cfx/9985.
Original language | English |
---|---|
Title | Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020) |
Number of pages | 1982 |
Publisher | European Language Resources Association |
Publication date | May 2020 |
Pages | 1975 |
Status | Published - May 2020 |
Keywords
- Biomedical event extraction
- Edge detection
- Cross-domain study
- Standardized benchmark corpus
- Domain adaptation methods