Abstract
In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset, RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. To our knowledge, this is the first work in legal NLP that focuses on building a taxonomy over human label variation. We further assess the explainability of state-of-the-art COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case’s facts supposedly relevant to its outcome.
| Original language | English |
|---|---|
| Title | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Publication date | Dec 2023 |
| Pages | 9558–9576 |
| DOI | |
| Status | Published - Dec 2023 |
| Event | Conference on Empirical Methods in Natural Language Processing - Resorts World Convention Centre, Singapore. Duration: 6 Dec 2023 → 10 Dec 2023. https://2023.emnlp.org/ |
Conference
| Conference | Conference on Empirical Methods in Natural Language Processing |
|---|---|
| Location | Resorts World Convention Centre |
| Country/Territory | Singapore |
| Period | 06/12/2023 → 10/12/2023 |
| Internet address |
Keywords
- Legal Natural Language Processing
- Case Outcome Classification
- Explainability in Legal AI
- Rationale Variation in ECHR
- Taxonomy of Legal Disagreements
Fingerprint
Dive into the research topics of 'From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification'. Together they form a unique fingerprint.