Abstract
In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset, RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level task-independent taxonomy, supplemented with COC-specific subcategories. We quantitatively assess different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. To our knowledge, this is the first work in legal NLP that focuses on building a taxonomy over human label variation. We further assess the explainability of state-of-the-art COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.
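The abstract does not specify how the "weak agreement" between the two experts is computed, but rationale agreement of this kind is typically quantified with a chance-corrected coefficient over binary relevance labels. Below is a minimal, hypothetical sketch using Cohen's kappa from scikit-learn on sentence-level rationale masks; the variable names and example masks are illustrative assumptions, not data from the RAVE release.

```python
# Hypothetical sketch: chance-corrected agreement between two experts'
# rationale annotations, each a binary mask over the sentences of a case.
# The example masks below are illustrative, not taken from RAVE.
from sklearn.metrics import cohen_kappa_score

# 1 = sentence marked as outcome-relevant by the expert, 0 = not marked
expert_a = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
expert_b = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1]

kappa = cohen_kappa_score(expert_a, expert_b)
print(f"Cohen's kappa: {kappa:.2f}")  # values near 0 indicate weak agreement
```

The same computation extends to model-versus-expert comparisons by replacing one mask with a model's thresholded attribution scores.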
| Original language | English |
|---|---|
| Title | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Publication date | Dec. 2023 |
| Pages | 9558–9576 |
| DOI | |
| Status | Published - Dec. 2023 |
Keywords
- Legal Natural Language Processing
- Case Outcome Classification
- Explainability in Legal AI
- Rationale Variation in ECHR
- Taxonomy of Legal Disagreements