Abstract
In legal NLP, Case Outcome Classification (COC) must not only be accurate but also trustworthy and explainable. Existing work in explainable COC has been limited to annotations by a single expert. However, it is well known that lawyers may disagree in their assessment of case facts. We hence collect a novel dataset, RAVE: Rationale Variation in ECHR, which is obtained from two experts in the domain of international human rights law, for whom we observe weak agreement. We study their disagreements and build a two-level, task-independent taxonomy, supplemented with COC-specific subcategories. We quantitatively assess the different taxonomy categories and find that disagreements mainly stem from underspecification of the legal context, which poses challenges given the typically limited granularity and noise in COC metadata. To our knowledge, this is the first work in legal NLP that focuses on building a taxonomy over human label variation. We further assess the explainability of state-of-the-art COC models on RAVE and observe limited agreement between models and experts. Overall, our case study reveals hitherto underappreciated complexities in creating benchmark datasets in legal NLP that revolve around identifying aspects of a case's facts supposedly relevant to its outcome.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing |
| Publisher | Association for Computational Linguistics |
| Publication date | Dec 2023 |
| Pages | 9558–9576 |
| DOIs | |
| Publication status | Published - Dec 2023 |
| Event | Conference on Empirical Methods in Natural Language Processing, Resorts World Convention Centre, Singapore, 6 Dec 2023 → 10 Dec 2023, https://2023.emnlp.org/ |
Conference
| Conference | Conference on Empirical Methods in Natural Language Processing |
|---|---|
| Location | Resorts World Convention Centre |
| Country/Territory | Singapore |
| Period | 06/12/2023 → 10/12/2023 |
| Internet address | https://2023.emnlp.org/ |
Keywords
- Legal Natural Language Processing
- Case Outcome Classification
- Explainability in Legal AI
- Rationale Variation in ECHR
- Taxonomy of Legal Disagreements
Title: 'From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification'