Published March 2, 2022 | Version v1
Dataset | Open Access

Fairlex: A multilingual benchmark for evaluating fairness in legal text processing

  • 1. University of Copenhagen, Denmark
  • 2. University of Defense Technology, People's Republic of China

Description

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained legal language models and of the techniques used to fine-tune them for downstream tasks. Our benchmarks cover four jurisdictions (Council of Europe, USA, Switzerland, and China), five languages (English, German, French, Italian, and Chinese), and fairness across five attributes (gender, age, nationality/region, language, and legal area). In our experiments, we evaluate pre-trained language models using several group-robust fine-tuning techniques and find pronounced performance disparities across groups in many cases, while none of these techniques guarantees fairness or consistently mitigates group disparities. Furthermore, we provide a quantitative and qualitative analysis of our results, highlighting open challenges in the development of robustness methods in legal NLP.

Files (365.1 MB)

cail.zip

  • md5:9685ab4741109b47bb31e1d18e0afcf9 (113.0 MB)
  • md5:509a903019df018c0619de4b947bc507 (31.9 MB)
  • md5:62bdbc95dbf84af688959b02b3dea3c9 (85.4 MB)
  • md5:741dd0c4495d74510d287b3885cf4f1f (134.8 MB)
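After downloading an archive, its integrity can be checked against the MD5 checksums listed above. A minimal sketch in Python, using only the standard library (the file name `cail.zip` and the checksum comparison below are illustrative; substitute the archive and checksum you actually downloaded):

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MB chunks
    so large archives do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Illustrative usage, assuming the archive sits in the working directory:
# assert md5_of_file("cail.zip") == "9685ab4741109b47bb31e1d18e0afcf9"
```

Chunked reading keeps memory use constant regardless of archive size, which matters for the 100+ MB files in this record.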