When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings

ABSTRACT
While self-supervised learning has made rapid advances in natural language processing, it remains unclear when researchers should engage in resource-intensive domain-specific pretraining (domain pretraining). The law, puzzlingly, has yielded few documented instances of substantial gains from domain pretraining, even though legal language is widely seen as unique. We hypothesize that this gap exists because existing legal NLP tasks are too easy and fail to meet the conditions under which domain pretraining can help. To address this, we first present CaseHOLD (Case Holdings On Legal Decisions), a new dataset of over 53,000 multiple choice questions that ask for the relevant holding of a cited case. This task is fundamental to lawyers and is both legally meaningful and difficult from an NLP perspective (F1 of 0.4 with a BiLSTM baseline). Second, we assess performance gains on CaseHOLD and existing legal NLP datasets. While a Transformer architecture (BERT) pretrained on a general corpus (Google Books and Wikipedia) improves performance, domain pretraining (on a corpus of ≈3.5M decisions across all courts in the U.S., larger than BERT's pretraining corpus) with a custom legal vocabulary exhibits the most substantial performance gains on CaseHOLD (a 7.2% gain in F1, representing a 12% relative improvement over BERT) and consistent gains across two other legal tasks. Third, we show that domain pretraining may be warranted when the task exhibits sufficient similarity to the pretraining corpus: across the three legal tasks, the size of the performance increase tracked the domain specificity of the task. Our findings inform when researchers should engage in resource-intensive pretraining and show that Transformer-based architectures, too, learn embeddings suggestive of distinct legal language.
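The CaseHOLD setup described above is a multiple choice task: a citing context with the holding masked, five candidate holdings, and exactly one correct answer, scored by macro F1. The sketch below frames one such example with a toy lexical-overlap scorer and a hand-rolled macro F1. The example text, the `lexical_overlap_score` heuristic, and all names are hypothetical illustrations of the task format only; they are not the paper's BiLSTM or BERT baselines.

```python
from collections import Counter

# A toy CaseHOLD-style example (hypothetical text, for illustration only):
# a citing context with the holding masked, plus five candidate holdings,
# of which exactly one (the label) is correct.
example = {
    "context": "the court granted summary judgment citing Smith v. Jones (<HOLDING>)",
    "candidates": [
        "holding that summary judgment is proper when no genuine dispute of material fact exists",
        "holding that the statute of limitations was tolled",
        "holding that the contract was unenforceable",
        "holding that venue was improper",
        "holding that the appeal was moot",
    ],
    "label": 0,
}

def lexical_overlap_score(context: str, candidate: str) -> int:
    """Count shared word occurrences between context and candidate (toy baseline)."""
    ctx = Counter(context.lower().split())
    return sum(min(ctx[w], n) for w, n in Counter(candidate.lower().split()).items())

def predict(ex: dict) -> int:
    """Pick the candidate holding with the highest overlap score."""
    scores = [lexical_overlap_score(ex["context"], cand) for cand in ex["candidates"]]
    return max(range(len(scores)), key=scores.__getitem__)

def macro_f1(gold: list, pred: list, n_classes: int = 5) -> float:
    """Macro-averaged F1 over the five answer slots."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n_classes
```

A stronger model replaces `lexical_overlap_score` with a learned scorer (e.g. a Transformer encoding context-candidate pairs), but the evaluation loop is the same: predict an answer index per question, then compute macro F1 over the answer slots.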
REFERENCES
- Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. 2016. Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science 2 (2016), e93.
- Pablo D Arredondo. 2017. Harvesting and Utilizing Explanatory Parentheticals. SCL Rev. 69 (2017), 659.
- Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3615--3620.
- Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, New York, NY, USA, 610--623.
- Ilias Chalkidis, Ion Androutsopoulos, and Nikolaos Aletras. 2019. Neural Legal Judgment Prediction in English. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4317--4323. https://www.aclweb.org/anthology/P19-1424
- Ilias Chalkidis, Ion Androutsopoulos, and Achilleas Michos. 2017. Extracting Contract Elements. In Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law (London, United Kingdom) (ICAIL '17). Association for Computing Machinery, New York, NY, USA, 19--28.
- Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. LEGAL-BERT: The Muppets straight out of Law School. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 2898--2904. https://www.aclweb.org/anthology/2020.findings-emnlp.261
- Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, and Ion Androutsopoulos. 2019. Neural Contract Element Extraction Revisited. In Workshop on Document Intelligence at NeurIPS 2019. https://openreview.net/forum?id=B1x6fa95UH
- Columbia Law Review Ass'n, Harvard Law Review Ass'n, and Yale Law Journal. 2015. The Bluebook: A Uniform System of Citation (20th ed.). The Columbia Law Review, The Harvard Law Review, The University of Pennsylvania Law Review, and The Yale Law Journal.
- Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, and Guoping Hu. 2019. Pre-Training with Whole Word Masking for Chinese BERT. arXiv:1906.08101 [cs.CL]
- Laura C. Dabney. 2008. Citators: Past, Present, and Future. Legal Reference Services Quarterly 27, 2--3 (2008), 165--190.
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https://www.aclweb.org/anthology/N19-1423
- Pintip Hompluem Dunn. 2003. How judges overrule: Speech act theory and the doctrine of stare decisis. Yale LJ 113 (2003), 493.
- Emad Elwany, Dave Moore, and Gaurav Oberoi. 2019. BERT Goes to Law School: Quantifying the Competitive Advantage of Access to Large Legal Corpora in Contract Understanding. arXiv:1911.00473 http://arxiv.org/abs/1911.00473
- David Freeman Engstrom and Daniel E Ho. 2020. Algorithmic accountability in the administrative state. Yale J. on Reg. 37 (2020), 800.
- David Freeman Engstrom, Daniel E. Ho, Catherine Sharkey, and Mariano-Florentino Cuéllar. 2020. Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies. Administrative Conference of the United States, Washington DC, United States.
- European Union 1993. Council Directive 93/13/EEC of 5 April 1993 on unfair terms in consumer contracts. European Union.
- Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. 2021. Aligning AI With Shared Human Values. arXiv:2008.02275 [cs.CY]
- Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. arXiv:2009.03300 [cs.CY]
- Michael J. Bommarito II, Daniel Martin Katz, and Eric M. Detterman. 2018. LexNLP: Natural language processing and information extraction for legal and regulatory texts. arXiv:1806.03688 http://arxiv.org/abs/1806.03688
- Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving Pre-training by Representing and Predicting Spans. Transactions of the Association for Computational Linguistics 8 (2020), 64--77. https://www.aclweb.org/anthology/2020.tacl-1.5
- David Jurgens, Srijan Kumar, Raine Hoover, Dan McFarland, and Dan Jurafsky. 2018. Measuring the Evolution of a Scientific Field through Citation Frames. Transactions of the Association for Computational Linguistics 6 (2018), 391--406. https://www.aclweb.org/anthology/Q18-1028
- Minki Kang, Moonsu Han, and Sung Ju Hwang. 2020. Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 6102--6120. https://www.aclweb.org/anthology/2020.emnlp-main.493
- Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. arXiv:1808.06226 [cs.CL]
- Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2019. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2019), 1234--1240.
- Marco Lippi, Przemysław Pałka, Giuseppe Contissa, Francesca Lagioia, Hans-Wolfgang Micklitz, Giovanni Sartor, and Paolo Torroni. 2019. CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service. Artificial Intelligence and Law 27, 2 (2019), 117--139.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs.CL]
- David Mellinkoff. 2004. The language of the law. Wipf and Stock Publishers, Eugene, Oregon.
- Elizabeth Mertz. 2007. The Language of Law School: Learning to "Think Like a Lawyer". Oxford University Press, USA.
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. http://arxiv.org/abs/1301.3781
- Octavia-Maria Şulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, and Josef van Genabith. 2017. Exploring the Use of Text Classification in the Legal Domain. In Proceedings of the 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL).
- Adam R. Pah, David L. Schwartz, Sarath Sanga, Zachary D. Clopton, Peter DiCola, Rachel Davis Mersey, Charlotte S. Alexander, Kristian J. Hammond, and Luís A. Nunes Amaral. 2020. How to build a more open justice system. Science 369, 6500 (2020), 134--136.
- Anusri Pampari, Preethi Raghavan, Jennifer Liang, and Jian Peng. 2018. emrQA: A Large Corpus for Question Answering on Electronic Medical Records. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 2357--2368. https://www.aclweb.org/anthology/D18-1258
- Marc Queudot, Éric Charton, and Marie-Jean Meurs. 2020. Improving Access to Justice with Legal Chatbots. Stats 3, 3 (2020), 356--375.
- Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 2383--2392. https://www.aclweb.org/anthology/D16-1264
- Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics 7 (2019), 249--266.
- Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2021. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics 8 (2021), 842--866.
- Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, and Iryna Gurevych. 2020. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. arXiv:2012.15613 [cs.CL]
- Jaromir Savelka, Vern R Walker, Matthias Grabmair, and Kevin D Ashley. 2017. Sentence boundary detection in adjudicatory decisions in the United States. Traitement automatique des langues 58 (2017), 21.
- Or Sharir, Barak Peleg, and Yoav Shoham. 2020. The Cost of Training NLP Models: A Concise Overview. arXiv:2004.08900 [cs.CL]
- Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, and Adina Williams. 2020. Unnatural Language Inference. arXiv:2101.00010 [cs.CL]
- Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. arXiv:1904.09223 [cs.CL]
- P.M. Tiersma. 1999. Legal Language. University of Chicago Press, Chicago, Illinois. https://books.google.com/books?id=Sq8XXTo3A48C
- George Tsatsaronis, Georgios Balikas, Prodromos Malakasiotis, Ioannis Partalas, Matthias Zschunke, Michael R. Alvers, Dirk Weissenborn, Anastasia Krithara, Sergios Petridis, Dimitris Polychronopoulos, Yannis Almirantis, John Pavlopoulos, Nicolas Baskiotis, Patrick Gallinari, Thierry Artiéres, Axel-Cyrille Ngonga Ngomo, Norman Heino, Eric Gaussier, Liliana Barrio-Alvers, Michael Schroeder, Ion Androutsopoulos, and Georgios Paliouras. 2015. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16, 1 (April 2015), 138.
- Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics, Brussels, Belgium, 353--355. https://www.aclweb.org/anthology/W18-5446
- Jonah Wu. 2019. AI Goes to Court: The Growing Landscape of AI for Access to Justice. https://medium.com/legal-design-and-innovation/ai-goes-to-court-the-growing-landscape-of-ai-for-access-to-justice-3f58aca4306f
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144 [cs.CL]
- Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. 2020. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5218--5230. https://www.aclweb.org/anthology/2020.acl-main.466
- Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. 2020. JEC-QA: A Legal-Domain Question Answering Dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 9701--9708.