DOI: 10.1145/3395363.3404364
short-paper

TauJud: test augmentation of machine learning in judicial documents

Published: 18 July 2020

ABSTRACT

The boom in big data has made machine learning ubiquitous in the legal field. A larger test set reflects a model's performance more reliably, so test data needs to be expanded in a natural way. To reduce the high cost of labeling data in natural language processing, practitioners have improved the performance of text classification tasks with simple data augmentation techniques. However, augmentation of judgment documents must remain interpretable and logical, as we observed from the CAIL2018 test data, which contains over 200,000 judicial documents. We therefore designed TauJud, a test augmentation tool that generates more effective test data, uniformly distributed over time and location, for model evaluation and that saves time in labeling data. The demo can be found at https://github.com/governormars/TauJud.
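The abstract does not say how TauJud produces its variants, so the following is only a minimal sketch of the general idea of varying time and location uniformly while leaving the case facts untouched. The value pools `LOCATIONS` and `YEARS`, the ISO date pattern, and the function `augment` are all hypothetical and are not taken from the tool.

```python
import random
import re

# Hypothetical value pools (assumption): a real tool would draw these
# from legal-domain resources rather than a hard-coded list.
LOCATIONS = ["Beijing", "Shanghai", "Nanjing", "Guangzhou"]
YEARS = list(range(2010, 2019))

# Assumption: dates appear in ISO form (YYYY-MM-DD) in the toy documents.
DATE_PATTERN = re.compile(r"\b(?:19|20)\d{2}-\d{2}-\d{2}\b")


def augment(document, location, rng):
    """Return a copy of `document` with its location and dates resampled.

    `location` is the place name already known to occur in the document.
    Replacement values are drawn uniformly from the pools above.
    """
    new_location = rng.choice(LOCATIONS)

    def random_date(_match):
        # Day is capped at 28 so every generated date is valid.
        year = rng.choice(YEARS)
        month = rng.randint(1, 12)
        day = rng.randint(1, 28)
        return f"{year}-{month:02d}-{day:02d}"

    text = document.replace(location, new_location)
    return DATE_PATTERN.sub(random_date, text)


if __name__ == "__main__":
    rng = random.Random(0)
    doc = "On 2015-03-02 the defendant stole a phone in Nanjing."
    for _ in range(3):
        print(augment(doc, "Nanjing", rng))
```

Since only the time and place change in this sketch, the label of the original document could plausibly be reused for each variant, which is one way such a tool might save labeling effort.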

References

  1. Daniel Berrar and Werner Dubitzky. 2019. Should significance testing be abandoned in machine learning? IJDSA 7, 4 (2019), 247-257.
  2. Reuben Binns. [n.d.]. Fairness in Machine Learning: Lessons from Political Philosophy. ([n.d.]). arXiv:1712.03586 http://arxiv.org/abs/1712.03586
  3. Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. [n.d.]. Counterfactual Fairness in Text Classification through Robustness. ([n.d.]). arXiv:1809.10610 http://arxiv.org/abs/1809.10610
  4. Zichen Guo, Tieke He, Zemin Qin, Zicong Xie, and Jia Liu. 2019. A Content-Based Recommendation Framework for Judicial Cases. In ICPCSEE. Springer, 76-88.
  5. Tie-Ke He, Hao Lian, Ze-Min Qin, Zhen-Yu Chen, and Bin Luo. 2018. PTM: A Topic Model for the Inferring of the Penalty. JCST 33, 4 (2018), 756-767.
  6. Michael Kamp. 2019. Black-Box Parallelization for Machine Learning. Ph.D. Dissertation. Universitäts-und Landesbibliothek Bonn.
  7. Stuart Lottier. [n.d.]. Distribution of Criminal Offenses in Metropolitan Regions. 29, 1 ([n.d.]), 37. https://doi.org/10.2307/1137347
  8. Jason Wei and Kai Zou. [n.d.]. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. ([n.d.]). arXiv:1901.11196 http://arxiv.org/abs/1901.11196
  9. Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, et al. 2018. Cail2018: A large-scale legal dataset for judgment prediction. arXiv preprint arXiv:1807.02478 (2018).
  10. Zihuan Xu, Tieke He, Hao Lian, Jiabing Wan, and Hui Wang. 2019. Case Facts Analysis Method Based on Deep Learning. In WISA. Springer, 92-97.
  11. Ge Yan, Yu Li, Shu Zhang, and Zhenyu Chen. 2019. Data Augmentation for Deep Learning of Judgment Documents. In IScIDE. Springer, 232-242.
  12. Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. [n.d.]. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. ([n.d.]). arXiv:1804.09541 http://arxiv.org/abs/1804.09541

    • Published in

      ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
      July 2020
      591 pages
      ISBN:9781450380089
      DOI:10.1145/3395363

      Copyright © 2020 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 58 of 213 submissions, 27%
