short-paper

TauJud: test augmentation of machine learning in judicial documents

Authors:
Zichen Guo, Nanjing University, China
Jiawei Liu, Nanjing University, China
Tieke He, Nanjing University, China
Zhuoyang Li, Nanjing University, China
Peitian Zhangzhu, Nanjing University, China

ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, July 2020, Pages 549–552
https://doi.org/10.1145/3395363.3404364

Published: 18 July 2020

ABSTRACT

The big data boom has made machine learning ubiquitous in the legal field. A larger test set reflects a model's performance more reliably, so test data needs to be expanded in a natural way. To offset the high cost of labeling data in natural language processing, practitioners have improved performance on text classification tasks with simple data augmentation techniques. However, augmentation of judgment documents must remain interpretable and logical, as we observed from the CAIL2018 test data, which contains over 200,000 judicial documents. We therefore designed TauJud, a test augmentation tool that generates more effective test data, uniformly distributed over time and location, for model evaluation, and that saves time in labeling data. The demo can be found at https://github.com/governormars/TauJud.

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published in
ISSTA 2020: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2020
591 pages
ISBN: 9781450380089
DOI: 10.1145/3395363
General Chair: Sarfraz Khurshid, University of Texas at Austin, USA
Program Chair: Corina S. Păsăreanu, Carnegie Mellon University Silicon Valley / NASA Ames Research Center, USA
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Judicial Documents
Machine Learning
Test Augmentation
Acceptance Rates
Overall Acceptance Rate: 58 of 213 submissions, 27%