ABSTRACT
The boom in big data has made machine learning ubiquitous in the legal field. Because a larger test set gives a more reliable picture of a model's performance, test data need to be expanded. To reduce the high cost of labeling data in natural language processing, practitioners have improved performance on text classification tasks with simple data augmentation techniques. However, as observed in the CAIL2018 test data, which contains over 200,000 judicial documents, augmenting judgment documents requires that the generated data remain interpretable and logically consistent. We therefore designed TauJud, a test augmentation tool that generates more effective test data, uniformly distributed over time and location, for model evaluation, while also saving time spent labeling data. The demo is available at https://github.com/governormars/TauJud.
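The abstract states the goal (test data spread uniformly over time and location) but not the mechanism. The sketch below illustrates one simple way to reach that goal under stated assumptions: every case carries `year` and `location` metadata, and under-filled (year, location) cells are topped up by copying an existing case and substituting those two fields. The `Case` type, `cell_counts`, `augment_uniform`, and the substitution rule are all hypothetical and are not taken from TauJud.

```python
"""Hypothetical sketch: top up a test set so that every (year, location)
cell holds the same number of cases.

This is NOT TauJud's implementation. The Case fields and the substitution
rule are illustrative assumptions.
"""
import dataclasses
import random
from collections import Counter
from itertools import product


@dataclasses.dataclass(frozen=True)
class Case:
    text: str       # fact description from a judgment document
    year: int       # assumed metadata: year of the judgment
    location: str   # assumed metadata: province or court location


def cell_counts(cases):
    """Count cases per (year, location) cell."""
    return Counter((c.year, c.location) for c in cases)


def augment_uniform(cases, years, locations, seed=0):
    """Return the original cases plus synthetic copies so that every cell in
    years x locations reaches the size of the largest existing cell.

    Each synthetic case copies a randomly chosen seed case and overwrites its
    year and location, including literal mentions in the text. Plain string
    substitution is a simplifying assumption; it can also hit unrelated
    substrings that happen to match.
    """
    rng = random.Random(seed)              # fixed seed for reproducible output
    counts = cell_counts(cases)
    target = max(counts.values(), default=0)
    out = list(cases)
    for year, loc in product(years, locations):
        for _ in range(target - counts[(year, loc)]):
            src = rng.choice(cases)
            text = src.text.replace(str(src.year), str(year))
            text = text.replace(src.location, loc)
            out.append(dataclasses.replace(src, text=text, year=year, location=loc))
    return out


if __name__ == "__main__":
    seed_cases = [
        Case("In 2016, the defendant stole a bicycle in Beijing.", 2016, "Beijing"),
        Case("In 2016, the defendant committed fraud in Beijing.", 2016, "Beijing"),
        Case("In 2017, the defendant stole a phone in Shanghai.", 2017, "Shanghai"),
    ]
    balanced = augment_uniform(seed_cases, [2016, 2017], ["Beijing", "Shanghai"])
    for cell, n in sorted(cell_counts(balanced).items()):
        print(cell, n)
```

Running the demo prints each of the four cells with a count of 2, because the largest original cell, (2016, Beijing), holds two cases. Topping cells up to the largest existing cell keeps every original case; downsampling over-represented cells would be the alternative if a fixed test-set size matters more than keeping all real documents.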
REFERENCES
- Daniel Berrar and Werner Dubitzky. 2019. Should significance testing be abandoned in machine learning? IJDSA 7, 4 (2019), 247-257.
- Reuben Binns. [n.d.]. Fairness in Machine Learning: Lessons from Political Philosophy. arXiv:1712.03586. http://arxiv.org/abs/1712.03586
- Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. [n.d.]. Counterfactual Fairness in Text Classification through Robustness. arXiv:1809.10610. http://arxiv.org/abs/1809.10610
- Zichen Guo, Tieke He, Zemin Qin, Zicong Xie, and Jia Liu. 2019. A Content-Based Recommendation Framework for Judicial Cases. In ICPCSEE. Springer, 76-88.
- Tie-Ke He, Hao Lian, Ze-Min Qin, Zhen-Yu Chen, and Bin Luo. 2018. PTM: A Topic Model for the Inferring of the Penalty. JCST 33, 4 (2018), 756-767.
- Michael Kamp. 2019. Black-Box Parallelization for Machine Learning. Ph.D. Dissertation. Universitäts- und Landesbibliothek Bonn.
- Stuart Lottier. [n.d.]. Distribution of Criminal Offenses in Metropolitan Regions. 29, 1 ([n.d.]), 37. https://doi.org/10.2307/1137347
- Jason Wei and Kai Zou. [n.d.]. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. arXiv:1901.11196. http://arxiv.org/abs/1901.11196
- Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, et al. 2018. CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction. arXiv preprint arXiv:1807.02478 (2018).
- Zihuan Xu, Tieke He, Hao Lian, Jiabing Wan, and Hui Wang. 2019. Case Facts Analysis Method Based on Deep Learning. In WISA. Springer, 92-97.
- Ge Yan, Yu Li, Shu Zhang, and Zhenyu Chen. 2019. Data Augmentation for Deep Learning of Judgment Documents. In IScIDE. Springer, 232-242.
- Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. [n.d.]. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. arXiv:1804.09541. http://arxiv.org/abs/1804.09541
Index Terms
- TauJud: test augmentation of machine learning in judicial documents