ABSTRACT
Positive-Unlabeled (PU) learning is a growing field of machine learning that now comprises numerous algorithms, so many that an extensive manual search for the best algorithm for a given task is impractical. The area could therefore benefit from an Automated Machine Learning (Auto-ML) system, which selects the best algorithm for a given input dataset from a pre-defined set of candidate algorithms. This work proposes such a system, GA-Auto-PU, a Genetic Algorithm-based Auto-ML system that can generate PU learning algorithms. Experiments with 20 real-world datasets show that GA-Auto-PU significantly outperformed a state-of-the-art PU learning method.
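The general idea of a genetic algorithm searching a space of candidate algorithm configurations can be illustrated with a minimal sketch. It is not the GA-Auto-PU implementation: the search space, the gene names (`classifier`, `rn_threshold`, `iterations`), the toy fitness function, and the GA settings are all hypothetical stand-ins chosen for illustration. In a real Auto-ML system the fitness would come from evaluating each candidate on the input dataset.

```python
import random

# Hypothetical search space: gene names and values are illustrative only,
# not the components actually used by GA-Auto-PU.
SEARCH_SPACE = {
    "classifier": ["naive_bayes", "svm", "random_forest"],
    "rn_threshold": [0.1, 0.2, 0.3, 0.4, 0.5],
    "iterations": [1, 2, 5, 10],
}


def random_individual(rng):
    """Draw one candidate configuration uniformly at random."""
    return {gene: rng.choice(values) for gene, values in SEARCH_SPACE.items()}


def toy_fitness(individual):
    """Stand-in for a real evaluation such as a cross-validated score."""
    classifier_bonus = {"naive_bayes": 0.2, "svm": 0.3, "random_forest": 0.4}
    score = classifier_bonus[individual["classifier"]]
    score += 0.3 - abs(individual["rn_threshold"] - 0.3)
    score += 0.3 - abs(individual["iterations"] - 5) / 30
    return score


def tournament(population, fitness, rng, size=2):
    """Pick the fitter of `size` randomly sampled individuals."""
    return max(rng.sample(population, size), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {gene: rng.choice([parent_a[gene], parent_b[gene]]) for gene in SEARCH_SPACE}


def mutate(individual, rng, rate=0.1):
    """Resample each gene from the search space with probability `rate`."""
    return {
        gene: rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value
        for gene, value in individual.items()
    }


def evolve(fitness, pop_size=20, generations=15, seed=0):
    """Run a simple generational GA with elitism; return the best individual."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the current best individual survives unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    print(evolve(toy_fitness))
```

Because of elitism, the best fitness in the population never decreases between generations, and a fixed `seed` makes a run repeatable. The sketch gives no guarantee of reaching the global optimum of the fitness function.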
Index Terms
- GA-auto-PU: a genetic algorithm-based automated machine learning system for positive-unlabeled learning