DOI: 10.1145/3520304.3528932
poster

GA-auto-PU: a genetic algorithm-based automated machine learning system for positive-unlabeled learning

Published: 19 July 2022

ABSTRACT

Positive-Unlabeled (PU) learning is a growing field of machine learning that now comprises so many algorithms that an extensive manual search for the best one for a given task is impractical. The area could therefore benefit from an Automated Machine Learning (Auto-ML) system that selects the best algorithm for a given input dataset from a pre-defined set of candidate algorithms. This work proposes such a system, GA-Auto-PU, a Genetic Algorithm-based Auto-ML system that can generate PU learning algorithms. Experiments with 20 real-world datasets show that GA-Auto-PU significantly outperformed a state-of-the-art PU learning method.
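
The general idea of a genetic algorithm searching over candidate algorithm configurations can be illustrated with a minimal sketch. It is not the GA-Auto-PU implementation: the search space, the gene names (classifier, threshold, iterations), the operators and the fitness function below are all hypothetical placeholders, chosen only to show the select, recombine and mutate loop. In a real Auto-ML system the fitness would presumably be a predictive-performance estimate of the PU learner built from each configuration.

```python
import random

# Hypothetical search space: names and values are illustrative assumptions,
# not the configuration space used by GA-Auto-PU.
SEARCH_SPACE = {
    "classifier": ["svm", "naive_bayes", "decision_tree"],
    "threshold": [0.1, 0.2, 0.3, 0.4, 0.5],
    "iterations": [1, 2, 5, 10],
}
GENES = list(SEARCH_SPACE)


def random_individual(rng):
    """Draw one candidate configuration uniformly at random."""
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def toy_fitness(ind):
    """Placeholder score; a real system would train and evaluate a model here."""
    score = 1.0 if ind["classifier"] == "decision_tree" else 0.0
    score += 1.0 - abs(ind["threshold"] - 0.3)
    score += 1.0 if ind["iterations"] == 5 else 0.0
    return score


def tournament(pop, fitness, rng, k=2):
    """Pick the fitter of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal chance."""
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in GENES}


def mutate(ind, rng, rate=0.1):
    """Resample each gene from the search space with probability `rate`."""
    return {
        g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
        for g, v in ind.items()
    }


def run_ga(fitness, pop_size=20, generations=15, seed=0):
    """Evolve configurations and return the best one in the final population."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best individual survives unchanged into the next generation.
        children = [max(pop, key=fitness)]
        while len(children) < pop_size:
            child = crossover(
                tournament(pop, fitness, rng), tournament(pop, fitness, rng), rng
            )
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=fitness)


best = run_ga(toy_fitness)
print(best, toy_fitness(best))
```

Because the best individual is copied into each new generation, the best fitness in the population never decreases. The sketch selects among fixed options; it does not show how PU learning algorithms themselves would be assembled from a configuration.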


Published in

GECCO '22: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2022
2395 pages
ISBN: 9781450392686
DOI: 10.1145/3520304

Copyright © 2022 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions, 38%

