Abstract
In the analysis of binary outcomes, the receiver operating characteristic (ROC) curve is widely used to show the performance of a model or algorithm. The ROC curve describes performance over a series of thresholds and can be summarized by a single number, the area under the curve (AUC). When a predictor is categorical, the ROC curve has one fewer potential threshold than the number of categories; when the predictor is binary, there is only one threshold. Because the AUC may be used in decision-making to determine the best model, it is important to discuss how it agrees with the intuition from the ROC curve. We discuss how the interpolation of the curve between thresholds with binary predictors can substantially change the AUC. Overall, we show that the AUC estimated by most software corresponds to a linear interpolation of the ROC curve with binary predictors, which we believe can lead to misleading results. We compare R, Python, Stata, and SAS software implementations. We recommend reporting the interpolation used and discuss the merit of the step function interpolator, also referred to as the “pessimistic” approach by Fawcett (2006).
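The effect of the interpolation choice can be illustrated with a minimal sketch. It assumes a binary predictor whose single threshold gives one ROC point (FPR, TPR), and compares the area under a linear interpolation through (0, 0), (FPR, TPR), and (1, 1) with the area under a step function that stays at the lower value until the next point. The function names and the toy data are hypothetical and are not taken from any of the software packages compared.

```python
def roc_point(y_true, x_binary):
    """Return (FPR, TPR) for the single threshold of a 0/1 predictor."""
    pairs = list(zip(y_true, x_binary))
    tp = sum(1 for y, x in pairs if y == 1 and x == 1)
    fp = sum(1 for y, x in pairs if y == 0 and x == 1)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return fp / n_neg, tp / n_pos


def auc_linear(fpr, tpr):
    """Area under straight lines joining (0,0), (fpr,tpr), (1,1).

    Algebraically equal to (sensitivity + specificity) / 2.
    """
    left = 0.5 * fpr * tpr                 # triangle from (0,0) to (fpr,tpr)
    right = 0.5 * (1 - fpr) * (tpr + 1)    # trapezoid from (fpr,tpr) to (1,1)
    return left + right


def auc_step(fpr, tpr):
    """Area under the "pessimistic" step function.

    The curve is 0 on [0, fpr) and tpr on [fpr, 1), so the area is
    tpr * (1 - fpr), i.e. sensitivity * specificity.
    """
    return tpr * (1 - fpr)


# Hypothetical toy data: 10 negatives (2 flagged), 10 positives (6 flagged).
y = [0] * 10 + [1] * 10
x = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

fpr, tpr = roc_point(y, x)
linear = auc_linear(fpr, tpr)
step = auc_step(fpr, tpr)

print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
print(f"AUC (linear interpolation) = {linear:.2f}")
print(f"AUC (step function)        = {step:.2f}")
```

In this toy example the sensitivity is 0.6 and the specificity is 0.8, so the linear interpolation gives an AUC of 0.70 while the step function gives 0.48. The same predictor can therefore look moderately useful or worse than chance depending only on the interpolation.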




References
Allaire, J.J., Ushey, K., Tang, Y. (2018). Reticulate: interface to ‘Python’. https://github.com/rstudio/reticulate.
Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415.
Blumberg, D.M., De Moraes, C.G., Liebmann, J.M., Garg, R., Chen, C., Theventhiran, A., Hood, D.C. (2016). Technology and the glaucoma suspect. Investigative Ophthalmology & Visual Science, 57(9), OCT80–OCT85.
Budweg, J., Sprenger, T., De Vere-Tyndall, A., Hagenkord, A., Stippich, C., Berger, C.T. (2016). Factors associated with significant MRI findings in medical walk-in patients with acute headache. Swiss Medical Weekly, 146, w14349.
DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–45.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–74.
Glaveckaite, S., Valeviciene, N., Palionis, D., Skorniakov, V., Celutkiene, J., Tamosiunas, A., Uzdavinys, G., Laucevicius, A. (2011). Value of scar imaging and inotropic reserve combination for the prediction of segmental and global left ventricular functional recovery after revascularisation. Journal of Cardiovascular Magnetic Resonance, 13(1), 35.
Hanley, J.A., & McNeil, B.J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Hsu, Y.-C., & Lieli, R. (2014). Inference for ROC curves based on estimated predictive indices: a note on testing AUC = 0.5. Unpublished Manuscript.
Hunter, J.D. (2007). Matplotlib: a 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55.
Kushnir, V.A., Darmon, S.K., Barad, D.H., Gleicher, N. (2018). Degree of mosaicism in trophectoderm does not predict pregnancy potential: a corrected analysis of pregnancy outcomes following transfer of mosaic embryos. Reproductive Biology and Endocrinology, 16(1), 6.
Litvin, T.V., Bresnick, G.H., Cuadros, J.A., Selvin, S., Kanai, K., Ozawa, G.Y. (2017). A revised approach for the detection of sight-threatening diabetic macular edema. JAMA Ophthalmology, 135(1), 62–68. https://doi.org/10.1001/jamaophthalmol.2016.4772.
Maverakis, E., Ma, C., Shinkai, K., et al. (2018). Diagnostic criteria of ulcerative pyoderma gangrenosum: a Delphi consensus of international experts. JAMA Dermatology, 154(4), 461–66. https://doi.org/10.1001/jamadermatol.2017.5980.
Mwipatayi, B.P., Sharma, S., Daneshmand, A., Thomas, S.D., Vijayan, V., Altaf, N., Garbowski, M., et al. (2016). Durability of the balloon-expandable covered versus bare-metal stents in the covered versus balloon expandable stent trial (COBEST) for the treatment of aortoiliac occlusive disease. Journal of Vascular Surgery, 64(1), 83–94.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–30.
Pepe, M., Longton, G., Janes, H. (2009). Estimation and comparison of receiver operating characteristic curves. The Stata Journal, 9(1), 1.
Peter, E. (2016). Fbroc: fast algorithms to bootstrap receiver operating characteristics curves. https://CRAN.R-project.org/package=fbroc.
R Core Team. (2018). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., Müller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77.
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PloS One, 10(3), e0118432.
SAS Institute. (2017). SAS/STAT, Version 9.4 [Computer program]. Cary, NC: SAS Institute.
Shterev, I.D., Dunson, D.B., Chan, C., Sempowski, G.D. (2018). Bayesian multi-plate high-throughput screening of compounds. Scientific Reports, 8(1), 9551.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics, 21(20), 7881. http://rocr.bioinf.mpi-sb.mpg.de.
Snarr, B.S., Liu, M.Y., Zuckerberg, J.C., Falkensammer, C.B., Nadaraj, S., Burstein, D., Ho, D., et al. (2017). The parasternal short-axis view improves diagnostic accuracy for inferior sinus venosus type of atrial septal defects by transthoracic echocardiography. Journal of the American Society of Echocardiography, 30(3), 209–15.
StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: StataCorp LP.
Tuszynski, J. (2018). caTools: Tools: Moving Window Statistics, GIF, Base64, ROC AUC, Etc. https://CRAN.R-project.org/package=caTools.
Veltri, D., Kamath, U., Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 1, 8.
Xiong, X., Li, Q., Yang, W.-S., Wei, X., Hu, X., Wang, X.-C., Zhu, D., Li, R., Cao, D., Xie, P. (2018). Comparison of swirl sign and black hole sign in predicting early hematoma growth in patients with spontaneous intracerebral hemorrhage. Medical Science Monitor: International Medical Journal of Experimental and Clinical Research, 24, 567.
Funding
This analysis was supported by NIH Grants R01NS060910 and U01NS080824.
Cite this article
Muschelli, J. ROC and AUC with a Binary Predictor: a Potentially Misleading Metric. J Classif 37, 696–708 (2020). https://doi.org/10.1007/s00357-019-09345-1
DOI: https://doi.org/10.1007/s00357-019-09345-1