The curious case of the test set AUROC

Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use either (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from training data) or (b) the sensitivity and specificity for the test data at an optimal threshold determined from the validation ROC.
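The two practices named above, (a) and (b), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's method. It assumes binary labels (1 = positive) and a "predict positive when score >= threshold" rule. It also assumes Youden's J statistic (sensitivity + specificity - 1) as the criterion for the "optimal" validation threshold, because the abstract does not say which criterion is meant. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed criterion: the observed score that maximises Youden's J
    (first such score wins on ties)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


# Hypothetical toy validation and test cohorts (invented values).
val_y = [0, 0, 0, 1, 0, 1, 1, 1]
val_s = [0.1, 0.2, 0.35, 0.4, 0.5, 0.6, 0.7, 0.9]
test_y = [0, 0, 1, 0, 1, 1]
test_s = [0.15, 0.45, 0.55, 0.6, 0.65, 0.3]

# Practice (a): AUROC on the validation and test cohorts.
val_auc = auroc(val_y, val_s)     # 0.9375
test_auc = auroc(test_y, test_s)  # 6/9

# Practice (b): threshold chosen on validation, then applied to test.
t_star = youden_threshold(val_y, val_s)  # 0.4
test_sens, test_spec = sens_spec(test_y, test_s, t_star)  # 2/3, 1/3
print(val_auc, test_auc, t_star, test_sens, test_spec)
```

On this toy data the validation AUROC (0.9375) is much higher than the test AUROC (about 0.67). The threshold of 0.4 chosen on validation gives a test sensitivity of 2/3 and a test specificity of only 1/3. Both practices reduce each cohort to one or two summary numbers.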

Keywords

46 Information and Computing Sciences, 40 Engineering, Machine Learning and Artificial Intelligence, 3 Good Health and Well Being

Journal Title

Nature Machine Intelligence

Journal ISSN

2522-5839

Publisher

Springer Science and Business Media LLC

Publisher DOI

https://doi.org/10.1038/s42256-024-00817-7

Rights

Attribution 4.0 International

Sponsorship

EPSRC (EP/T017961/1)
Engineering and Physical Sciences Research Council (EP/N014588/1)

Collections

University of Cambridge Research Outputs (Articles and Conferences)