The curious case of the test set AUROC
Accepted version
Peer-reviewed
Authors
Roberts, M https://orcid.org/0000-0002-3484-5031
Hazan, A
Dittmer, S https://orcid.org/0000-0003-2919-4956
Rudd, JHF https://orcid.org/0000-0003-2243-3117
Schönlieb, CB
Abstract
Whilst the size and complexity of ML models have rapidly and significantly increased over the past decade, the methods for assessing their performance have not kept pace. In particular, among the many potential performance metrics, the ML community stubbornly continues to use (a) the area under the receiver operating characteristic curve (AUROC) for a validation and test cohort (distinct from training data) or (b) the sensitivity and specificity for the test data at an optimal threshold determined from the validation ROC.
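The two practices in (a) and (b) can be made concrete with a minimal pure-Python sketch. It computes the AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (the Mann-Whitney formulation, ties counted as 1/2). It also picks the "optimal" validation threshold by maximising Youden's J (sensitivity + specificity - 1) before applying that threshold to the test cohort. Youden's J is an assumption: the abstract does not say which optimality criterion is meant. The function names and toy scores are hypothetical and do not come from the paper.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(labels: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Assumed 'optimal' threshold: the observed score maximising Youden's J."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        sens, spec = sens_spec(labels, scores, t)
        j = sens + spec - 1
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t


if __name__ == "__main__":
    # Hypothetical toy scores for a validation and a test cohort.
    val_y = [0, 1, 0, 0, 1, 0, 1]
    val_s = [0.1, 0.5, 0.2, 0.3, 0.8, 0.6, 0.9]
    test_y = [0, 1, 0, 1, 1]
    test_s = [0.2, 0.45, 0.55, 0.7, 0.95]

    print(round(auroc(val_y, val_s), 3))   # 0.917  (practice (a), validation)
    print(round(auroc(test_y, test_s), 3))  # 0.833  (practice (a), test)
    t = youden_threshold(val_y, val_s)      # threshold fixed on validation only
    sens, spec = sens_spec(test_y, test_s, t)
    print(t, round(sens, 3), round(spec, 3))  # 0.5 0.667 0.5  (practice (b))
```

The threshold is chosen on the validation cohort and only then applied to the test cohort, mirroring practice (b); the test labels play no part in selecting it.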
Keywords
46 Information and Computing Sciences, 40 Engineering, Machine Learning and Artificial Intelligence, 3 Good Health and Well Being
Journal Title
Nature Machine Intelligence
Journal ISSN
2522-5839
Publisher
Springer Science and Business Media LLC
Sponsorship
EPSRC (EP/T017961/1)
Engineering and Physical Sciences Research Council (EP/N014588/1)