INTRODUCTION

Diagnostic error adversely affects patients and healthcare systems. To achieve diagnostic excellence, correct diagnostic test interpretation is a prerequisite. Two related heuristics designed to help interpret diagnostic tests—SpPin and SnNout—have been taught for decades by leaders in the field of evidence-based medicine.1,2 SpPin indicates that when Specificity is high, a Positive result rules in the disease in question, and SnNout indicates that when Sensitivity is high, a Negative result rules out the disease in question. Our experience over years teaching diagnostic reasoning to hundreds of medicine residents and faculty at eight academic medical centers is that these heuristics are universally known and frequently relied upon to evaluate the utility of diagnostic tests.

Unfortunately, relying on SpPin and SnNout can be maladaptive, increasing diagnostic error. Previous publications warning about limitations of SpPin and SnNout have focused on data quality issues (risk of bias, imprecision, and generalizability) or have used complicated formulas many find difficult to understand.3,4 This paper improves upon the existing literature, using simple examples without formulas to illustrate the limitations of SpPin and SnNout that exist even when data for test characteristics are of high quality (large representative sample with low risk of bias). In addition, we demonstrate that to effectively evaluate the utility of diagnostic tests, one must rely on likelihood ratios interpreted in the context of pretest probability, rather than rely on these heuristics.

THE ORIGINS OF SPPIN AND SNNOUT

The SnNout heuristic was conceived over three decades ago in the context of a test that was reported to have 100% sensitivity.1 Sensitivity is the probability of a positive test result among patients with the disease in question. Specificity is the probability of a negative test result among patients without the disease in question, and the SpPin heuristic was subsequently conceived as a counterpart to SnNout.1 SpPin and SnNout are guaranteed to work when the corresponding test characteristic is 100%. However, the use of the heuristics has expanded over time to include tests with sensitivity or specificity less than 100%, but with values still considered “high.”1,2,3

WHAT CONSTITUTES “RULING IN” AND “RULING OUT”?

When tests are less than 100% accurate (which is almost always the case), residual diagnostic uncertainty will exist. As a practical matter, we consider a disease ruled out when its probability is less than some low threshold (justifying abandonment of further testing for that disease) and ruled in when its probability is greater than some high threshold (justifying initiation of treatment for that disease without further testing).1,2 Therefore, a test’s utility for ruling in or ruling out disease depends on a patient’s posttest probability.

THE PROBLEMS WITH SPPIN AND SNNOUT

1) Neither Sensitivity Nor Specificity Should Be Considered in Isolation of the Other

Correctly assessing how a test result changes probability of disease requires information about test performance in patients both with and without the disease in question. Neither sensitivity nor specificity contains that information. Likelihood ratios do.

The likelihood ratio (LR) for a given test result is the probability of that result among patients with the disease in question divided by the probability of the same result among patients without that disease. When LR = 1, the result is equally likely in both groups and does not affect probability of disease (pretest probability = posttest probability). When LR > 1, probability of disease increases, and when LR < 1, probability of disease decreases. The further away from one in either direction, the greater the change in probability, with possible values for LR ranging from zero to infinity.

The following exercise illustrates the importance of relying on LR rather than sensitivity or specificity alone. Consider test characteristics of three available tests—A, B, and C—for a certain disease (Table 1). According to SpPin and SnNout, test A is best for ruling in disease (highest specificity) and test B is best for ruling out disease (highest sensitivity). In truth, test C is best at both because it has both the largest and smallest LR and thus generates the highest posttest probability when positive and lowest posttest probability when negative. SpPin and SnNout get it wrong.

Table 1 Likelihood Ratios* Are Superior to SpPin and SnNout—an Illustrative Example

A second exercise, using a real-world example, is further illuminating. Kernig’s sign has 5% sensitivity and 95% specificity for meningitis.5 According to SpPin and SnNout, this test can rule in meningitis when positive, but cannot rule out meningitis when negative. In truth, because the LR for a positive result (LR+) = 1 and the LR for a negative result (LR−) = 1, probability of meningitis does not change with either result. For any dichotomous test, whenever sensitivity + specificity = 100%, both likelihood ratios equal 1 and the test is uninformative.
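The Kernig’s sign arithmetic can be checked directly. The sketch below uses the standard formulas LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity; the function name is ours, chosen for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test."""
    # LR+ = P(positive | disease) / P(positive | no disease)
    lr_pos = sensitivity / (1 - specificity)
    # LR- = P(negative | disease) / P(negative | no disease)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

# Kernig's sign: 5% sensitivity, 95% specificity
lr_pos, lr_neg = likelihood_ratios(0.05, 0.95)
print(round(lr_pos, 6), round(lr_neg, 6))  # 1.0 1.0 — neither result changes probability
```

Because both likelihood ratios equal 1, a positive or negative Kernig’s sign leaves the probability of meningitis exactly where it started.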

2) Pretest Probability Matters

When sensitivity and specificity are both high, SpPin and SnNout are still unreliable, particularly when a patient’s pretest probability is far from the rule-in (for SpPin) or rule-out (for SnNout) threshold. A classic example is a positive HIV antibody test in a patient with a very low pretest probability of HIV (1 in 10,000). Even if specificity = 99.8%—a seemingly clear example of SpPin—and sensitivity = 100% (LR+ = 500), posttest probability after a positive test is just 5%, which is clearly inadequate to rule in HIV and initiate treatment.6
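The HIV figure follows directly from Bayes’ rule (pretest odds × LR = posttest odds). A minimal sketch, using only the numbers stated above:

```python
# Pretest probability of 1 in 10,000; sensitivity 100%, specificity 99.8%
pretest_prob = 1 / 10_000
lr_positive = 1.0 / (1 - 0.998)  # sensitivity / (1 - specificity) = 500

# Bayes' rule: pretest odds x LR = posttest odds, then convert back to probability
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_positive
posttest_prob = posttest_odds / (1 + posttest_odds)

print(f"{posttest_prob:.1%}")  # roughly 5%, far below any rule-in threshold
```

Despite an LR+ of 500, the very low pretest probability keeps the posttest probability near 5%, which is why SpPin fails here.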

3) Most Tests Are Not Truly Dichotomous

Sensitivity and specificity are numbers that imply there are only two possible test results. Such tests are rare in the real world. Physical exam maneuvers and imaging studies tend to be ordinal, with results such as “negative,” “indeterminate,” and “positive” (e.g., chest X-ray for pneumonia), and blood tests tend to be continuous, with essentially infinite possible results (e.g., B-type natriuretic peptide for heart failure).

Dichotomizing non-dichotomous tests introduces measurement error and leads to mistakes. The solution is to instead use multilevel LRs to maximize a test’s utility.7 For example, in a recent study evaluating ultrasound measurement of jugular venous pressure for diagnosis of elevated central venous pressure, authors dichotomized the test and reported a sensitivity of 73% and specificity of 79% (LR+ = 3.4, LR− = 0.34). While this test would not be considered very helpful according to SpPin or SnNout, a more useful reanalysis demonstrated six distinct levels of test results with unique LRs that ranged from zero to infinity.8
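Multilevel LRs extend the same ratio to each result level: the LR for a given level is the probability of that level among patients with the disease divided by its probability among patients without it. The counts below are hypothetical illustration values (not the data from the cited ultrasound study), chosen only to show the mechanics:

```python
# Hypothetical counts of result levels in 100 patients with and 100 without disease
disease =    {"negative": 5,  "indeterminate": 15, "positive": 80}
no_disease = {"negative": 70, "indeterminate": 20, "positive": 10}

n_disease = sum(disease.values())
n_no_disease = sum(no_disease.values())

# LR for each level = P(level | disease) / P(level | no disease)
for level in disease:
    lr = (disease[level] / n_disease) / (no_disease[level] / n_no_disease)
    print(f"{level}: LR = {lr:.2f}")
```

Note that the intermediate level carries its own LR rather than being forced into the “positive” or “negative” column, which is the information dichotomization throws away.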

OUT WITH THE OLD RULE, IN WITH THE OLDER RULE

Bayes’ Rule: Pretest Odds × Likelihood Ratio = Posttest Odds1,7

Bayes’ rule considers test performance in patients both with and without the disease in question, incorporates pretest probability, does not require dichotomization, and allows for easy comparison between posttest probability and decision-making thresholds. While the advantages of this approach have long been recognized, including by evidence-based medicine experts who simultaneously taught SpPin and SnNout,1,2 it seems that most learners over the past several decades have retained only the heuristics, perhaps due to their simplicity. Fortunately, with the availability of a handy nomogram1,2 and more recently, online calculators (e.g., https://sample-size.net/post-probability-calculator-test-new/), clinicians need not worry about memorizing formulas, converting between probability and odds, or making any calculations on their own.
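For readers who prefer code to a nomogram, the whole calculation reduces to a few lines. This is a minimal sketch of Bayes’ rule as stated above; the helper name `posttest_probability` is ours, chosen for illustration:

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Apply a test result's likelihood ratio to a pretest probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)   # probability -> odds
    posttest_odds = pretest_odds * lr                  # Bayes' rule
    return posttest_odds / (1 + posttest_odds)         # odds -> probability

# e.g., pretest probability 30% and a test result with LR = 10
print(f"{posttest_probability(0.30, 10):.0%}")  # 81%
```

A sanity check on the definition: an LR of 1 leaves the probability unchanged, and larger or smaller LRs move it toward 100% or 0%, respectively.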

LIMITATIONS TO OUR APPROACH

First, when data quality issues are present, LR estimates will be unreliable. However, the same limitations will apply to SpPin and SnNout,3 and Bayes’ rule will still improve upon the heuristics by incorporating pretest probability. Second, accurately estimating a patient’s pretest probability can be difficult. Likewise, finding the correct LR for a test result can be difficult because diagnostic accuracy studies often report dichotomized test characteristics for non-dichotomous tests. However, previously described strategies for estimating pretest probability and for using multiple levels to interpret data from diagnostic accuracy studies can be used to help overcome these challenges.1,2,7

AN ALTERNATIVE TO OUR APPROACH: THE LIKELIHOOD RATIO HEURISTIC

When teaching diagnostic test interpretation and Bayes’ rule, some evidence-based medicine experts have promoted an alternative heuristic that goes something like this: LRs greater than 10 or less than 0.1 are very powerful and often conclusive; LRs ranging from 5 to 10 or 0.1 to 0.2 have a moderate effect on probability; LRs ranging from 2 to 5 or 0.2 to 0.5 have a small effect on probability; and LRs ranging from 0.5 to 2 are rarely helpful.9 We agree that it can be useful for learners to get a feel for the impact of different LRs in order to develop an innate sense of how “good” a test result is based on its LR. However, it is important to contextualize the utility of LRs in terms of the varied magnitudes of effect they will have at different pretest probabilities, the potential availability of other independent tests, and decision-making thresholds.1 Even tests with very modest LRs can appropriately change patient management, depending on these other factors.

CONCLUSION

Although SpPin and SnNout were conceived as well-intentioned teaching tools, their multiple flaws mean it is time to retire them. A reinvigorated emphasis on considering pretest probability and using multilevel LRs with Bayes’ rule is needed in medical education at all levels, as part of the greater effort in healthcare to achieve diagnostic excellence.