Estimation of sensitivity and specificity of diagnostic tests and disease prevalence when the true disease state is unknown
Introduction
The sensitivity and specificity of a test are usually determined by comparison with a reference test (often referred to as a “gold standard”), which is supposed to determine the true disease state of the animals unambiguously (Office International des Epizooties, 1996; Greiner and Gardner, 2000). When a gold standard is available, sensitivity and specificity can be estimated directly (Kraemer, 1992). The true disease state, however, is rarely known in practice, because perfect test results may be difficult or impossible to obtain (Tyler and Cullor, 1989).
If classification errors in the reference test are ignored, serious bias may be introduced in the assessment of the accuracy of the new test (Staquet et al., 1981; Valenstein, 1990). When the error probabilities of the reference test are known, however, it is possible to obtain unbiased estimates of the accuracy of the test in question (Gart and Buck, 1966; Staquet et al., 1981). This estimation assumes that the classification errors of the reference test and the new test are independent, conditional on the true disease state, although estimation is also possible when conditional independence is not assumed (Thibodeau, 1981).
Hui and Walter (1980) considered the case where two tests (both with unknown sensitivity and specificity) were simultaneously applied to individuals from two populations with different prevalences of disease. They showed that, assuming conditional independence, the sensitivity and specificity of both tests, as well as the true prevalence in both populations, could be estimated by maximum likelihood (ML). A thorough discussion of the applicability of the method in other settings (such as the case with one population and three or more tests) is given by Walter and Irwig (1988). Bayesian methodology has also been used for the model proposed by Hui and Walter and for similar models (Joseph et al., 1995; Johnson et al., 2000); in that approach, computations are accomplished by Gibbs sampling (Gelfand and Smith, 1990). Hui and Zhou (1998) presented an overview of available methods for diagnostic test evaluation with an emphasis on methodology for estimation of sensitivity and specificity (without need for the assumption of conditional independence).
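As a sketch of the structure behind the Hui and Walter model, the cell probabilities for two conditionally independent tests applied in population k can be written as below. The symbols are notation assumed here for illustration and may differ from the paper's own: \pi_k is the prevalence in population k, Se_1, Sp_1, Se_2, Sp_2 are the sensitivities and specificities of the two tests, and n_{kij} is the number of animals in population k with test results (i, j).

```latex
\begin{aligned}
P(T_1^{+}, T_2^{+} \mid k) &= \pi_k\, Se_1\, Se_2 + (1-\pi_k)(1-Sp_1)(1-Sp_2) \\
P(T_1^{+}, T_2^{-} \mid k) &= \pi_k\, Se_1 (1-Se_2) + (1-\pi_k)(1-Sp_1)\, Sp_2 \\
P(T_1^{-}, T_2^{+} \mid k) &= \pi_k (1-Se_1)\, Se_2 + (1-\pi_k)\, Sp_1 (1-Sp_2) \\
P(T_1^{-}, T_2^{-} \mid k) &= \pi_k (1-Se_1)(1-Se_2) + (1-\pi_k)\, Sp_1\, Sp_2 \\[4pt]
\ell(Se_1, Sp_1, Se_2, Sp_2, \pi_1, \pi_2) &= \sum_{k=1}^{2} \sum_{i,j \in \{+,-\}} n_{kij} \log P(T_1^{i}, T_2^{j} \mid k)
\end{aligned}
```

With two populations, the two 2×2 tables supply 2 × 3 = 6 degrees of freedom for the 6 unknown parameters, which is why two populations with different prevalences are needed for the parameters to be estimable.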
Although not widely adopted, the Hui and Walter model (and models similar to it) has been applied in statistical research (McClish and Quade, 1985; Vacek, 1985; Ashton and Moeschberger, 1988; Walter and Irwig, 1988; Qu et al., 1996; Sinclair and Gastwirth, 1996; Weng, 1996; Torrance-Rynard and Walter, 1997; Johnson and Pearson, 1999; Johnson et al., 2000) and in human medical science (van Ulsen et al., 1986; Shaw et al., 1987; Walter et al., 1991; de Bock et al., 1994; Faraone and Tsuang, 1994; Faraone et al., 1996; Line et al., 1997; McDermott et al., 1997; Mahoney et al., 1998; Rybicki et al., 1998). The methods have been introduced only recently in the evaluation of diagnostic tests used for detection of animal disease (Spangler et al., 1992; Agger et al., 1997; Chriel and Willeberg, 1997; Enøe et al., 1997; Sørensen et al., 1997; Willeberg et al., 1997; Georgiadis et al., 1998; Singer et al., 1998).
In this paper, we describe methods of estimating the sensitivity and specificity of diagnostic tests and disease prevalence when the true disease state is unknown. We present methods beginning with the case where an imperfect reference test is available, and we ultimately give special emphasis to the model and methods described by Hui and Walter (1980). Methods are illustrated using data from Georgiadis et al. (1998).
Reference test with known sensitivity and specificity
Consider the case where the true disease state of animals cannot be determined perfectly, but where the sensitivity (SeR) and the specificity (SpR) of a reference test are presumed known. When each animal in a random sample of size n is tested by both a new diagnostic test and a reference test, four outcomes are possible: both tests positive (T1+, T2+; denoted a); one test positive and the other negative (T1+, T2−; denoted b, or T1−, T2+; denoted c); and both tests negative (T1−, T2−; denoted d).
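The correction for a reference test with known error rates can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes conditional independence of the two tests, and it assumes that T1 is the new test and T2 the reference test, so that a and b are the counts with the new test positive. The function name and argument names are hypothetical.

```python
def correct_for_imperfect_reference(a, b, c, d, se_ref, sp_ref):
    """Estimate Se and Sp of a new test, and prevalence, from a 2x2 table.

    Assumed layout (an assumption of this sketch):
      a = new+/ref+, b = new+/ref-, c = new-/ref+, d = new-/ref-.
    se_ref and sp_ref are the known sensitivity and specificity of the
    reference test; the two tests are assumed conditionally independent.
    """
    n = a + b + c + d
    youden_ref = se_ref + sp_ref - 1.0
    if youden_ref <= 0:
        raise ValueError("reference test must satisfy se_ref + sp_ref > 1")

    # Prevalence from the apparent prevalence of the reference test:
    # P(ref+) = prev * se_ref + (1 - prev) * (1 - sp_ref)
    p_ref_pos = (a + c) / n
    prev = (p_ref_pos + sp_ref - 1.0) / youden_ref

    # Solve the two equations for the cells with the new test positive:
    # P(new+, ref+) = A * se_ref       + B * (1 - sp_ref)
    # P(new+, ref-) = A * (1 - se_ref) + B * sp_ref
    # where A = P(new+, diseased) and B = P(new+, not diseased).
    pos_and_diseased = (a * sp_ref - b * (1.0 - sp_ref)) / (n * youden_ref)
    pos_and_healthy = (b * se_ref - a * (1.0 - se_ref)) / (n * youden_ref)

    se_new = pos_and_diseased / prev
    sp_new = 1.0 - pos_and_healthy / (1.0 - prev)
    return se_new, sp_new, prev
```

In finite samples these moment-type estimates can fall outside the interval [0, 1], and the function does not guard against an estimated prevalence of exactly 0 or 1; a production implementation would need to handle both cases.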
Methods of estimation and computational techniques for the Hui and Walter model
ML estimates are the set of parameter values under which the observed data are most “likely”; they are obtained by maximizing the likelihood function (Tanner, 1996). Variances are obtained by calculating the observed Fisher information matrix and inverting it (Gelman et al., 1995, p. 100). The square roots of the diagonal elements of the inverted matrix are the corresponding S.E.s. ML estimates have many optimal properties when sample sizes are large. They are asymptotically unbiased and efficient
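One common way to maximize the Hui and Walter likelihood is the EM algorithm, which treats the unknown disease state of each animal as missing data. The following is a minimal sketch under the conditional-independence assumption, not the authors' implementation; the function name, starting values, and stopping rule are assumptions made for illustration. It returns point estimates only and does not compute the information matrix or S.E.s.

```python
def hui_walter_em(tables, start=(0.9, 0.9, 0.9, 0.9), max_iter=50000, tol=1e-12):
    """EM sketch for the two-test Hui and Walter model.

    tables: one (a, b, c, d) tuple per population, with
            a = T1+/T2+, b = T1+/T2-, c = T1-/T2+, d = T1-/T2-.
    start:  starting values (se1, sp1, se2, sp2); values with Se + Sp > 1
            steer the algorithm away from the mirror-image solution.
    Returns (se1, sp1, se2, sp2, prevalences).
    """
    se1, sp1, se2, sp2 = start
    prev = [0.5] * len(tables)
    # Cell order matches (a, b, c, d): (T1 result, T2 result), 1 = positive.
    cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
    total = sum(sum(t) for t in tables)

    for _ in range(max_iter):
        old = [se1, sp1, se2, sp2] + prev
        dis = [0.0] * len(tables)      # expected diseased count per population
        t1_pos_dis = t2_pos_dis = 0.0  # expected diseased animals with T1+ / T2+
        t1_neg_hea = t2_neg_hea = 0.0  # expected non-diseased animals with T1- / T2-

        for k, counts in enumerate(tables):
            for (i, j), n in zip(cells, counts):
                p_dis = prev[k] * (se1 if i else 1 - se1) * (se2 if j else 1 - se2)
                p_hea = (1 - prev[k]) * ((1 - sp1) if i else sp1) * ((1 - sp2) if j else sp2)
                w = p_dis / (p_dis + p_hea)  # E-step: P(diseased | cell, population)
                dis[k] += n * w
                t1_pos_dis += n * w * i
                t2_pos_dis += n * w * j
                t1_neg_hea += n * (1 - w) * (1 - i)
                t2_neg_hea += n * (1 - w) * (1 - j)

        # M-step: update parameters from the expected complete-data counts.
        total_dis = sum(dis)
        prev = [dis[k] / sum(tables[k]) for k in range(len(tables))]
        se1, se2 = t1_pos_dis / total_dis, t2_pos_dis / total_dis
        sp1 = t1_neg_hea / (total - total_dis)
        sp2 = t2_neg_hea / (total - total_dis)

        new = [se1, sp1, se2, sp2] + prev
        if max(abs(x - y) for x, y in zip(new, old)) < tol:
            break

    return se1, sp1, se2, sp2, prev
```

EM can converge slowly when the prevalences of the two populations are similar, and the likelihood has a mirror-image solution (Se replaced by 1 − Sp, prevalence by 1 − prevalence), so the starting values matter.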
Assumptions
The methods presented in this paper are based on several assumptions that, if not carefully considered, can seriously invalidate the results.
In the models proposed by Gart and Buck (1966), Staquet et al. (1981) and Hui and Walter (1980), the two tests are assumed to be conditionally independent. The assumption of conditional independence implies that given that an animal is diseased (or not), the probability of positive (or negative) outcomes for T1 is the same regardless of a
Illustrations
To illustrate the reviewed methods and models, we used data from Georgiadis et al. (1998) who evaluated a nested polymerase chain reaction (PCR) test (Barlough et al., 1995) and microscopic examination (ME) of kidney imprints for detection of the microsporidian parasite Nucleospora salmonis in rainbow trout.
Briefly, Georgiadis et al. (1998) used the Newton–Raphson (NR) and EM algorithms to assess the accuracy of the PCR test and ME using the Hui and Walter two-population model. Thus, some of the results in
Conclusions
When evaluating a new diagnostic test, it is generally wise to assume that the sensitivity and specificity of the reference test are not precisely known, and to use available methods to estimate them as well. For this purpose, we advocate the use of the ML methods when two or more populations can be sampled, and when the assumptions for the Hui and Walter model can be justified. A simple Newton–Raphson approach should suffice when the cells of the 2×2 tables displaying the cross-classified test
Acknowledgements
The study was supported in part by the NRI Competitive Grants Program/USDA Award No. 98-35204-6535. Additionally, we thank S. Andersen, J. Barlough, W. Cox, I.A. Gardner, R.P. Hedrick, L.M. Pearson, R. Singh and M. Thurmond for valuable contributions to this work.
References
- et al. Sensitivity and specificity of diagnostic tests in acute maxillary sinusitis determined by maximum likelihood in the absence of an external standard. J. Clin. Epidemiol. (1994)
- et al. Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Prev. Vet. Med. (2000)
- et al. Epidemiologic issues in the validation of veterinary diagnostic tests. Prev. Vet. Med. (2000)
- et al. Dual group screening. J. Statist. Plann. Inference (2000)
- et al. Reliability and accuracy of differentiating pervasive developmental disorder subtypes. J. Am. Acad. Child Adolesc. Psych. (1998)
- et al. Evaluation of bluetongue virus diagnostic tests in free-ranging bighorn sheep. Prev. Vet. Med. (1998)
- et al. Diagnostic performance of two tests and fecal culture for subclinical paratuberculosis and associations with production. Prev. Vet. Med. (1992)
- et al. Methodology for assessment of new dichotomous diagnostic tests. J. Chronic. Dis. (1981)
- et al. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J. Clin. Epidemiol. (1988)
- et al. Evaluation of clinical mastitis and somatic cell count as diagnostic tests for surveillance of udder health in dairy herds. Epidémiologie et Santé Animale (1997)
- Re: Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am. J. Epidemiol.
- Nested polymerase chain reaction for detection of Enterocytozoon salmonis genomic DNA in chinook salmon Oncorhynchus tshawytscha. Dis. Aquat. Org.
- Bayesian binomial regression: predicting survival at a trauma center. Am. Statist.
- How independent are multiple “independent” diagnostic classifications. Statist. Med.
- Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Statist. Med.
- Causal modeling to estimate sensitivity and specificity of a test when prevalence changes. Epidemiology
- Dependency between sensitivity, specificity and prevalence analysed by means of Gibbs sampling. Epidémiologie et Santé Animale
- Detection of influential observations in linear regression. Technometrics
- Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B
- Estimation of the sensitivity and the specificity of two diagnostic tests for the detection of antibodies against Actinobacillus pleuropneumoniae serotype 2 in pigs by maximum-likelihood-estimation and Gibbs sampling. Epidémiologie et Santé Animale
- Measuring diagnostic accuracy in the absence of a gold standard. Am. J. Psychiatr.
- Diagnostic accuracy and confusability analyses: an application to the diagnostic interview for genetic studies. Psychol. Med.
- Comparison of a screening test and a reference test in epidemiologic studies. II. A probabilistic model for the comparison of diagnostic tests. Am. J. Epidemiol.
- The statistical precision of medical screening tests. Statist. Sci.
- Bayesian analysis of screening data: application to AIDS in blood donors. Can. J. Statist.
- Sampling-based approaches to calculating marginal densities. J. Am. Statist. Assoc.
- Optimal administration of dual screening tests for detecting a characteristic with special reference to low prevalence diseases. Biometrics
- Field evaluation of sensitivity and specificity of a polymerase chain reaction (PCR) for detection of N. salmonis in rainbow trout. J. Aquat. Anim. Health
- A biomedical application of latent class models with random effects. Appl. Statist.