A note on ranked-set sampling using a covariate
Introduction
Ranked-set sampling (RSS), proposed by McIntyre (1952, 2005), is a sampling method that improves on simple random sampling (SRS) by taking advantage of auxiliary ranking information. To obtain a balanced ranked-set sample, one begins by specifying a set size m. One then draws m independent simple random samples (sets) of size m and ranks the units in each set from smallest to largest. The ranking may be done either by judgment or by using an easily available covariate, and it need not be perfectly accurate. One then selects for measurement the unit ranked smallest in the first set, the unit ranked second-smallest in the second set, and so on. This process yields a sample of m independent values. To obtain a larger sample, one repeats the process for n independent cycles. The sample then consists of nm independent values, with n values from units ranked smallest, n values from units ranked second-smallest, and so on. If the rankings are perfect, these values are independent order statistics from the parent distribution. Otherwise, they are independent judgment order statistics.
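The balanced RSS procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the finite population of (x, y) pairs, the linear relationship between x and y, and the function name `balanced_rss` are all hypothetical, and ranking is done on the covariate x alone.

```python
import random

def balanced_rss(population, m, n, rng):
    """Draw a balanced ranked-set sample from a list of (x, y) units.

    Each set of m units is ranked by the covariate x, and only one unit
    per set contributes its y value. Returns n * m pairs (rank, y).
    """
    sample = []
    for _ in range(n):                      # n independent cycles
        for rank in range(1, m + 1):        # one fresh set per rank
            units = rng.sample(population, m)          # a set of m units
            units.sort(key=lambda u: u[0])             # rank by covariate x
            sample.append((rank, units[rank - 1][1]))  # measure one unit
    return sample

# Hypothetical population in which y is a noisy linear function of x.
rng = random.Random(0)
population = []
for _ in range(10_000):
    x = rng.gauss(0.0, 1.0)
    population.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

rss = balanced_rss(population, m=3, n=4, rng=rng)
print(len(rss))  # nm = 12 measured values, n = 4 per rank
```

Because the sets are drawn from a large finite population, they are only approximately independent here; drawing units from a distribution instead would match the description exactly.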
In some statistical problems, it is helpful to allow the number of measured units with each rank to vary from one rank to another. In this case, one may use unbalanced RSS. One simply specifies a set size m and a vector $(n_1,\ldots,n_m)$, where $n_i$ is the number of units with in-set rank i to be selected for measurement. The sample then consists of $n_1+\cdots+n_m$ independent judgment order statistics. If $\bar{Y}_{[i]}$ is the sample mean for the measured values from units given rank i, then the standard RSS nonparametric mean estimator is $\hat{\mu}_{RSS} = \frac{1}{m}\sum_{i=1}^{m}\bar{Y}_{[i]}$, which is unbiased under either balanced or unbalanced RSS.
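As a rough illustration of unbalanced RSS, the sketch below draws a sample for a given vector of per-rank counts and computes the equal-weight average of the per-rank sample means, which is how the standard estimator is read here. The helper `draw_unit`, the distribution it samples from, and the function names are assumptions made for the example, not part of the paper.

```python
import random
from statistics import mean

def draw_unit(rng):
    """Hypothetical unit: covariate x and a correlated response y."""
    x = rng.gauss(0.0, 1.0)
    return (x, 2.0 * x + rng.gauss(0.0, 0.5))

def unbalanced_rss(m, counts, rng):
    """counts[i - 1] is the number of measured units with in-set rank i.

    Returns a dict mapping each rank to its list of measured y values.
    """
    by_rank = {i: [] for i in range(1, m + 1)}
    for i in range(1, m + 1):
        for _ in range(counts[i - 1]):
            # Form a set of m units and rank it by the covariate x.
            units = sorted((draw_unit(rng) for _ in range(m)),
                           key=lambda u: u[0])
            by_rank[i].append(units[i - 1][1])  # measure the rank-i unit
    return by_rank

def rss_mean(by_rank):
    """Equal-weight average of the per-rank sample means."""
    if any(len(values) == 0 for values in by_rank.values()):
        raise ValueError("every rank needs at least one measured unit")
    return mean(mean(values) for values in by_rank.values())

rng = random.Random(1)
by_rank = unbalanced_rss(m=3, counts=[5, 2, 3], rng=rng)
estimate = rss_mean(by_rank)
```

Note that each rank receives weight 1/m regardless of how many units were measured at that rank; weighting by the counts instead would generally bias the estimator.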
Another variation on balanced RSS is judgment post-stratification (JPS), proposed by MacEachern et al. (2004). To collect a JPS sample of size N using set size m, one first selects a simple random sample of size N. Each of the N units is measured, and some additional ranking information is also collected. For each of the N measured units, one obtains an additional m−1 independent units to create a set of size m. One then ranks the m units from smallest to largest, noting the rank of the measured unit. The full data set then consists of the N measured values and their associated ranks. As in unbalanced RSS, the number of units with each rank may vary from one rank to another. However, while the number $n_i$ of measured values with rank i is fixed in advance in RSS, it is random in JPS. In fact, the vector $(n_1,\ldots,n_m)$ has a multinomial distribution with mass parameter N and probability vector (1/m,…,1/m). JPS tends to be somewhat less efficient than balanced RSS, but it offers increased flexibility. One advantage is that since JPS is based on a simple random sample, researchers retain the option of using SRS-based methods. In addition, JPS allows rankers to declare ties (MacEachern et al., 2004). The standard JPS nonparametric mean estimator is $\hat{\mu}_{JPS}$, the average of $\bar{Y}_{[i]}$ over all ranks i such that $n_i > 0$.
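The JPS scheme can be sketched in the same spirit. The code below is a minimal illustration, assuming covariate-based ranking with a continuous covariate (so ties occur with probability zero); `draw_unit`, `jps_sample`, and `jps_mean` are hypothetical names introduced for the example.

```python
import random
from statistics import mean

def draw_unit(rng):
    """Hypothetical unit: covariate x and a correlated response y."""
    x = rng.gauss(0.0, 1.0)
    return (x, 2.0 * x + rng.gauss(0.0, 0.5))

def jps_sample(m, N, rng):
    """Return N pairs (rank, y): each measured unit with its in-set rank."""
    data = []
    for _ in range(N):
        x, y = draw_unit(rng)                   # unit from the simple random sample
        # Covariates of the m - 1 extra units used only for ranking.
        others = [draw_unit(rng)[0] for _ in range(m - 1)]
        rank = 1 + sum(1 for xo in others if xo < x)
        data.append((rank, y))
    return data

def jps_mean(data):
    """Average of the per-rank sample means over the ranks that occur."""
    by_rank = {}
    for rank, y in data:
        by_rank.setdefault(rank, []).append(y)
    return mean(mean(values) for values in by_rank.values())

rng = random.Random(2)
jps = jps_sample(m=3, N=30, rng=rng)
estimate = jps_mean(jps)
```

Since the y values form a simple random sample, the ordinary sample mean of the N measured values remains available as a fallback, which is the flexibility noted above.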
Both RSS and JPS were developed with subjective, judgment-based rankings in mind, but the idea of ranking using a covariate has received considerable attention. For example, Chen et al. (2006) discussed how RSS with covariate-based rankings can be used in estimating a population proportion, and Wang et al. (2006) discussed how one can implement JPS using more than one covariate. We provide evidence here that when rankings are done using a covariate, the estimators $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$ no longer make efficient use of the available information. Others such as Ridout and Cobby (1987) and Wang et al. (2008) have noted that when covariates are used in the ranking process, incorporating the covariates into the estimation process should lead to improved estimators. We expand on this observation in two ways. We first show that when covariates are used to do the rankings in JPS and unbalanced RSS, estimators like $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$ that do not use the covariate information are actually inadmissible under squared error loss. We then propose some alternate nonparametric estimators that do incorporate the covariate information, and we show that these estimators tend to be more efficient than $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$.
In Section 2, we show that when rankings are done using a covariate, $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$ are inadmissible under squared error loss. The argument that we use also shows that several other estimators in the literature are inadmissible when the rankings are done using a covariate. Kvam and Samaniego (1993) showed that in balanced RSS, $\hat{\mu}_{RSS}$ is inadmissible when sampling from specific parametric families of distributions. Our results here are less general in that they apply only when the ranking is done using a covariate, but more general in that they require neither parametric assumptions nor perfect rankings. In Section 3, we show that when rankings are done using a covariate, nonparametric regression techniques yield estimators that tend to be significantly more efficient than $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$. We give our conclusions in Section 4.
Inadmissibility results
Suppose that a sample is obtained using either JPS or unbalanced RSS, with the rankings done using a covariate. It then follows that for the N units that are measured, both the variable of interest Y and the ranking variable X are known, while for the N(m−1) units used only for ranking, X is known, but Y is unknown. Using these data, we obtain better estimators through conditioning. Specifically, we condition on (i) the set S1 of units used for ranking and (ii) the set of units chosen for
Alternate mean estimators
In this section, we describe some nonparametric mean estimators that tend to improve on $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$ when rankings are done using a covariate. The basic strategy used in creating these estimators is to ignore the set structure and the in-set rankings, but to condition on the full set of Nm units used in ranking. It may seem that we are losing information by ignoring the in-set rankings, but what we are actually doing is replacing in-set rankings with the more informative ranking of each unit
Conclusions
We have shown that when rankings are done using a covariate, the standard nonparametric mean estimators in JPS and unbalanced RSS are inadmissible under squared error loss. The new estimators are guaranteed to improve upon $\hat{\mu}_{RSS}$ and $\hat{\mu}_{JPS}$, but even greater gains in efficiency are possible if one is willing to risk a small loss in efficiency should X and Y turn out to be only loosely related. In particular, estimators based on nonparametric regression techniques seem to offer
Acknowledgment
The author thanks the referees for helpful comments that have led to an improved paper.
References (19)
Ranked-set sampling with regression-type estimators. Journal of Statistical Planning and Inference (2001)
et al. On the inadmissibility of empirical averages as estimators in ranked set sampling. Journal of Statistical Planning and Inference (1993)
et al. An empirical distribution function for sampling with incomplete information. Annals of Mathematical Statistics (1955)
et al. Statistical Inference (2002)
et al. Unbalanced ranked set sampling for estimating a population proportion. Biometrics (2006)
Adaptive ranked-set sampling with multiple concomitant variables: an effective way to observational economy. Bernoulli (2002)
et al. Ranked set sampling theory with order statistics background. Biometrics (1972)
The uniform convergence of the Nadaraya–Watson regression function estimate. Canadian Journal of Statistics (1978)
et al. Kernel method for the estimation of the distribution function and the mean with auxiliary information in ranked set sampling. Environmetrics (2002)