A note on ranked-set sampling using a covariate

https://doi.org/10.1016/j.jspi.2010.08.002

Abstract

Ranked-set sampling (RSS) and judgment post-stratification (JPS) use ranking information to obtain more efficient inference than is possible using simple random sampling. Both methods were developed with subjective, judgment-based rankings in mind, but the idea of ranking using a covariate has received a lot of attention. We provide evidence here that when rankings are done using a covariate, the standard RSS and JPS mean estimators no longer make efficient use of the available information. We first show that when rankings are done using a covariate, the standard nonparametric mean estimators in JPS and unbalanced RSS are inadmissible under squared error loss. We then show that when rankings are done using a covariate, nonparametric regression techniques yield mean estimators that tend to be significantly more efficient than the standard RSS and JPS mean estimators. We conclude that the standard estimators are best reserved for settings where only subjective, judgment-based rankings are available.

Introduction

Ranked-set sampling (RSS), proposed by McIntyre (1952, 2005), is a sampling method that improves on simple random sampling (SRS) by taking advantage of auxiliary ranking information. To obtain a balanced ranked-set sample, one begins by specifying a set size m. One then draws m independent simple random samples (sets) of size m and ranks the units in each set from smallest to largest. The ranking may be done either by judgment or by using an easily available covariate, and it need not be perfectly accurate. One then selects for measurement the unit ranked smallest in the first set, the unit ranked second-smallest in the second set, and so on. This process yields a sample of m independent values. To obtain a larger sample, one repeats the process for n independent cycles. The sample then consists of N = nm independent values, with n values from units ranked smallest, n values from units ranked second-smallest, and so on. If the rankings are perfect, these values are independent order statistics from the parent distribution. Otherwise, they are independent judgment order statistics.
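The balanced procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper. It assumes a hypothetical finite population stored as a list of (x, y) pairs, where x is the ranking covariate and y is the variable of interest, and it assumes the population is large enough that separately drawn sets behave approximately like independent samples. The function name `balanced_rss` and its arguments are invented for illustration.

```python
import random


def balanced_rss(population, m, n, rng):
    """Draw a balanced ranked-set sample using covariate-based ranking.

    population: list of (x, y) pairs (hypothetical data layout).
    m: set size; n: number of cycles; rng: a random.Random instance.
    Returns a list of (rank, y) pairs of length n * m.
    """
    sample = []
    for _ in range(n):                     # n independent cycles
        for rank in range(1, m + 1):       # one fresh set per rank
            units = rng.sample(population, m)   # a set of m units
            units.sort(key=lambda u: u[0])      # rank by the covariate x only
            # Measure y only for the unit holding the target rank.
            sample.append((rank, units[rank - 1][1]))
    return sample


if __name__ == "__main__":
    rng = random.Random(2010)
    # Hypothetical population in which y is a noisy function of x.
    pop = []
    for _ in range(5000):
        x = rng.gauss(0.0, 1.0)
        pop.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))
    print(balanced_rss(pop, m=3, n=2, rng=rng))
```

Each cycle uses m sets, so the returned sample holds n measured values for each of the m ranks.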

In some statistical problems, it is helpful to allow the number of measured units with each rank to vary from one rank to another. In this case, one may use unbalanced RSS. One simply specifies a set size m and a vector $(n_1, \ldots, n_m)$, where $n_i > 0$ is the number of units with in-set rank i to be selected for measurement. The sample then consists of $N = \sum_{i=1}^{m} n_i$ independent judgment order statistics. If $\bar{Y}_i$ is the sample mean for the measured values from units given rank i, then the standard RSS nonparametric mean estimator is $\bar{Y}_{\mathrm{RSS}} = (1/m) \sum_{i=1}^{m} \bar{Y}_i$, which is unbiased under either balanced or unbalanced RSS.
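As a small illustration of the standard RSS mean estimator, the sketch below averages the within-rank sample means over the m ranks. The input format, a list of (rank, y) pairs, and the function name `rss_mean` are assumptions made for this example rather than anything specified in the paper.

```python
from collections import defaultdict


def rss_mean(sample, m):
    """Standard RSS mean estimator: the average of the m within-rank means.

    sample: list of (rank, y) pairs with ranks in 1..m (hypothetical layout).
    Works for balanced or unbalanced designs, provided every rank
    has at least one measured value (n_i > 0).
    """
    by_rank = defaultdict(list)
    for rank, y in sample:
        by_rank[rank].append(y)
    missing = [i for i in range(1, m + 1) if not by_rank[i]]
    if missing:
        raise ValueError(f"no measured values for ranks {missing}")
    rank_means = [sum(by_rank[i]) / len(by_rank[i]) for i in range(1, m + 1)]
    return sum(rank_means) / m


if __name__ == "__main__":
    # Unbalanced toy sample: n_1 = 2, n_2 = 1, n_3 = 1.
    toy = [(1, 1.0), (1, 3.0), (2, 5.0), (3, 9.0)]
    print(rss_mean(toy, m=3))
```

Because each rank mean receives weight 1/m regardless of how many values it is based on, the estimator differs from the plain sample mean whenever the design is unbalanced.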

Another variation on balanced RSS is judgment post-stratification (JPS), proposed by MacEachern et al. (2004). To collect a JPS sample of size N using set size m, one first selects a simple random sample of size N. Each of the N units is measured, and some additional ranking information is also collected. For each of the N measured units, one obtains an additional m−1 independent units to create a set of size m. One then ranks the m units from smallest to largest, noting the rank of the measured unit. The full data set then consists of the N measured values and their associated ranks. As in unbalanced RSS, the number of units with each rank may vary from one rank to another. However, while the number $n_i$ of measured values with rank i is fixed in advance in RSS, it is random in JPS. In fact, the vector $(n_1, \ldots, n_m)$ has a multinomial distribution with mass parameter N and probability vector (1/m,…,1/m). JPS tends to be somewhat less efficient than balanced RSS, but it offers increased flexibility. One advantage is that since JPS is based on a simple random sample, researchers retain the option of using SRS-based methods. In addition, JPS allows rankers to declare ties (MacEachern et al., 2004). The standard JPS nonparametric mean estimator is $\bar{Y}_{\mathrm{JPS}}$, the average of $\bar{Y}_i$ over all ranks i such that $n_i > 0$.
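The JPS procedure and its standard estimator can be sketched along the same lines. The code below is a hedged illustration, not the author's implementation. It assumes a hypothetical population of (x, y) pairs with distinct covariate values, so that no ties arise, and a population large enough that sets drawn separately are approximately independent. The names `jps_sample` and `jps_mean` are invented for this example.

```python
import random


def jps_sample(population, m, N, rng):
    """Collect a JPS sample of size N with set size m, ranking by covariate.

    population: list of (x, y) pairs with distinct x values (assumption).
    Returns a list of (rank, y) pairs for the N measured units.
    """
    data = []
    for _ in range(N):
        # The first unit drawn is the measured one; the other m - 1
        # units are used only for ranking.
        units = rng.sample(population, m)
        measured = units[0]
        rank = 1 + sum(1 for u in units[1:] if u[0] < measured[0])
        data.append((rank, measured[1]))
    return data


def jps_mean(data):
    """Standard JPS estimator: the average of rank means over non-empty ranks."""
    by_rank = {}
    for rank, y in data:
        by_rank.setdefault(rank, []).append(y)
    rank_means = [sum(ys) / len(ys) for ys in by_rank.values()]
    return sum(rank_means) / len(rank_means)


if __name__ == "__main__":
    rng = random.Random(7)
    pop = []
    for _ in range(5000):
        x = rng.gauss(0.0, 1.0)
        pop.append((x, x + rng.gauss(0.0, 0.5)))
    data = jps_sample(pop, m=3, N=12, rng=rng)
    print(jps_mean(data))
```

Unlike the RSS sketch, the rank counts here are not fixed in advance, so some ranks may be empty. This is why the estimator averages only over the ranks that actually occur.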

Both RSS and JPS were developed with subjective, judgment-based rankings in mind, but the idea of ranking using a covariate has received a lot of attention. For example, Chen et al. (2006) discussed how RSS with covariate-based rankings can be used in estimating a population proportion, and Wang et al. (2006) discussed how one can implement JPS using more than one covariate. We provide evidence here that when rankings are done using a covariate, the estimators $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$ no longer make efficient use of the available information. Others such as Ridout and Cobby (1987) and Wang et al. (2008) have noted that when covariates are used in the ranking process, incorporating the covariates into the estimation process should lead to improved estimators. We expand on this observation in two ways. We first show that when covariates are used to do the rankings in JPS and unbalanced RSS, estimators like $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$ that do not use the covariate information are actually inadmissible under squared error loss. We then propose some alternate nonparametric estimators that do incorporate the covariate information, and we show that these estimators tend to be more efficient than $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$.

In Section 2, we show that when rankings are done using a covariate, $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$ are inadmissible under squared error loss. The argument that we use also shows that several other estimators in the literature are inadmissible when the rankings are done using a covariate. Kvam and Samaniego (1993) showed that in balanced RSS, $\bar{Y}_{\mathrm{RSS}}$ is inadmissible when sampling from specific parametric families of distributions. Our results here are less general in that they apply only when the ranking is done using a covariate, but more general in that they require neither parametric assumptions nor perfect rankings. In Section 3, we show that when rankings are done using a covariate, nonparametric regression techniques yield estimators that tend to be significantly more efficient than $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$. We give our conclusions in Section 4.

Section snippets

Inadmissibility results

Suppose that a sample is obtained using either JPS or unbalanced RSS, with the rankings done using a covariate. It then follows that for the N units that are measured, both the variable of interest Y and the ranking variable X are known, while for the N(m−1) units used only for ranking, X is known, but Y is unknown. Using this data, we obtain better estimators through conditioning. Specifically, we condition on (i) the set $S_1$ of units used for ranking and (ii) the set $S_2 \subset S_1$ of units chosen for

Alternate mean estimators

In this section, we describe some nonparametric mean estimators that tend to improve on $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$ when rankings are done using a covariate. The basic strategy used in creating these estimators is to ignore the set structure and the in-set rankings, but to condition on the full set of Nm units used in ranking. It may seem that we are losing information by ignoring the in-set rankings, but what we are actually doing is replacing in-set rankings with the more informative ranking of each unit
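The snippet above is truncated, so the author's exact estimators are not shown here. As a rough illustration of the general idea only, the sketch below fits a Nadaraya–Watson kernel regression of Y on X using the measured units and then averages the fitted values over the covariate values of all Nm units used in ranking. The Gaussian kernel, the bandwidth `h`, the function names, and the data layout are all assumptions made for this example and should not be read as the estimators proposed in the paper.

```python
import math


def nw_predict(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel.

    xs, ys: covariate and response values of the measured units.
    h: bandwidth (a hypothetical tuning parameter chosen by the user).
    If x0 is extremely far from every xs value relative to h, the
    weights can underflow to zero and the division will fail.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


def regression_mean(measured, all_x, h):
    """Average the fitted regression function over all Nm covariate values.

    measured: list of (x, y) pairs for the measured units.
    all_x: covariate values for every unit used in ranking,
           including the measured ones.
    """
    xs = [x for x, _ in measured]
    ys = [y for _, y in measured]
    fitted = [nw_predict(x0, xs, ys, h) for x0 in all_x]
    return sum(fitted) / len(fitted)


if __name__ == "__main__":
    measured = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    all_x = [0.0, 0.4, 0.9, 1.0, 1.3, 1.8, 2.0, 2.2, 0.1]
    print(regression_mean(measured, all_x, h=0.5))
```

The design choice illustrated here is that the in-set ranks never enter the calculation: only the covariate values themselves are used, both for fitting and for averaging.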

Conclusions

We have shown that when rankings are done using a covariate, the standard nonparametric mean estimators in JPS and unbalanced RSS are inadmissible under squared error loss. The new estimators $\tilde{Y}_{\mathrm{JPS}}$ and $\tilde{Y}_{\mathrm{RSS}}$ are guaranteed to improve upon $\bar{Y}_{\mathrm{JPS}}$ and $\bar{Y}_{\mathrm{RSS}}$, but even greater gains in efficiency are possible if one is willing to risk a small loss in efficiency should X and Y turn out to be only loosely related. In particular, estimators based on nonparametric regression techniques seem to offer

Acknowledgment

The author thanks the referees for helpful comments that have led to an improved paper.

References (19)

  • Z. Chen, Ranked-set sampling with regression-type estimators, Journal of Statistical Planning and Inference (2001)
  • P.H. Kvam et al., On the inadmissibility of empirical averages as estimators in ranked set sampling, Journal of Statistical Planning and Inference (1993)
  • M. Ayer et al., An empirical distribution function for sampling with incomplete information, Annals of Mathematical Statistics (1955)
  • G. Casella et al., Statistical Inference (2002)
  • H. Chen et al., Unbalanced ranked set sampling for estimating a population proportion, Biometrics (2006)
  • Z. Chen, Adaptive ranked-set sampling with multiple concomitant variables: an effective way to observational economy, Bernoulli (2002)
  • T.R. Dell et al., Ranked set sampling theory with order statistics background, Biometrics (1972)
  • L.P. Devroye, The uniform convergence of the Nadaraya–Watson regression function estimate, Canadian Journal of Statistics (1978)
  • K.F. Lam et al., Kernel method for the estimation of the distribution function and the mean with auxiliary information in ranked set sampling, Environmetrics (2002)
There are more references available in the full text version of this article.
