Confidence intervals for quantiles in finite populations with randomized nomination sampling

https://doi.org/10.1016/j.csda.2013.11.020

Abstract

Given a finite population consisting of $N$ elements, it is desired to obtain confidence intervals for the $(t/N)$th quantile $x_{(t)}$ of the population based on the randomized nomination sampling (RNS) design. Three without-replacement sampling protocols are described, and procedures for constructing nonparametric confidence intervals for population quantiles are developed. Formulas for computing coverage probabilities for these confidence intervals are presented. Simulation studies are conducted and the performance of the RNS based confidence intervals is compared with those based on the simple random sample without replacement design.

Introduction

Suppose we have a finite population of $N$ elements, labeled $U=\{1,\ldots,N\}$, consisting of values $x_1,\ldots,x_N$ taken by the study variable $x$. Throughout, we will assume that $x_1,\ldots,x_N$ are unique, that $x_{(i)}$ denotes the $i$th ordered $x$ value in the population, and that $x_{r:k}$ refers to the $r$th order statistic in a simple random sample of size $k$ from $U$. Let $\{K_i : i\in\{1,\ldots,m\}\}$ be a sequence of independent random variables taking values in $\{1,\ldots,M\}$, $1\le M\le N$, with probabilities $D_K=\{(p_1,p_2,\ldots,p_M): 0\le p_i\le 1,\ \sum_{i=1}^{M}p_i=1\}$. Given $K_i=k_i$, assume that $\mathbf{x}_i=(x_{i1},\ldots,x_{ik_i})$, $i=1,\ldots,m$, represent a simple random without replacement subsample of size $k_i$ taken from the underlying population. Define the map $\Psi_i:\mathbb{R}^{K_i}\to\mathbb{R}$. If the map $\Psi_i$ gives particular elements in $\mathbf{x}_i$, then we call $\Psi_i$ a “nomination map” and the resulting sample is called a “nomination sample”, while the whole process is referred to as “nomination sampling”. The “nomination” process is usually accomplished through visual inspection or auxiliary information. Some well-known examples of nomination sampling are given below:

  • (1) The choice of $\Psi_i(\mathbf{x}_i)=\min_{1\le j\le K_i}x_{ij}=x_{1:K_i}$ nominates the minimum from each subsample and results in a minima nomination sample of size $m$ as $(x_{1:K_1},\ldots,x_{1:K_m})$. Minima nomination sampling was introduced by Wells and Tiwari (1990) and has been used in estimating distribution functions.

  • (2) The choice of $\Psi_i(\mathbf{x}_i)=\max_{1\le j\le K_i}x_{ij}=x_{K_i:K_i}$ nominates the maximum from each subsample and results in a maxima nomination sample of size $m$ as $(x_{K_1:K_1},\ldots,x_{K_m:K_m})$. Maxima nomination sampling was first introduced by Willemain (1980) in estimating an infinite population median. This sampling design has been the topic of many research articles, e.g. in estimating distribution functions (Boyles and Samaniego, 1986, Tiwari, 1988, Kvam and Samaniego, 1993a), in quantile estimation (Tiwari and Wells, 1989), and recently in acceptance sampling and attribute control charts (Jafari Jozani and Mirkamali, 2010, Jafari Jozani and Mirkamali, 2011).

  • (3) Let $\{W_i : i\in\{1,\ldots,m\}\}$ be a sequence of independent Bernoulli random variables with success probability $\zeta\in[0,1]$ (independent of the $K_i$). The choice of $\Psi_i(\mathbf{x}_i)=W_i\,x_{K_i:K_i}+(1-W_i)\,x_{1:K_i}$ nominates the maximum (with probability $\zeta$) or the minimum (with probability $1-\zeta$) from each subsample. This results in a randomized (minima–maxima) nomination sample of size $m$ as $(Y_1,\ldots,Y_m)$, where $Y_i\overset{d}{=}W_i\,x_{K_i:K_i}+(1-W_i)\,x_{1:K_i}$. This sampling design has recently been introduced by Jafari Jozani and Johnson (2012) for estimating the mean value of the characteristic of interest in finite populations.
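The randomized nomination map in example (3) can be illustrated with a minimal Python sketch of a single nomination cycle. It assumes perfect ranking (the extreme unit is identified directly from the x values, whereas in practice ranking is done by inspection or auxiliary information), and the function names `draw_set_size` and `nominate_one` are hypothetical, introduced only for illustration.

```python
import random

def draw_set_size(probs, rng):
    """Draw K from {1, ..., M} with P(K = k) = probs[k - 1]."""
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]

def nominate_one(population, probs, zeta, rng):
    """One nomination cycle; returns (K_i, W_i, Y_i).

    Assumption: perfect ranking, so the minimum/maximum of the
    subsample is found from the x values themselves.
    """
    k = draw_set_size(probs, rng)
    subsample = rng.sample(population, k)   # SRS without replacement of size k
    w = 1 if rng.random() < zeta else 0     # W_i ~ Bernoulli(zeta)
    y = max(subsample) if w == 1 else min(subsample)
    return k, w, y

rng = random.Random(2013)
print(nominate_one(list(range(1, 21)), [0.2, 0.3, 0.5], 0.5, rng))
```

Setting `zeta=1.0` reproduces maxima nomination as in (2), and `zeta=0.0` reproduces minima nomination as in (1).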

In this paper, we study the problem of constructing confidence intervals for population quantiles under different randomized nomination sampling (RNS) designs. In recent years, many researchers have considered similar problems for finite and infinite populations under different rank-based sampling designs. For example, under the ranked set sampling (RSS) design, Ozturk and Deshpande (2006) proposed RSS based distribution free confidence intervals for quantiles of infinite populations, and they showed that their RSS based intervals tend to be shorter than their counterparts based on simple random sampling (SRS). Later, Deshpande et al. (2006) developed nonparametric RSS based confidence intervals for quantiles of finite populations. For recent developments in this direction see Frey (2007a), Ozturk (2012) and the references therein.

In the finite population setting, the construction of a randomized nomination sample can be done in different ways. It is usual to assume that subsamples are drawn without replacement from the underlying population. However, different replacement policies for the measured and ranked units in a subsample, prior to the selection of the units in the next subsample, result in different RNS designs. Following Deshpande et al. (2006), in the Level 0 design, subsamples are drawn without replacement, but all units in the subsample including the measured unit, are replaced back into the population prior to selection of the next subsample. In the Level 1 RNS design, all units in the subsample except the unit selected for full measurement, are replaced into the population. If none of the units from the subsamples are replaced into the population before drawing the next subsample, then we call this the Level 2 design. Jafari Jozani and Johnson (2012) developed recursive algorithms to obtain the first and second order inclusion probabilities for population units under the Level 0 and Level 1 RNS sampling designs.
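The difference between the three replacement policies can be sketched in Python as follows. This is an illustrative sketch only, assuming unique population values and perfect ranking; the function name `draw_rns` and its parameters are hypothetical, and the sketch does not guard against the pool becoming smaller than a drawn set size under Levels 1 and 2.

```python
import random

def draw_rns(population, m, probs, zeta, level, seed=None):
    """Draw a randomized nomination sample of size m (illustrative sketch).

    level 0: every unit of the subsample is returned to the population.
    level 1: only the nominated (measured) unit is withheld.
    level 2: the whole subsample is withheld.
    Assumes unique values, perfect ranking, and a pool that stays at
    least as large as every drawn set size.
    """
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    rng = random.Random(seed)
    pool = list(population)
    sample = []
    for _ in range(m):
        k = rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]
        subsample = rng.sample(pool, k)          # without replacement within a cycle
        take_max = rng.random() < zeta           # W_i ~ Bernoulli(zeta)
        y = max(subsample) if take_max else min(subsample)
        sample.append(y)
        if level == 1:
            pool.remove(y)                       # measured unit is not replaced
        elif level == 2:
            drawn = set(subsample)
            pool = [u for u in pool if u not in drawn]  # no unit is replaced
    return sample

print(draw_rns(range(1, 31), m=5, probs=[0.2, 0.4, 0.4], zeta=0.5, level=1, seed=7))
```

Under Level 0 the same unit may be measured more than once, while under Levels 1 and 2 the measured units are necessarily distinct.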

While the RNS design does not preclude the use of fixed subsample sizes (by taking $P(K_i=k)=1$ for some fixed $k$), allowing for random subsample sizes provides additional flexibility in the design. In many practical situations, subsamples may not have a predetermined fixed size. For example, see Gemayel et al. (2010) for a discussion of random set sizes in the ranked set sampling setting and Boyles and Samaniego (1986) for a discussion of random subsample sizes in maxima nomination sampling. Another advantage in allowing random subsample sizes is that, when $p_1>0$, we have, on average, $mp_1$ observations which comprise a simple random subsample. Indeed, on average, RNS samples will contain $m\zeta p_k$ maximums from subsamples of size $k$ and $m(1-\zeta)p_k$ minimums from subsamples of size $k$ for $k=1,\ldots,M$. Thus, in addition to the simple random sample portion of the RNS sample, we also have a collection of extremal order statistics from various set sizes, which can contain much more information about the population than SRS observations. In particular, when $p_1>0$ is moderately large, as proposed in Nourmohammadi et al. (submitted), after observing the RNS sample we can bootstrap its SRS portion to estimate the ranking error probabilities in an imperfect RNS design. One may also want to choose the number of maximums (and so the minimums) in advance, instead of getting involved in a randomized process. This can be accomplished following a conditioning argument on the $W_i$'s (see Section 6). Despite the complexity of making inference based on conditioning on $W_i=w_i$ after randomization, the conditioning argument may lead to better results. However, the proportion of required maximums in this setting would be another concern requiring attention. This concern can be answered using the results we obtain in the randomized setting.
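The expected composition of an RNS sample described above is simple arithmetic, and a short Python sketch may make it concrete. The function name `expected_composition` is hypothetical, and the sketch reports the $k=1$ portion separately because a subsample of size one has the same minimum and maximum (it is the simple random sample portion).

```python
def expected_composition(m, probs, zeta):
    """Expected make-up of an RNS sample of size m (illustrative sketch).

    Returns (srs, maxima, minima), where srs = m * p_1 and, for k >= 2,
    maxima[k] = m * zeta * p_k and minima[k] = m * (1 - zeta) * p_k.
    """
    srs = m * probs[0]
    maxima = {k: m * zeta * p for k, p in enumerate(probs, start=1) if k >= 2}
    minima = {k: m * (1 - zeta) * p for k, p in enumerate(probs, start=1) if k >= 2}
    return srs, maxima, minima

srs, maxima, minima = expected_composition(m=20, probs=[0.5, 0.25, 0.25], zeta=0.5)
print(srs, maxima, minima)
```

The three parts always sum to $m$, since the $p_k$ sum to one.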

The outline of this paper is as follows. In Section 2, we discuss the three different ways of constructing an RNS design in finite populations using the replacement policies Level 0, Level 1 and Level 2. Section 3 deals with the construction of confidence intervals for population quantiles $x_{(t)}$ under the Level 0 RNS design. Several interesting theoretical results are presented in this section. Also, we provide a guideline for choosing the design parameter $\zeta$ in the Level 0 RNS design to obtain more efficient confidence intervals for specific population quantiles compared with their SRS counterparts. In Sections 4 and 5, we develop recursive algorithms that can be used to obtain the confidence coefficient associated with Level 1 and Level 2 RNS confidence intervals, respectively. In Section 6, numerical studies are conducted to evaluate the performance of the RNS based symmetric and equal-tail confidence intervals compared with their counterparts based on the SRS design. Section 6 also contains a discussion on the effect of the fixed set size and conditional results given $W_i=w_i$, $i=1,\ldots,m$, on the length of the constructed symmetric and equal-tail confidence intervals. In Section 7, we give some concluding remarks. Finally, some of the proofs are presented in the Appendix.

Section snippets

RNS replacement protocols

In this section, we describe three protocols for drawing randomized nomination samples from the finite population $U$. Following Deshpande et al. (2006) we refer to these protocols as Level 0, Level 1 and Level 2. We assume that ranking of the units in each subsample is done based on an auxiliary variable. To set the notation, suppose we have a finite population of $N$ elements, labeled $U=\{1,\ldots,N\}$, consisting of bivariate pairs $(x_1,z_1),\ldots,(x_N,z_N)$, where $x$ is the study variable and $z$ is an auxiliary

Confidence intervals for the Level 0 RNS design

Suppose $Y_1,Y_2,\ldots,Y_m$ is a sample of size $m$ drawn from the finite population $U$ (consisting of $N$ elements) using the Level 0 RNS design with design parameters $D_K$ and $\zeta$. Let $x_{(1)}<x_{(2)}<\cdots<x_{(N)}$ represent the ordered values of the characteristic of interest for the population elements. Letting $t$ be a fixed integer, $t\in\{1,\ldots,N\}$, it is desired to obtain a confidence interval for $x_{(t)}$, the $(t/N)$th quantile of $U$. Let $Y_{1:m}<Y_{2:m}<\cdots<Y_{m:m}$ represent the ordered observations obtained from the Level 0 RNS design
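As a rough illustration of how a coverage probability for $[Y_{r:m},Y_{s:m}]$ might be computed under Level 0, the following Python sketch works directly from the definitions above rather than from the paper's own formulas, which are not reproduced in this excerpt. It assumes perfect ranking and uses the fact that, because every unit is returned to the population after each cycle, the $Y_i$ are independent and identically distributed; the coverage is then a difference of two binomial tail probabilities. The function names are hypothetical.

```python
from math import comb

def cdf_single(t, N, probs, zeta):
    """P(Y_i <= x_(t)) for one unit nominated from the full population."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        p_max = comb(t, k) / comb(N, k)            # all k units among the t smallest
        p_min = 1.0 - comb(N - t, k) / comb(N, k)  # at least one unit among the t smallest
        total += p * (zeta * p_max + (1.0 - zeta) * p_min)
    return total

def binom_tail(m, q, r):
    """P(Bin(m, q) >= r)."""
    return sum(comb(m, j) * q**j * (1.0 - q)**(m - j) for j in range(r, m + 1))

def coverage_level0(t, N, m, r, s, probs, zeta):
    """P(Y_{r:m} <= x_(t) <= Y_{s:m}) for r < s, assuming i.i.d. Level 0 cycles."""
    upper = binom_tail(m, cdf_single(t, N, probs, zeta), r)      # P(Y_{r:m} <= x_(t))
    lower = binom_tail(m, cdf_single(t - 1, N, probs, zeta), s)  # P(Y_{s:m} <= x_(t-1))
    return upper - lower

print(coverage_level0(t=5, N=10, m=5, r=1, s=5, probs=[1.0], zeta=0.5))
```

With `probs=[1.0]` every subsample has size one, so the value of `zeta` has no effect and the result is the coverage for $m$ independent single draws from the population.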

Confidence interval in Level 1 RNS design

Suppose $Y_1,Y_2,\ldots,Y_m$ is a sample of size $m$ drawn from the finite population $U$ using the Level 1 RNS design with design parameters $D_K$ and $\zeta$. Consider the Level 1 RNS confidence interval $[Y_{r:m},Y_{s:m}]$ for $x_{(t)}$, the $(t/N)$th quantile of the population. To obtain a conservative confidence interval of level $1-\alpha$ we need to find the largest $r$ such that $P(Y_{r:m}\le x_{(t)}\le Y_{s:m})=P(Y_{r:m}\le x_{(t)})-P(Y_{s:m}\le x_{(t-1)})\ge 1-\alpha$. Since the Level 1 RNS design is without replacement and the measured unit obtained from the $i$th cycle of

Confidence interval in Level 2 RNS design

In this section we show how to obtain a confidence interval for the population quantile $x_{(t)}$ under the Level 2 RNS design. Suppose $Y_1,Y_2,\ldots,Y_m$ is a sample of size $m$ drawn from the finite population $U$ using the Level 2 RNS design with design parameters $D_K$ and $\zeta$. Consider the confidence interval $[Y_{r:m},Y_{s:m}]$ for $x_{(t)}$. To obtain a confidence interval of level $1-\alpha$ we need to find $r$ and $s$ such that $P(Y_{r:m}\le x_{(t)}\le Y_{s:m})=P(Y_{r:m}\le x_{(t)})-P(Y_{s:m}\le x_{(t-1)})\ge 1-\alpha$. In this case, since none of the units selected in

Numerical study

In this section, we compare the performance of RNS confidence intervals for population quantiles $x_{(t)}$ with their SRS counterparts. To this end, we consider both symmetric and equal-tail confidence intervals and concentrate on the expected length of these intervals to assess their performance under the proposed sampling design. To highlight the effect of the without-replacement policy, we select small population sizes, and the sampling fraction is also chosen to be moderately large. To take into

Concluding remarks

Three protocols of the RNS design in the finite population setting were described, and some procedures for constructing symmetric and equal-tail confidence intervals for population quantiles were proposed. It was shown that, for all three hypothetical population shapes, when the purpose of the study was constructing a confidence interval for a population quantile $x_{(t)}$ other than the median, there was an optimal choice of $\zeta$ which improved all three levels of RNS over SRS. It was also shown that the design parameters $\zeta=0$

Acknowledgments

The authors gratefully acknowledge the partial support of NSERC Canada. The authors would like to thank an anonymous referee for constructive comments and suggestions which resulted in this improved version.

References (24)

  • J. Frey. Distribution-free statistical intervals via ranked-set sampling. Canad. J. Statist. (2007)

  • J. Frey. A note on a probability involving independent order statistics. J. Stat. Comput. Simul. (2007)