Confidence intervals for quantiles in finite populations with randomized nomination sampling
Introduction
Suppose we have a finite population of elements, labeled , consisting of values taken by the study variable . Throughout, we will assume that the are unique, denotes the th ordered value in the population and refers to the th order statistic in a simple random sample of size from . Let be a sequence of independent random variables taking values in , with probabilities . Given , assume that , represent a simple random subsample of size , drawn without replacement from the underlying population. Define the map . If the map gives particular elements in , then we call a “nomination map”, the resulting sample is called a “nomination sample”, and the whole process is referred to as “nomination sampling”. The “nomination” process is usually accomplished through visual inspection or auxiliary information. Some well-known examples of nomination sampling are given below:
- (1) The choice of nominates the minimum from each subsample and results in a minima nomination sample of size as . Minima nomination sampling was introduced by Wells and Tiwari (1990) and has been used in estimating distribution functions.
- (2) The choice of nominates the maximum from each subsample and results in a maxima nomination sample of size as . Maxima nomination sampling was first introduced by Willemain (1980) in estimating an infinite population median. This sampling design has been the topic of many research articles, e.g. in estimating distribution functions (Boyles and Samaniego, 1986, Tiwari, 1988, Kvam and Samaniego, 1993a), in quantile estimation (Tiwari and Wells, 1989), and recently in acceptance sampling and attribute control charts (Jafari Jozani and Mirkamali, 2010, Jafari Jozani and Mirkamali, 2011).
- (3) Let be a sequence of independent Bernoulli random variables with the success probability (independent of the ). The choice of nominates the maxima (with probability ) or the minima (with probability ) from each subsample. This results in a randomized (minima–maxima) nomination sample of size as , where . This sampling design has recently been introduced by Jafari Jozani and Johnson (2012) for estimating the mean value of the characteristic of interest in finite populations.
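The three examples above can be sketched in a few lines of Python. This is only an illustration, assuming perfect ranking of the study values. The names `nominate`, `rns_sample`, `size_probs` (a mapping from subsample size to its probability) and `zeta` (the probability of nominating the maximum) are hypothetical and do not come from the paper. Every subsample is drawn from the full population, so no replacement policy is modelled here.

```python
import random

def nominate(subsample, zeta, rng):
    """Return the maximum with probability zeta, otherwise the minimum."""
    return max(subsample) if rng.random() < zeta else min(subsample)

def rns_sample(population, n, size_probs, zeta, seed=None):
    """Draw n nominated values from `population` (assumed to hold unique values).

    size_probs: dict mapping subsample size k -> probability (hypothetical name).
    Each subsample is a simple random sample without replacement taken from
    the full population, so the same unit may appear in several subsamples.
    """
    rng = random.Random(seed)
    sizes = list(size_probs)
    weights = [size_probs[k] for k in sizes]
    sample = []
    for _ in range(n):
        k = rng.choices(sizes, weights=weights)[0]   # random subsample size
        subsample = rng.sample(population, k)         # SRS without replacement
        sample.append(nominate(subsample, zeta, rng))
    return sample
```

With `zeta=1.0` the sketch reduces to maxima nomination sampling, with `zeta=0.0` to minima nomination sampling, and any value in between gives the randomized design of example (3).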
In this paper, we study the problem of constructing confidence intervals for population quantiles under different randomized nomination sampling (RNS) designs. In recent years, many researchers have considered similar problems for finite and infinite populations under different rank-based sampling designs. For example, under the ranked set sampling (RSS) design, Ozturk and Deshpande (2006) proposed RSS-based distribution-free confidence intervals for quantiles of infinite populations, and showed that these intervals tend to be shorter than their counterparts based on simple random sampling (SRS). Later, Deshpande et al. (2006) developed nonparametric RSS-based confidence intervals for quantiles of finite populations. For recent developments in this direction see Frey (2007a), Ozturk (2012) and the references therein.
In the finite population setting, the construction of a randomized nomination sample can be done in different ways. It is usual to assume that subsamples are drawn without replacement from the underlying population. However, different replacement policies for the measured and ranked units in a subsample, prior to the selection of the units in the next subsample, result in different RNS designs. Following Deshpande et al. (2006), in the Level 0 design, subsamples are drawn without replacement, but all units in the subsample, including the measured unit, are returned to the population before the next subsample is selected. In the Level 1 RNS design, all units in the subsample except the unit selected for full measurement are returned to the population. If none of the units from the subsamples are returned to the population before the next subsample is drawn, we call this the Level 2 design. Jafari Jozani and Johnson (2012) developed recursive algorithms to obtain the first and second order inclusion probabilities for population units under the Level 0 and Level 1 RNS designs.
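The difference between the three replacement policies is easiest to see in what happens to the pool of available units after each cycle. The following Python sketch illustrates this under stated assumptions: ranking is perfect, population values are unique (as assumed throughout), and the names `rns_finite`, `size_probs`, `zeta` and `level` are hypothetical.

```python
import random

def rns_finite(population, n, size_probs, zeta, level, seed=None):
    """Draw n measured units under the Level 0, 1 or 2 replacement policy."""
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    rng = random.Random(seed)
    pool = list(population)                 # units still available
    sizes = list(size_probs)
    weights = [size_probs[k] for k in sizes]
    measured = []
    for _ in range(n):
        k = rng.choices(sizes, weights=weights)[0]
        if k > len(pool):
            raise ValueError("not enough units left for the next subsample")
        subsample = rng.sample(pool, k)     # SRS without replacement
        chosen = max(subsample) if rng.random() < zeta else min(subsample)
        measured.append(chosen)
        if level == 1:
            pool.remove(chosen)             # only the measured unit leaves
        elif level == 2:
            for unit in subsample:          # the whole subsample leaves
                pool.remove(unit)
        # level 0: every unit is returned, so the pool is unchanged
    return measured
```

Under Level 1 and Level 2 the measured values are necessarily distinct, whereas under Level 0 the same unit can be measured more than once.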
While the RNS design does not preclude the use of fixed subsample sizes (by taking for some fixed ), allowing for random subsample sizes provides additional flexibility in the design. In many practical situations, subsamples may not have a predetermined fixed size. For example, see Gemayel et al. (2010) for a discussion of random set sizes in the ranked set sampling setting and Boyles and Samaniego (1986) for a discussion of random subsample sizes in maxima nomination sampling. Another advantage in allowing random subsample sizes is that, when , we have, on average, observations which comprise a simple random subsample. Indeed, on average, RNS samples will contain maximums from subsamples of size and minimums from subsamples of size for . Thus, in addition to the simple random sample portion of the RNS sample, we also have a collection of extremal order statistics from various set sizes, which can contain much more information about the population than SRS observations. In particular, when is moderately large, as proposed in Nourmohammadi et al. (submitted), after observing the RNS sample we can bootstrap its SRS portion to estimate the ranking error probabilities in an imperfect RNS design. One may also want to choose the number of maximums (and so the minimums) in advance, instead of getting involved in a randomized process. This can be accomplished following a conditioning argument on ’s (see Section 6). Despite the complexity of making inference based on conditioning on after randomization, the conditioning argument may lead to better results. However, the proportion of required maximums in this setting would be another concern requiring attention. This concern can be answered using the results we obtain in the randomized setting.
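The expected composition described in this paragraph can be tabulated directly. The sketch below assumes that the subsample size equals k with probability `size_probs[k]`, independently of the maximum/minimum choice, which has probability `zeta` of selecting the maximum. These names, and the function `expected_composition`, are illustrative assumptions rather than the paper's notation.

```python
def expected_composition(n, size_probs, zeta):
    """Expected counts of SRS observations, maxima and minima in an RNS sample.

    Subsamples of size 1 contribute plain simple random sample observations;
    larger subsamples contribute maxima (prob. zeta) or minima (prob. 1 - zeta).
    """
    srs = n * size_probs.get(1, 0.0)
    maxima = {k: n * zeta * p for k, p in size_probs.items() if k >= 2}
    minima = {k: n * (1 - zeta) * p for k, p in size_probs.items() if k >= 2}
    return srs, maxima, minima
```

For instance, with `n=20`, `size_probs={1: 0.25, 2: 0.5, 3: 0.25}` and `zeta=0.5`, a quarter of the sample (five observations on average) behaves like a simple random sample, and the remainder is split evenly between maxima and minima.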
The outline of this paper is as follows. In Section 2, we discuss the three different ways of constructing an RNS design in finite populations using the replacement policies Level 0, Level 1 and Level 2. Section 3 deals with the construction of confidence intervals for population quantiles under Level 0 RNS design. Several interesting theoretical results are presented in this section. Also, we provide a guideline for choosing the design parameter in Level 0 RNS design to obtain more efficient confidence intervals for specific population quantiles compared with its SRS counterpart. In Sections 4 and 5, we develop recursive algorithms that can be used to obtain the confidence coefficient associated with Level 1 and Level 2 RNS confidence intervals, respectively. In Section 6, numerical studies are conducted to evaluate the performance of the RNS based symmetric and equal-tail confidence intervals compared with their counterparts based on SRS design. Section 6 also contains a discussion on the effect of the fixed set size and conditional results given , on the length of the constructed symmetric and equal-tail confidence intervals. In Section 7, we give some concluding remarks. Finally, some of the proofs are presented in the Appendix.
RNS replacement protocols
In this section, we describe three protocols for drawing randomized nomination samples from the finite population . Following Deshpande et al. (2006) we refer to these protocols as Level 0, Level 1 and Level 2. We assume that ranking of the units in each subsample is done based on an auxiliary variable. To set the notation, suppose we have a finite population of elements, labeled , consisting of bivariate pairs , where is the study variable and is an auxiliary
Confidence intervals for the Level 0 RNS design
Suppose is a sample of size drawn from the finite population (consisting of elements) using the Level 0 RNS design with design parameters and . Let represent the ordered values of the characteristic of interest for the population elements. Letting to be a fixed integer, , it is desired to obtain a confidence interval for , the th quantile of . Let represent the ordered observations obtained from the Level 0 RNS design
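One way to see how a confidence coefficient can be computed under Level 0 is the following sketch. It is not taken from the paper. It assumes perfect ranking and identically distributed subsample sizes, so that the n observations are independent and identically distributed. Under these assumptions the number of observations at or below the t-th ordered population value is binomial, and the coverage of the interval between the r-th and s-th ordered observations is one minus the two (disjoint) miss probabilities. All function and parameter names are hypothetical.

```python
from math import comb

def level0_cdf(j, N, size_probs, zeta):
    """P(one nominated value <= j-th smallest population value)."""
    total = 0.0
    for k, p in size_probs.items():
        c = comb(N, k)
        p_max = comb(j, k) / c            # all k units among the j smallest
        p_min = 1 - comb(N - j, k) / c    # not all k units among the N-j largest
        total += p * (zeta * p_max + (1 - zeta) * p_min)
    return total

def binom_cdf(x, n, p):
    """P(Binomial(n, p) <= x)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

def level0_coverage(r, s, t, n, N, size_probs, zeta):
    """P(X_(r) <= x_(t) <= X_(s)) for 1 <= r <= s <= n under Level 0."""
    f_t = level0_cdf(t, N, size_probs, zeta)
    f_below = level0_cdf(t - 1, N, size_probs, zeta)
    interval_below = 1 - binom_cdf(s - 1, n, f_below)  # X_(s) < x_(t)
    interval_above = binom_cdf(r - 1, n, f_t)          # X_(r) > x_(t)
    return 1 - interval_below - interval_above
```

With all subsample sizes equal to 1 the design reduces to simple random sampling with replacement, which gives a convenient sanity check for the function.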
Confidence interval in Level 1 RNS design
Suppose is a sample of size drawn from the finite population using the Level 1 RNS design with design parameters and . Consider the Level 1 RNS confidence interval for , the th quantile of the population. To obtain a conservative confidence interval of level we need to find the largest such that Since the Level 1 RNS design is without replacement and the measured unit obtained from the th cycle of
Confidence interval in Level 2 RNS design
In this section we show how to obtain a confidence interval for the population quantile under the Level 2 RNS design. Suppose is a sample of size drawn from the finite population using the Level 2 RNS design with design parameters and . Consider the confidence interval for . To obtain a confidence interval of level we need to find and such that In this case, since none of the units selected in
Numerical study
In this section, we compare the performance of RNS confidence intervals for population quantiles with their SRS counterparts. To this end, we consider both symmetric and equal-tail confidence intervals and concentrate on the expected length of these intervals to assess their performance under the proposed sampling design. To highlight the effect of the without replacement policy we select small population sizes and the sampling fraction is also chosen to be moderately large. To take into
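A comparison of this kind can also be approximated by simulation. The sketch below is a Monte Carlo illustration only, not the procedure used in the paper. It assumes perfect ranking, a fixed subsample size `k`, and a Level 1 policy. The names `draw_level1` and `interval_summary` are hypothetical.

```python
import random

def draw_level1(population, n, k, zeta, rng):
    """One Level 1 RNS sample with fixed subsample size k (perfect ranking)."""
    pool = list(population)
    out = []
    for _ in range(n):
        sub = rng.sample(pool, k)
        pick = max(sub) if rng.random() < zeta else min(sub)
        out.append(pick)
        pool.remove(pick)          # only the measured unit is withheld
    return out

def interval_summary(draw, population, t, r, s, reps, seed=0):
    """Estimate coverage and mean length of [X_(r), X_(s)] for x_(t).

    draw: function taking a random.Random and returning one sample.
    """
    rng = random.Random(seed)
    target = sorted(population)[t - 1]      # t-th ordered population value
    hits, total_len = 0, 0.0
    for _ in range(reps):
        obs = sorted(draw(rng))
        lo, hi = obs[r - 1], obs[s - 1]
        hits += lo <= target <= hi
        total_len += hi - lo
    return hits / reps, total_len / reps
```

A usage sketch: with `population = list(range(1, 51))`, calling `interval_summary(lambda g: draw_level1(population, 8, 3, 0.5, g), population, 25, 2, 7, 2000)` and `interval_summary(lambda g: g.sample(population, 8), population, 25, 2, 7, 2000)` gives estimated coverage and expected length for an RNS interval and its SRS counterpart, which can then be compared for different choices of `zeta`.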
Concluding remarks
Three protocols of the RNS design in the finite population setting were described, and procedures for constructing symmetric and equal-tail confidence intervals for population quantiles were proposed. It was shown that, for all three hypothetical population shapes, when the purpose of the study was to construct a confidence interval for a population quantile , except the median, there was an optimal choice of that improved all three levels of RNS over SRS. It was also shown that the design parameters
Acknowledgments
The authors gratefully acknowledge the partial support of NSERC Canada. The authors would like to thank an anonymous referee for constructive comments and suggestions, which resulted in this improved version.
References (24)
- et al. Randomized nomination sampling for finite populations. J. Statist. Plann. Inference (2012)
- et al. Improved attribute acceptance sampling plans based on maxima nomination sampling. J. Statist. Plann. Inference (2010)
- et al. Control charts for attributes with maxima nominated samples. J. Statist. Plann. Inference (2011)
- et al. On the inadmissibility of empirical averages as estimators in ranked set sampling. J. Statist. Plann. Inference (1993)
- Note on interpolated order statistics. Statist. Probab. Lett. (1992)
- Quantile inference based on partially rank-ordered set samples. J. Statist. Plann. Inference (2012)
- et al. Ranked-set sample nonparametric quantile confidence intervals. J. Statist. Plann. Inference (2006)
- et al. Estimating a distribution function based on nomination sampling. J. Amer. Statist. Assoc. (1986)
- et al. Order Statistics (2003)
- et al. Nonparametric ranked-set sampling confidence intervals for quantiles of a finite population. Environ. Ecol. Stat. (2006)
- Distribution-free statistical intervals via ranked-set sampling. Canad. J. Statist.
- A note on a probability involving independent order statistics. J. Stat. Comput. Simul.