A neutral comparative analysis of additive, multiplicative, and mixed quantitative randomized response models

  • Muhammad Azeem

    Roles Conceptualization, Data curation, Investigation, Methodology, Supervision, Visualization, Writing – original draft

    azeemstats@uom.edu.pk

    Affiliation Department of Statistics, University of Malakand, Lower Dir, KP, Pakistan

  • Sidra Ali

    Roles Data curation, Formal analysis, Methodology, Software, Writing – review & editing

    Affiliation Department of Statistics, University of Malakand, Lower Dir, KP, Pakistan

Abstract

In survey sampling, the randomized response technique is a useful tool for collecting reliable data in many fields, including sociology, education, economics, and psychology. Over the past few decades, researchers have developed many variants of quantitative randomized response models. The existing literature lacks a neutral comparative study of these models to help practitioners choose an appropriate model for a given practical problem. In most existing studies, the authors tend to report only favorable results and omit the cases where their suggested models are inferior to existing ones. This approach often leads to biased comparisons, which may seriously mislead practitioners when choosing a randomized response model for the problem at hand. This paper attempts a neutral comparison of six existing quantitative randomized response models using separate as well as joint measures of respondent privacy and model efficiency. The findings suggest that one model may outperform another in terms of efficiency yet perform worse when other metrics of model quality are taken into account. The study guides practitioners in choosing the right model for a given problem under a particular situation.

1. Introduction

Survey researchers often face refusals and false responses from respondents when collecting data on sensitive variables. Examples of sensitive variables include cheating in examinations, illegal income, expenditure on luxury items, the amount of tax payable, and the number of cigarettes smoked per day. To deal with non-response on questions about sensitive characteristics, Warner [1] suggested a useful procedure, popularly known as the randomized response technique. The original randomized response technique was designed for practical situations where the researcher collects data on binary-type qualitative variables. Warner [2] extended the original qualitative technique to quantitative variables by introducing an additive-type scrambling. Motivated by Warner [2], Eichhorn and Hayre [3] suggested a new variant of the quantitative scrambling techniques that uses a multiplicative-type scrambling variable.

Gupta et al. [4] devised a randomized strategy in which respondents are offered the choice of reporting either the true or a scrambled response. A respondent who opts for the scrambled response uses an additive-type scrambling procedure to report it. Later, a multiplicative version of the Gupta et al. [4] procedure was suggested by Bar-Lev et al. [5]. Gjestvang and Singh [6] introduced an additive-type scrambling procedure for data collection on quantitative sensitive variables. Diana and Perri [7] suggested a quantitative randomized technique that uses both additive and multiplicative-type scrambling noise. Al-Sobhi et al. [8] introduced a new quantitative technique using additive-subtractive scrambling noise. Gupta et al. [9] presented a measure for evaluating randomized response models that quantifies respondents’ privacy and efficiency as a single number. Narjis and Shabbir [10] suggested an efficient variant of the Gjestvang and Singh [6] technique for data collection on quantitative sensitive variables. Khalil et al. [11] analyzed the influence of measurement errors on the mean estimator of the sensitive quantitative variable. Recently, Gupta et al. [12] introduced a new quantitative randomized technique and showed an improvement over the Diana and Perri [7] technique with regard to both the respondents’ privacy and the efficiency of the model. In addition to the above studies, various aspects of randomized response techniques have been analyzed by Yan et al. [13], Kalucha et al. [14], Young et al. [15], Zhang et al. [16], Murtaza et al. [17], Zapata et al. [18], and Saleem and Sanaullah [19].

Chen et al. [20] presented the direct probability integral method for stochastic response and global dynamic analyses of structures with chaotic motion. Torkayesh et al. [21] presented a new method for minimizing air pollutants and enhancing environmental sustainability. Mondal et al. [22] analyzed the robustness of the multilayer perceptron with regard to additive or multiplicative input noise. Silva et al. [23] recently studied a Bayesian analysis of the additive main effects and multiplicative interaction model using three a priori distributions. Akgun et al. [24] studied multi-objective optimization for rating carbon-based additives in phase change materials under different evaluation criteria.

Recently, Singh et al. [25] developed two new quantitative randomized response models, which were shown to be better than the existing models in terms of both efficiency and privacy protection level. In another study, Singh et al. [26] utilized the Poisson distribution to develop a three-stage randomized response model that helps in estimating the mean number of persons having a sensitive attribute. The Singh et al. [26] model improved on the efficiency of the existing models.

The research studies mentioned above have presented many variants of randomized response models. Some of these existing models utilize additive scrambling variables whereas others use multiplicative scrambling or mixed scrambling where both additive and multiplicative variables have been used by researchers. However, to our knowledge, no attempt has been made to conduct a detailed comparative analysis of the different versions of the existing randomized response models. The present study compares six existing randomized response models using three measures of model-quality: (i) model efficiency, (ii) measure of respondent-privacy, and (iii) joint measure of model-efficiency and respondent-privacy.

2. Selected existing models for comparative analysis

In the current study, we have chosen six existing quantitative randomized response models for comparative analysis. Out of the six selected models for comparison, two models are based on scrambled responses with no option for true response. The next two of the selected six models are optional models, including one additive and one multiplicative model. Finally, the last two of the selected six models are mixed models, that is, they use both additive and multiplicative scrambling noise. Before proceeding to comparative analysis, we introduce the notations used for the variables and their parameters, along with some distributional assumptions under which the comparative study is carried out.

Let the population under study contain N units, and let a simple random sample of n units be drawn with replacement. Let the quantitative sensitive variable under study be denoted by Y, and let the additive scrambling variable be denoted by S. We further assume that $E(Y_i) = \mu_Y$, $E(S) = \theta$, $Var(Y_i) = \sigma_Y^2$, and $Var(S) = \sigma_S^2$, where $\sigma_Y^2$ and $\sigma_S^2$ are the variances of the variables Y and S, respectively, for the population data. Further, let $\mu_Y$ and $\theta$ be the population means of the variables Y and S, respectively. Likewise, let T be a multiplicative-type scrambling variable such that $E(T) = 1$ and $Var(T) = \sigma_T^2$. It is also assumed that the three variables are mutually independent. In this section, some existing quantitative scrambling techniques are presented.

2.1 Warner’s [2] model

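The additive model suggested by Warner [2] is as follows:

$$Z = Y + S, \tag{1}$$

where Z is the reported response. An unbiased mean estimator of Y based on Warner’s [2] model is given as:

$$\hat{\mu}_W = \bar{Z} - \theta, \tag{2}$$

where $\bar{Z}$ is the sample mean of the reported responses.

The variance of $\hat{\mu}_W$ is given as:

$$Var(\hat{\mu}_W) = \frac{\sigma_Y^2 + \sigma_S^2}{n}. \tag{3}$$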
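A minimal simulation sketch of the additive model may help make the estimator concrete. It assumes the reported response is Z = Y + S with Y and S independent, that the estimator is the sample mean of Z minus θ, and that its variance is (σ²_Y + σ²_S)/n. The normal distributions, the function names, and all parameter values except μY = 10 and n = 500 are illustrative assumptions, and i.i.d. draws stand in for sampling with replacement.

```python
import random
import statistics

def warner_estimate(true_values, theta, sigma_s, rng):
    """Mean estimate from additively scrambled responses (assumed Z = Y + S)."""
    # Each respondent adds a private draw S; Normal(theta, sigma_s) is an illustrative choice.
    reported = [y + rng.gauss(theta, sigma_s) for y in true_values]
    # Subtracting the known scrambling mean theta removes the bias.
    return statistics.fmean(reported) - theta

def warner_variance(sigma_y2, sigma_s2, n):
    """Closed-form variance of the estimator, assuming Y and S are independent."""
    return (sigma_y2 + sigma_s2) / n

rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu_y, sigma_y, theta, sigma_s, n = 10.0, 2.0, 1.0, 1.5, 500

estimates = []
for _ in range(400):  # repeat the survey to see the sampling distribution
    sample = [rng.gauss(mu_y, sigma_y) for _ in range(n)]
    estimates.append(warner_estimate(sample, theta, sigma_s, rng))

empirical_mean = statistics.fmean(estimates)
empirical_var = statistics.pvariance(estimates)
theoretical_var = warner_variance(sigma_y ** 2, sigma_s ** 2, n)
print(empirical_mean, empirical_var, theoretical_var)
```

Under these assumptions, the average of the repeated estimates should sit close to μY, and their spread should be close to the closed-form variance.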

2.2 Eichhorn and Hayre [3] model

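The responses reported by the respondents under the Eichhorn and Hayre [3] model are as follows:

$$Z = TY. \tag{4}$$

An unbiased estimator of $\mu_Y$ under the Eichhorn and Hayre [3] technique is:

$$\hat{\mu}_{EH} = \bar{Z}. \tag{5}$$

The variance of $\hat{\mu}_{EH}$ is given as:

$$Var(\hat{\mu}_{EH}) = \frac{1}{n}\left[\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)\right]. \tag{6}$$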
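The multiplicative model can be sketched the same way. This example assumes the reported response is Z = TY with E(T) = 1 and T independent of Y, so that the sample mean of Z is itself the estimator, with variance [σ²_Y + σ²_T(σ²_Y + μ²_Y)]/n. The normal distributions, the function names, and the value of σT are illustrative assumptions.

```python
import random
import statistics

def eh_estimate(true_values, sigma_t, rng):
    """Mean estimate from multiplicatively scrambled responses (assumed Z = T * Y)."""
    # T has mean 1, so scrambling leaves the expected response unchanged.
    reported = [y * rng.gauss(1.0, sigma_t) for y in true_values]
    return statistics.fmean(reported)

def eh_variance(mu_y, sigma_y2, sigma_t2, n):
    """Closed-form variance of the estimator, assuming Y and T are independent."""
    return (sigma_y2 + sigma_t2 * (sigma_y2 + mu_y ** 2)) / n

eh_rng = random.Random(1)  # fixed seed so the sketch is repeatable
eh_mu_y, eh_sigma_y, eh_sigma_t, eh_n = 10.0, 2.0, 0.3, 500

eh_estimates = []
for _ in range(400):  # repeat the survey to see the sampling distribution
    sample = [eh_rng.gauss(eh_mu_y, eh_sigma_y) for _ in range(eh_n)]
    eh_estimates.append(eh_estimate(sample, eh_sigma_t, eh_rng))

eh_empirical_mean = statistics.fmean(eh_estimates)
eh_empirical_var = statistics.pvariance(eh_estimates)
eh_theoretical_var = eh_variance(eh_mu_y, eh_sigma_y ** 2, eh_sigma_t ** 2, eh_n)
print(eh_empirical_mean, eh_empirical_var, eh_theoretical_var)
```

Note that the variance here grows with μ²_Y, which is why a multiplicative scheme can be costly in efficiency when the sensitive variable has a large mean.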

2.3 Gupta et al. [4] model

The reported responses under the Gupta et al. [4] model are given as: (7)

The mean estimator under the Gupta et al. [4] technique is given by: (8) where Z is defined in Eq (7). The variance of the mean estimator is as follows: (9)

2.4 Bar-Lev et al. [5] model

The responses reported by the respondents under the Bar-Lev et al. [5] technique are as follows: (10)

The mean estimator under the Bar-Lev et al. [5] technique is given by: (11) where Z is defined in Eq (10). The variance of the mean estimator is as follows: (12)

2.5 Murtaza et al. [17] model

The reported responses under the Murtaza et al. [17] model are given as: (13) where α is a constant. An unbiased mean estimator of the sensitive variable based on the Murtaza et al. [17] model is given as: (14)

The Murtaza et al. [17] model is based on correlated scrambling variables. To make the comparison feasible, the assumption of uncorrelated variables is used, as the other models selected for comparison also use uncorrelated scrambling variables. The variance of the resulting mean estimator is given by: (15)

2.6 Gupta et al. [12] model

Gupta et al. [12] introduced the following optional scrambling model: (16) where A is a constant, 0 < A < 1. An unbiased mean estimator based on the Gupta et al. [12] model is given by: (17)

The sampling variance of this estimator is as follows: (18)

3. Privacy and efficiency metrics

The Yan et al. [13] measure for quantifying the respondent-privacy is as follows:

$$\nabla = E(Z - Y)^2. \tag{19}$$

A higher value of ∇ translates into a better level of respondents’ privacy provided by a given quantitative randomized response model.

The Gupta et al. [9] joint measure of efficiency and privacy-protection is as follows:

$$\delta = \frac{Var(\hat{\mu})}{\nabla}, \tag{20}$$

where $\hat{\mu}$ is the mean estimator under the given model.

From Eq (20), one may clearly observe that lower values of δ are desirable.
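The two metrics can be sketched numerically. The example assumes the privacy measure is the mean squared difference between reported and true responses, ∇ = E(Z − Y)², and that the joint measure is the estimator variance divided by ∇. The function names and the toy data are hypothetical.

```python
from statistics import fmean

def privacy_measure(reported, true_values):
    """Empirical version of the assumed privacy measure, the mean of (Z - Y)^2."""
    return fmean([(z - y) ** 2 for z, y in zip(reported, true_values)])

def joint_measure(estimator_variance, nabla):
    """Assumed joint measure: estimator variance per unit of privacy (lower is better)."""
    return estimator_variance / nabla

# Toy data: three respondents, of whom the second reports the true value.
true_values = [1.0, 2.0, 3.0]
reported = [2.0, 2.0, 5.0]

nabla = privacy_measure(reported, true_values)  # (1 + 0 + 4) / 3
delta = joint_measure(0.5, nabla)               # 0.5 is an arbitrary illustrative variance
print(nabla, delta)
```

A model can lower δ either by reducing the variance of its estimator or by moving reported responses further from the true ones, which is why δ alone does not reveal which of the two drove the result.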

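For Warner’s [2] model, the measure of privacy can be expressed as:

$$\nabla_W = E(Z - Y)^2 = E(S^2) = \sigma_S^2 + \theta^2. \tag{21}$$

The joint measure of privacy and efficiency for Warner’s [2] model is given as:

$$\delta_W = \frac{\sigma_Y^2 + \sigma_S^2}{n\left(\sigma_S^2 + \theta^2\right)}. \tag{22}$$

For the Eichhorn and Hayre [3] quantitative technique, the measure of privacy can be obtained as:

$$\nabla_{EH} = E(TY - Y)^2 = E(T - 1)^2 \, E(Y^2),$$

or,

$$\nabla_{EH} = \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right). \tag{23}$$

The joint measure of model-efficiency and respondent-privacy for the Eichhorn and Hayre [3] quantitative technique is given as:

$$\delta_{EH} = \frac{\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}{n\,\sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)}. \tag{24}$$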

The privacy level offered by the Gupta et al. [4] model is: (25)

The joint measure of model-efficiency and respondent-privacy for the Gupta et al. [4] quantitative technique is given as: (26)

The measure of privacy for the Bar-Lev et al. [5] technique is: (27)

The joint measure of model-efficiency and respondent-privacy for the Bar-Lev et al. [5] quantitative technique is given as: (28)

The measure of privacy for the Murtaza et al. [17] model is given as: (29)

The joint measure of privacy and efficiency for the Murtaza et al. [17] model is given as: (30)

The privacy level offered by the Gupta et al. [12] model is given by: (31)

The joint measure of privacy and efficiency for the Gupta et al. [12] model is given as: (32)

4. Efficiency conditions

In this section, the mathematical conditions for the efficiency are derived.

4.1 Warner’s [2] model vs. Eichhorn and Hayre [3] model

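Warner’s [2] model is more precise than the Eichhorn and Hayre [3] model if

$$Var(\hat{\mu}_W) < Var(\hat{\mu}_{EH}),$$

where $\hat{\mu}_W$ and $\hat{\mu}_{EH}$ denote the mean estimators under the two models, or

$$\frac{\sigma_Y^2 + \sigma_S^2}{n} < \frac{1}{n}\left[\sigma_Y^2 + \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right)\right],$$

or

$$\sigma_S^2 < \sigma_T^2\left(\sigma_Y^2 + \mu_Y^2\right). \tag{33}$$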

Condition (33) may not always be true.
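A short numerical check illustrates why the condition can go either way. The sketch assumes the additive estimator has variance (σ²_Y + σ²_S)/n and the multiplicative estimator has variance [σ²_Y + σ²_T(σ²_Y + μ²_Y)]/n, so the comparison reduces to σ²_S < σ²_T(σ²_Y + μ²_Y). The function names and parameter values are illustrative.

```python
def var_warner(sigma_y2, sigma_s2, n):
    """Assumed variance of the mean estimator under additive scrambling."""
    return (sigma_y2 + sigma_s2) / n

def var_eichhorn_hayre(mu_y, sigma_y2, sigma_t2, n):
    """Assumed variance of the mean estimator under multiplicative scrambling."""
    return (sigma_y2 + sigma_t2 * (sigma_y2 + mu_y ** 2)) / n

def warner_more_precise(mu_y, sigma_y2, sigma_s2, sigma_t2):
    """Reduced form of the comparison; the sample size n cancels out."""
    return sigma_s2 < sigma_t2 * (sigma_y2 + mu_y ** 2)

# Noisy multiplicative scrambling: the additive model wins.
case_a = warner_more_precise(mu_y=10.0, sigma_y2=4.0, sigma_s2=2.25, sigma_t2=0.09)
# Very mild multiplicative scrambling: the additive model loses.
case_b = warner_more_precise(mu_y=10.0, sigma_y2=4.0, sigma_s2=2.25, sigma_t2=0.01)
print(case_a, case_b)
```

The outcome depends on how the scrambling variances are chosen relative to σ²_Y + μ²_Y, so neither model dominates for every parameter setting.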

4.2 Gupta et al. [4] model vs. Bar-Lev et al. [5] model

The Gupta et al. [4] model will be more efficient than the Bar-Lev et al. [5] model, if or or (34)

Condition (34) is the same as condition (33). This is because the Gupta et al. [4] procedure is the optional variant of the Warner’s [2] model; and the Bar-Lev et al. [5] technique is simply the optional variant of the Eichhorn and Hayre [3] model.

4.3 Gupta et al. [12] model vs. Murtaza et al. [17] model

The Gupta et al. [12] model will be more efficient than the Murtaza et al. [17] model, if or or (35)

Condition (35) may not always be true.

5. Comparison of models

Table 1 presents the variances of the mean under various models for different choices of α and A. Tables 2 and 3 show the values of ∇ and δ, respectively, for different models.

Table 1. Variances of the mean under different models for μY = 10, , n = 500.

https://doi.org/10.1371/journal.pone.0284995.t001

Table 2. Values of ∇ under different models for μY = 10, , n = 500.

https://doi.org/10.1371/journal.pone.0284995.t002

Table 3. Values of δ under different models for μY = 10, , n = 500.

https://doi.org/10.1371/journal.pone.0284995.t003

6. Discussion and conclusion

The current study is based on a neutral comparison of six existing quantitative randomized response models: (i) Warner’s [2] model, (ii) the Eichhorn and Hayre [3] model, (iii) the Gupta et al. [4] model, (iv) the Bar-Lev et al. [5] model, (v) the Murtaza et al. [17] model, and (vi) the Gupta et al. [12] model. We presented the comparative analysis in a neutral manner; that is, our analysis does not favor one model over another but evaluates the strengths and weaknesses of the models chosen for comparison.

Table 1 shows that the Gupta et al. [4] model is the most efficient model, whereas the oldest model, Warner’s [2], is the second best in terms of efficiency. Moreover, the recently developed model of Murtaza et al. [17] is less precise than the much older model of Bar-Lev et al. [5]. It is also interesting that the newest of the six selected models, the Gupta et al. [12] model, is less efficient than the oldest, Warner’s [2] model, for the selected parameter values of the scrambling variables. Further, the Murtaza et al. [17] model is also less efficient than the 50-year-old Warner’s [2] model.

Model-efficiency is not the only criterion for assessing the quality of a given quantitative randomized response model. The privacy protection that the model offers respondents is equally important in judging its quality. The respondent-privacy level can be measured by the value of ∇, with a higher value indicating a better level of privacy protection offered by the model. Table 2 displays the values of ∇ for different parameter values of the scrambling variables. Table 2 indicates that the old Eichhorn and Hayre [3] model has the highest values of ∇, indicating the best level of privacy protection. It is also observed from Table 2 that the Gupta et al. [4] optional model has the smallest values of ∇, making it the worst of all six models on this measure.

Finally, as far as the overall quality of the six selected models is concerned, the δ values are displayed in Table 3. One may clearly observe that the Gupta et al. [12] model has the smallest δ values, making it the best among all six models.

We conclude that a model which performs better on one measure of model-quality may perform worse on another. In practice, the researcher may prefer model-efficiency over respondent-privacy, or vice versa, depending on the requirements of the survey. If model-efficiency alone is preferable, the researcher may choose one model over the other. Likewise, if respondent-privacy is preferable over efficiency, then another model may be more appropriate. Because the joint measure, δ, assigns equal weights to efficiency and privacy, it alone may not guide the researcher in choosing a particular randomized response model, as efficiency and privacy may not be equally important in practice. Thus, researchers are advised to keep in mind the particular situation at hand when choosing a randomized response model for data collection on sensitive variables.

7. Future research

This paper compares six existing quantitative randomized response models in a neutral manner. There is also a need to conduct a comparative study on qualitative randomized response models. This will help the researchers choose the appropriate model in situations where the variable of interest is of qualitative nature, such as gender, marital status, socio-economic class, etc.

References

  1. Warner SL. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association 1965; 60(309): 63–69. pmid:12261830
  2. Warner SL. The linear randomized response model. Journal of the American Statistical Association 1971; 66(336): 884–888.
  3. Eichhorn BH, Hayre LS. Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference 1983; 7(4): 307–316.
  4. Gupta S, Gupta B, Singh S. Estimation of sensitivity level of personal interview survey questions. Journal of Statistical Planning and Inference 2002; 100(2): 239–247.
  5. Bar-Lev SK, Bobovitch E, Boukai B. A note on randomized response models for quantitative data. Metrika 2004; 60(3): 255–260.
  6. Gjestvang CR, Singh S. An improved randomized response model: Estimation of mean. Journal of Applied Statistics 2009; 36(12): 1361–1367.
  7. Diana G, Perri PF. A class of estimators of quantitative sensitive data. Statistical Papers 2011; 52(3): 633–650.
  8. Al-Sobhi MM, Hussain Z, Al-Zahrani B, Singh HP, Tarray TA. Improved randomized response approaches for additive scrambling models. Mathematical Population Studies 2016; 23(4): 205–221.
  9. Gupta S, Mehta S, Shabbir J, Khalil S. A unified measure of respondent privacy and model efficiency in quantitative RRT models. Journal of Statistical Theory and Practice 2018; 12(3): 506–511.
  10. Narjis G, Shabbir J. An efficient new scrambled response model for estimating sensitive population mean in successive sampling. Communications in Statistics–Simulation and Computation 2021; 1–18. https://doi.org/10.1080/03610918.2021.1986528
  11. Khalil S, Zhang Q, Gupta S. Mean estimation of sensitive variables under measurement errors using optional RRT models. Communications in Statistics–Simulation and Computation 2021; 50(5): 1417–1426.
  12. Gupta S, Zhang J, Khalil S, Sapra P. Mitigating lack of trust in quantitative randomized response technique models. Communications in Statistics–Simulation and Computation 2022; 1–9. https://doi.org/10.1080/03610918.2022.2082477
  13. Yan Z, Wang J, Lai J. An efficiency and protection degree-based comparison among the quantitative randomized response strategies. Communications in Statistics–Theory and Methods 2008; 38(3): 400–408.
  14. Kalucha G, Gupta S, Shabbir J. A two-step approach to ratio and regression estimation of finite population mean using optional randomized response models. Hacettepe Journal of Mathematics and Statistics 2016; 45: 1819–1830.
  15. Young A, Gupta S, Parks R. A binary unrelated-question RRT model accounting for untruthful responding. Involve, A Journal of Mathematics 2019; 12(7): 1163–1173.
  16. Zhang Q, Khalil S, Gupta S. Mean estimation in the simultaneous presence of measurement errors and non-response using optional RRT models under stratified sampling. Journal of Statistical Computation and Simulation 2021; 91(17): 3492–3504.
  17. Murtaza M, Singh S, Hussain Z. An innovative optimal randomized response model using correlated scrambling variables. Journal of Statistical Computation and Simulation 2020; 1–17. https://doi.org/10.1080/00949655.2020.1791118
  18. Zapata Z, Sedory SA, Singh S. An innovative improvement in Warner’s randomized response device for evasive answer bias. Journal of Statistical Computation and Simulation 2022. https://doi.org/10.1080/00949655.2022.2101649
  19. Saleem I, Sanaullah A. Estimation of mean of a sensitive variable using efficient exponential-type estimators in stratified sampling. Journal of Statistical Computation and Simulation 2022; 92(2): 232–248. https://doi.org/10.1080/00949655.2021.1940182
  20. Chen H, Zhao J, Meng Z, Chen G, Yang D. Stochastic dynamic analysis of nonlinear MDOF systems with chaotic motion under combined additive and multiplicative excitation. Communications in Nonlinear Science and Numerical Simulation 2023; 118. https://doi.org/10.1016/j.cnsns.2022.107034
  21. Torkayesh AE, Alizadeh R, Soltanisehat L, Torkayesh SE, Lund PD. A comparative assessment of air quality across European countries using an integrated decision support model. Socio-Economic Planning Sciences 2022; 81. https://doi.org/10.1016/j.seps.2021.101198
  22. Mondal R, Pal T, Dey P. A Hybrid Regularized Multilayer Perceptron for Input Noise Immunity. IEEE Transactions on Artificial Intelligence 2023; 1, 1–12. https://doi.ieeecomputersociety.org/10.1109/TAI.2022.3225124
  23. Silva CPd, Mendes CTE, Silva AQd, Oliveira LAd, Von Pinho RG, Balestre M. Use of the reversible jump Markov chain Monte Carlo algorithm to select multiplicative terms in the AMMI-Bayesian model. PLoS ONE 2023; 18(1). pmid:36595526
  24. Akgün H, Yapıcı E, Özkan A, Günkaya Z, Banar M. A combined multi-criteria decision-making approach for the selection of carbon-based nanomaterials in phase change materials. Journal of Energy Storage 2023; 60. https://doi.org/10.1016/j.est.2023.106619
  25. Singh C, Kamal M, Singh GN, Kim JM. Study to Alter the Nuisance Effect of Non-Response Using Scrambled Mechanism. Risk Management and Healthcare Policy 2021; 1595–1613. pmid:33889040
  26. Singh C, Singh GN, Kim JM. A randomized response model for sensitive attribute with privacy measure using Poisson distribution. Ain Shams Engineering Journal 2021; 12(4), 4051–4061.