Abstract
This paper presents a newly available technique to adjust for bias in non-probabilistically selected samples. To date, applications of this innovative technique, termed entropy balancing, have been restricted to evaluation settings, where the goal is to reduce model dependence prior to the estimation of treatment effects. In a novel application, we demonstrate the technique’s utility in cases where the goal is to correct for sample bias originating in coverage error. The appeal of entropy balancing in this latter setting lies in its capacity to optimise the twin goals of improved balance in covariate distribution and maximum retention of information. Entropy balancing combines the ability to incorporate a large set of moment conditions into the calculation of weights with the ability to impose exact balance directly. The technique thus builds upon the theoretical appeal of the more widely known and applied propensity score adjustment method, while addressing that method’s practical limitations. We demonstrate the utility of the entropy balancing technique empirically, through an example using the Young Lives Project survey data for rural Andhra Pradesh, South India. We conclude by summarising the potential of this procedure to contribute to robust survey-based research more widely.


Notes
See Hainmueller (2012) for a comprehensive presentation of the theoretical framework.
In cases where only marginal population probabilities are available (from summarised census data, for example), the ebalance procedure allows target values to be specified manually, so that the non-probability sample covariates are reweighted in line with the known population targets.
All analysis is conducted in STATA 13; Hainmueller’s “ebalance” suite of commands, which performs the entropy balancing procedure, can be installed in STATA in the usual manner, i.e. “ssc install ebalance, all replace”.
The survey was sponsored by the UK Department for International Development (DFID), and is led by the Oxford Department of International Development at the University of Oxford, in collaboration with academic institutions in each of the four project countries.
In the second round of data collection all individuals resident in a selected household were included in the survey.
Andersson (1996) discusses the general method of sentinel site sampling in some detail.
At the all-India level, a total of 124,680 households and 602,833 individuals took part in the survey for schedule 10 of the 61st round of the NSS.
Household class is calculated on the basis of household landholding and dominant labour relations.
The default maximum number of iterations is 20 and the default tolerance level is 0.015; both can be increased if convergence fails.
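The note on marginal population probabilities can be illustrated in its simplest case. With uniform base weights, a single binary covariate and a manually specified population share (for example a census margin), the entropy balancing solution has a closed form: every unit in the same category receives the same weight, so the result coincides with simple cell (post-stratification) weighting. The Python sketch below, rather than the STATA used in the paper, assumes a 0/1-coded covariate; the function name and the data are hypothetical and it is not part of the ebalance package.

```python
# Hypothetical helper: closed-form entropy balancing weights for ONE binary
# covariate reweighted to a manually specified population share.
def binary_margin_weights(x, target_share):
    """Return weights w_i (summing to 1) with sum_i w_i * x_i == target_share.

    Assumes uniform base weights q_i = 1/n0 and x coded 0/1. The entropy
    balancing solution is w_i proportional to q_i * exp(-lambda * x_i), so all
    units in the same category share one weight; the two cell weights then
    follow directly from the balance and normalising constraints.
    """
    if not 0 < target_share < 1:
        raise ValueError("target_share must lie strictly between 0 and 1")
    n_one = sum(1 for v in x if v == 1)
    n_zero = len(x) - n_one
    if n_one == 0 or n_zero == 0:
        raise ValueError("both categories must occur in the sample")
    w_one = target_share / n_one            # weight for each unit with x_i = 1
    w_zero = (1 - target_share) / n_zero    # weight for each unit with x_i = 0
    return [w_one if v == 1 else w_zero for v in x]


# Hypothetical data: 6 of 10 sample units have x_i = 1, while a (hypothetical)
# census margin puts the population share at 0.5.
x = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
w = binary_margin_weights(x, 0.5)
print(round(sum(wi * xi for wi, xi in zip(w, x)), 10))  # 0.5
print(round(sum(w), 10))                                # 1.0
```

With several covariates or higher-order moments no such closed form exists in general, which is why ebalance solves the dual problem numerically, with the iteration and tolerance settings described above.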
References
Abadie, A., Imbens, G.W.: Bias-corrected matching estimators for average treatment effects. J. Bus. Econ. Stat. 29(1), 1–11 (2011)
Andersson, N.: Evidence-based planning: the philosophy and methods of sentinel community surveillance. Economic Development Institute of the World Bank, Washington (1996)
Duffy, B., Terhanian, G., Bremer, J., Smith, K.: Comparing data from online and face-to-face surveys. Int. J. Market Res. 47(6), 615–639 (2005)
Frölich, M.: Propensity score matching without conditional independence assumption—with an application to the gender wage gap in the United Kingdom. Econ. J. 10(2), 359–407 (2007)
Galab, S., Reddy, G.M., Antony, P., McCoy, A., Ravi, C., Raju, D.S., Mayuri, K., Reddy, P.P.: Young Lives Preliminary Country Report: India. Young Lives, Oxford (2003). http://www.younglives.org.uk/files/country-reports/country-report-1-india-2003. Retrieved 12 Oct 2013
Hainmueller, J.: Entropy balancing for causal effects: a multivariate reweighting method to produce balanced samples in observational studies. Polit. Anal. 20(1), 25–46 (2012)
Hainmueller, J., Xu, Y.: ebalance: a Stata package for entropy balancing. J. Stat. Softw. 54(7), (2013). http://www.jstatsoft.org/v54/i07/paper. Retrieved 3 Mar 2014
Heckman, J.J., Ichimura, H., Todd, P.: Matching as an econometric evaluation estimator. Rev. Econ. Stud. 65(2), 261–294 (1998)
Ho, D.E., Imai, K., King, G., Stuart, E.A.: Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit. Anal. 15(3), 199–236 (2007)
Isaksson, A., Forsman, G.: A comparison between using the web and using the telephone to survey political opinions. In: Proceedings of the section on survey research methods, American Statistical Association, Alexandria (2003)
Kalton, G.: Models in the practice of survey sampling (revisited). J. Off. Stat. 18(2), 129–154 (2002)
Kumra, N.: An assessment of the Young Lives sampling approach in Andhra Pradesh, India. Young Lives Technical Note 2. Young Lives, Oxford (2008). http://www.younglives.org.uk/files/technical-notes/an-assessment-of-the-young-lives-sampling-approach-in-andhra-pradesh-india. Retrieved 11 Mar 2014
Lee, S.: Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 22(2), 329–349 (2006)
MSPI: Annexure II—population projection. National Sample Survey Organisation, New Delhi (2006)
MSPI: How to use unit level data. Ministry of Statistics and Programme Implementation, New Delhi (2008)
NSSO: Note on estimation procedure of NSS 61st round. Government of India, New Delhi (2004)
Rivers, D.: Sampling for web surveys. Joint statistical meetings, Salt Lake City (2012). http://www.laits.utexas.edu/txp_media/html/poll/files/Rivers_matching.pdf. Retrieved 1 Dec 2012
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
Schonlau, M., Van Soest, A., Kapteyn, A., Couper, M.: Selection bias in web surveys and the use of propensity scores. Sociol. Methods Res. 37(3), 291–318 (2009)
Sekhon, J.S.: Opiates for the matches: matching methods for causal inference. Annu. Rev. Polit. Sci. 12, 487–508 (2009)
Steinmetz, S.M., Tijdens, K.: Can weighting improve the representativeness of volunteer online panels? Insights from the German wage indicator data. Concepts and Methods 5(1), 7–11 (2009). http://www.concepts-methods.org/newsletters/20091215_55_C&M_Newsletter_2009_2.pdf. Retrieved 9 Dec 2012
Steinmetz, S.M., Bianchi, A., Tijdens, K.G., Biffignandi, S.: Improving web survey quality—potentials and constraints of propensity score weighting. In: Callegaro, M., Baker, R., Bethlehem, J., Göritz, A., Krosnick, J., Lavrakas, P. (eds.) Online panel research: a data quality perspective. Wiley, New York (2014)
Stuart, E., Cole, S.R., Bradshaw, C.P., Leaf, P.J.: The use of propensity scores to assess the generalizability of results from randomized trials. J. R. Stat. Soc. Ser. A 174(2), 369–386 (2011)
UN: Household sample surveys in developing and transition countries. New York (2006). http://unstats.un.org/unsd/hhsurveys/. Retrieved 2 Mar 2014
Vavreck, L., Rivers, D.: The 2006 cooperative congressional election study. J. Elect. 18(4), 355–366 (2008)
Wilson, I., Huttly, S.R., Fenn, B.: A case study of sample design for longitudinal research: Young Lives. Int. J. Soc. Res. Methodol. 9(5), 351–365 (2006)
Yoshimura, O.: Adjusting responses in a non-probability web panel survey by the propensity score weighting. In: Proceedings of the American Statistical Association (2004). http://www.amstat.org/sections/srms/Proceedings/y2004f.html. Retrieved 11 Mar 2014
Zhao, Z.: Sensitivity of propensity score methods to the specifications. Discussion Paper No. 1873, IZA Forschungsinstitut zur Zukunft der Arbeit (Institute for the Study of Labour), Bonn (2005). Available at ftp.iza.org/dp1873.pdf
Acknowledgments
We are grateful to Natalie Shlomo for her detailed comments on an earlier draft. This paper is an outcome of research funded by the UK Economic and Social Research Council (ESRC). Grant number: ES/G015473/1.
Appendix: A condensed version of the theoretical framework for entropy balancing
Under ebalance, weights are selected to minimize the entropy distance metric:

\( \min_{w_i} H(w) = \sum_{\{i \mid D_i = 0\}} w_i \log\left( w_i / q_i \right) \)
where \( w_i \) is the weight selected for each unit in the non-random sample, \( D_i \in \{1, 0\} \) is a binary indicator coded 1 if unit i is drawn from the reference sample and 0 if it is drawn from the non-random sample, and \( q_i = 1/n_0 \) is a uniform base weight, with \( n_0 \) the number of units in the non-random sample.
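When w and q each sum to one, H(w) is the Kullback–Leibler divergence of the weights from the base weights: it is zero when \( w_i = q_i \) for every unit and positive otherwise, so minimising it keeps the weights as close to uniform as the constraints allow. A minimal Python sketch of the metric (the function name is hypothetical):

```python
import math


def entropy_distance(w, q):
    """Entropy distance H(w) = sum_i w_i * log(w_i / q_i) over the
    non-random sample units; a term with w_i == 0 contributes 0, the
    limit of w * log(w / q) as w -> 0."""
    return sum(wi * math.log(wi / qi) for wi, qi in zip(w, q) if wi > 0)


n0 = 4
q = [1 / n0] * n0                                     # uniform base weights q_i = 1/n0
print(entropy_distance(q, q))                          # 0.0: weights equal the base weights
print(entropy_distance([0.4, 0.4, 0.1, 0.1], q) > 0)   # True: any departure is penalised
```

With uniform base weights the metric simplifies to \( \sum_i w_i \log w_i + \log n_0 \), so minimising H(w) is the same as maximising the entropy of the weights, which gives the method its name.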
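The selection of weights is subject to the balance constraints defined in Eq. 2.1, the normalising constraint defined in Eq. 2.2, and the non-negativity constraints defined in Eq. 2.3:

\( \sum_{\{i \mid D_i = 0\}} w_i \, c_{ri}(X_i) = m_r \quad \text{for } r \in 1, \ldots, R \)   (2.1)

\( \sum_{\{i \mid D_i = 0\}} w_i = 1 \)   (2.2)

\( w_i \ge 0 \quad \text{for all } i \text{ such that } D_i = 0 \)   (2.3)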
\( X \) is a matrix containing the data on J exogenous pre-treatment covariates, with \( X_{ij} \) denoting the value of the j-th covariate for unit i.
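\( c_{ri}(X_i) = m_r \) describes the set of R balance constraints imposed on the covariate moments of the reweighted non-random sample group, where \( c_{ri}(X_i) \) is a moment function of the covariates. Typically, \( m_r \) is the r-th order moment of a covariate \( X_j \) in the reference sample, and the moment function is specified for the non-random sample as \( c_{ri}(X_i) = X_{ij}^{r} \) or \( c_{ri}(X_i) = (X_{ij} - \mu_j)^{r} \), with mean \( \mu_j \).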
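The constraints in Eqs. 2.1 to 2.3 can be made concrete by stacking the moment functions into a matrix and checking a candidate weight vector against them. The Python sketch below is illustrative only: the function names and data are hypothetical, raw (uncentred) moments are used, and the target moments are taken as unweighted reference-sample means for simplicity, whereas a design-weighted reference survey would call for weighted means.

```python
import numpy as np


def moment_functions(X, orders=(1,)):
    """Stack the moment functions c_ri(X_i) = X_ij ** k for every covariate j
    and every order k in `orders`.

    X is an (n x J) array; the result is an (R x n) array with R = J * len(orders),
    whose column i holds c_1i(X_i), ..., c_Ri(X_i). Centred moments
    (X_ij - mu_j) ** k could be stacked in the same way.
    """
    X = np.asarray(X, dtype=float)
    return np.vstack([X[:, j] ** k for j in range(X.shape[1]) for k in orders])


def satisfies_constraints(w, C, M, tol=1e-8):
    """Check the balance (2.1), normalising (2.2) and non-negativity (2.3) constraints."""
    w = np.asarray(w, dtype=float)
    balance = np.allclose(C @ w, M, rtol=0.0, atol=tol)   # sum_i w_i c_ri(X_i) = m_r
    normalised = abs(w.sum() - 1.0) <= tol                # sum_i w_i = 1
    non_negative = np.all(w >= 0)                         # w_i >= 0
    return bool(balance and normalised and non_negative)


# Hypothetical data with one binary covariate.
X_sample = [[0], [1], [1], [1]]       # non-random sample: share 0.75
X_reference = [[0], [0], [1], [1]]    # reference sample: share 0.5
C = moment_functions(X_sample)                    # (R x n0) = (1 x 4)
M = moment_functions(X_reference).mean(axis=1)    # target moments m_r
print(satisfies_constraints([0.25] * 4, C, M))             # False: uniform weights leave imbalance
print(satisfies_constraints([0.5, 1/6, 1/6, 1/6], C, M))   # True
```

For a binary covariate, \( X_{ij}^{2} = X_{ij} \), so adding second-order moments would only duplicate rows of C; higher moments are informative for continuous covariates such as age or landholding.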
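Because one weight must be assigned to each control (non-random sample) unit, the primal problem is high-dimensional. The ebalance approach handles this by computing the weights from a dual problem that is unconstrained and reduces to a system of non-linear equations in R Lagrange multipliers. The dual problem is given by:

\( \min_{Z} L^{d} = \log\left( \sum_{\{i \mid D_i = 0\}} q_i \exp\left( -\sum_{r=1}^{R} \lambda_r c_{ri}(X_i) \right) \right) + \sum_{r=1}^{R} \lambda_r m_r \)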
where \( Z = (\lambda_1, \ldots, \lambda_R)' \) is the vector of Lagrange multipliers for the balance constraints. In matrix form the balance constraints are written as \( CW = M \), with the \( (R \times n_0) \) constraint matrix \( C = [c_1(X_i), \ldots, c_R(X_i)]' \), the weight vector \( W = [w_1, \ldots, w_{n_0}]' \), and the moment vector \( M = [m_1, \ldots, m_R]' \). The vector \( Z^{*} \) that solves the dual problem also solves the primal problem, and the solution weights are recovered using:

\( w_i^{*} = \dfrac{q_i \exp\left( -\sum_{r=1}^{R} \lambda_r^{*} c_{ri}(X_i) \right)}{\sum_{\{k \mid D_k = 0\}} q_k \exp\left( -\sum_{r=1}^{R} \lambda_r^{*} c_{rk}(X_k) \right)} \)
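The recovery formula is an exponential tilting of the base weights: each multiplier shifts weight away from (or towards) units with large values of the corresponding moment function, and the denominator renormalises the weights to sum to one. A minimal Python sketch, with a hypothetical function name and data; the multiplier \( \lambda_1 = \log 3 \) happens to move a binary sample share of 0.75 to a target of 0.5.

```python
import numpy as np


def recover_weights(Z, C, q):
    """Solution weights w_i = q_i exp(-sum_r lambda_r c_ri(X_i)) / normaliser.

    Z: (R,) Lagrange multipliers; C: (R x n0) moment functions; q: (n0,) base weights.
    Subtracting the largest exponent before exponentiating leaves the
    normalised weights unchanged but avoids overflow.
    """
    Z = np.asarray(Z, dtype=float)
    C = np.asarray(C, dtype=float)
    q = np.asarray(q, dtype=float)
    eta = -(C.T @ Z)          # exponent -sum_r lambda_r c_ri(X_i) for each unit i
    eta -= eta.max()
    unnormalised = q * np.exp(eta)
    return unnormalised / unnormalised.sum()


C = [[0, 1, 1, 1]]    # hypothetical: one binary covariate, four sample units
q = [0.25] * 4        # uniform base weights
print(recover_weights([0.0], C, q))         # Z = 0 returns the base weights
print(recover_weights([np.log(3)], C, q))   # approx. [0.5, 1/6, 1/6, 1/6]
```

Because every weight is a positive multiple of \( q_i \), the non-negativity constraints (Eq. 2.3) hold automatically and need not be imposed separately in the dual.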
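An iterative Levenberg–Marquardt scheme exploits second-order information to solve the dual problem:

\( Z^{\text{new}} = Z - l \left( \nabla_{Z}^{2} L^{d} \right)^{-1} \nabla_{Z} L^{d} \)

with gradient \( \nabla_{Z} L^{d} = M - CW \) and Hessian \( \nabla_{Z}^{2} L^{d} = C \left[ \operatorname{diag}(W) - WW' \right] C' \), both evaluated at the weights implied by the current Z. The gradient is zero exactly when the reweighted moments equal their targets.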
Here, \( l \) is a scalar denoting the step length. In each iteration, either the full Newton step (\( l = 1 \)) or a step length \( l \in (0, 1) \) chosen by a line search is taken (Hainmueller and Xu 2013).
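The iteration above can be sketched end to end in Python. This is a hypothetical illustration, not the ebalance implementation: it uses plain damped Newton steps with a backtracking line search (the update written above), whereas a full Levenberg–Marquardt variant would additionally add a damping term to the Hessian. The stopping rule (largest absolute gap between reweighted and target moments below `tol`) and the tight default `tol` are assumptions; `max_iter=20` only mirrors the ebalance default mentioned in the notes.

```python
import numpy as np


def entropy_balance(C, M, q=None, max_iter=20, tol=1e-8):
    """Solve the entropy balancing dual by damped Newton iterations.

    C: (R x n0) moment functions for the non-random sample; M: (R,) targets.
    Returns the weights W and the Lagrange multipliers Z.
    Hypothetical sketch; not the ebalance implementation.
    """
    C = np.asarray(C, dtype=float)
    M = np.asarray(M, dtype=float)
    R, n0 = C.shape
    q = np.full(n0, 1.0 / n0) if q is None else np.asarray(q, dtype=float)

    def weights(Z):
        eta = -(C.T @ Z)
        u = q * np.exp(eta - eta.max())
        return u / u.sum()

    def dual(Z):
        # L^d(Z) = log(sum_i q_i exp(-c_i'Z)) + M'Z, via a stable log-sum-exp
        eta = -(C.T @ Z)
        top = eta.max()
        return top + np.log(np.sum(q * np.exp(eta - top))) + M @ Z

    Z = np.zeros(R)
    for _ in range(max_iter):
        W = weights(Z)
        CW = C @ W
        grad = M - CW                             # targets minus reweighted moments
        if np.max(np.abs(grad)) < tol:            # balance reached up to tol
            return W, Z
        hess = (C * W) @ C.T - np.outer(CW, CW)   # C [diag(W) - WW'] C'
        step = np.linalg.solve(hess, grad)        # Newton direction
        l, d0 = 1.0, dual(Z)                      # try the full Newton step first
        while dual(Z - l * step) > d0 and l > 1e-10:
            l /= 2.0                              # otherwise backtrack (line search)
        Z = Z - l * step
    raise RuntimeError("no convergence: increase max_iter or loosen tol")


# Hypothetical data: one covariate for five non-random sample units,
# with the reference-sample mean (target m_1) set to 3.5.
C = [[1.0, 2.0, 3.0, 4.0, 5.0]]
W, Z = entropy_balance(C, [3.5])
print(np.round(np.asarray(C) @ W, 6))   # [3.5]
```

The Hessian is formed as `(C * W) @ C.T` minus an outer product, which avoids building the \( n_0 \times n_0 \) matrix \( \operatorname{diag}(W) \) explicitly. If constraints are collinear (for example the first and second moments of a binary covariate), the Hessian is singular and `np.linalg.solve` raises an error, so redundant moment conditions should be dropped.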
Cite this article
Watson, S.K., Elliot, M. Entropy balancing: a maximum-entropy reweighting scheme to adjust for coverage error. Qual Quant 50, 1781–1797 (2016). https://doi.org/10.1007/s11135-015-0235-8