Abstract
Commercial health plans need member racial/ethnic information to address disparities, but often lack it. We incorporate the U.S. Census Bureau’s latest surname list into a previous Bayesian method that integrates surname and geocoded information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, estimates from the new approach correlated highly with self-reported race/ethnicity (0.76), making the approach 19% more efficient than its predecessor (and 41% and 108% more efficient than single-source surname and address methods, respectively; P < 0.05 for all). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were in areas where prior performance was weakest (for Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
Notes
The specific counts that were suppressed were also known.
Exploratory analyses (not shown) demonstrated better overall predictive performance with this approach than with several alternatives we considered.
We present the results treating surname information as the prior that is updated by the geocoded information; however, we would obtain the same results if we treated the geocoded information as the prior and updated with the surname data.
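The order-independence described in this note can be checked numerically. The following is a minimal sketch, assuming surname and geography are conditionally independent given race/ethnicity (the condition under which the two update orders agree). All probabilities are made-up illustrative values, not figures from the study, and the function names are hypothetical.

```python
# Hypothetical illustrative inputs (NOT from the study), for three groups.
p_race = [0.60, 0.25, 0.15]                   # P(r): overall group shares
p_surname_given_race = [0.002, 0.0005, 0.01]  # P(s | r) for one surname
p_geo_given_race = [0.001, 0.004, 0.002]      # P(g | r) for one geographic unit


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def surname_prior_then_geo(p_r, p_s_r, p_g_r):
    """Use P(r | s) as the prior, then update with the geographic likelihood."""
    prior = normalize([r * s for r, s in zip(p_r, p_s_r)])      # P(r | s)
    return normalize([p * g for p, g in zip(prior, p_g_r)])     # P(r | s, g)


def geo_prior_then_surname(p_r, p_s_r, p_g_r):
    """Use P(r | g) as the prior, then update with the surname likelihood."""
    prior = normalize([r * g for r, g in zip(p_r, p_g_r)])      # P(r | g)
    return normalize([p * s for p, s in zip(prior, p_s_r)])     # P(r | s, g)


posterior_a = surname_prior_then_geo(p_race, p_surname_given_race, p_geo_given_race)
posterior_b = geo_prior_then_surname(p_race, p_surname_given_race, p_geo_given_race)

# Both orders are proportional to P(r) * P(s | r) * P(g | r), so they agree.
same = all(abs(a - b) < 1e-12 for a, b in zip(posterior_a, posterior_b))
print(same)
```

Both functions normalize the same product of terms, which is why the choice of prior changes only the presentation and not the posterior probabilities.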
Because the racial/ethnic categories are mutually exclusive, estimates for the groups are negatively correlated.
A squared correlation of 0.49 between estimated and self-reported race/ethnicity implies approximately 49% efficiency, relative to known race/ethnicity, for estimating a disparity between two racial/ethnic groups under the assumptions in that paper.
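The arithmetic behind this note can be sketched as follows, assuming (as the note states) that efficiency is approximated by the squared correlation. The function names are hypothetical. The value 0.70 is used only because its square is the 0.49 in the note, and 0.76 is the rounded correlation quoted in the abstract; because both inputs are rounded, the computed gain is only approximate and need not match the published percentages.

```python
def efficiency(correlation):
    """Approximate efficiency relative to known race/ethnicity (squared correlation)."""
    return correlation ** 2


def relative_efficiency_gain(corr_new, corr_old):
    """Proportional gain in efficiency of one estimator over another."""
    return efficiency(corr_new) / efficiency(corr_old) - 1.0


# A correlation of 0.70 corresponds to the 0.49 squared correlation in the note.
eff_example = round(efficiency(0.70), 2)

# Rounded correlation from the abstract for the new approach.
eff_new = round(efficiency(0.76), 2)

# Gain of a 0.76-correlation estimator over a hypothetical 0.70-correlation one.
gain = relative_efficiency_gain(0.76, 0.70)

print(eff_example, eff_new, round(gain, 3))
```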
Acknowledgments
This study was supported, in part, by contract 282-00-0005, Task Order 13 from DHHS: Agency for Healthcare Research and Quality. Additional funding and support were provided by RWJF and the Brookings Institution. Marc Elliott is supported in part by the Centers for Disease Control and Prevention (CDC U48/DP000056). The authors thank Bryan GeoDemographics for their work in modifying SF1 Census files for these purposes and Jacquelyn Chou for assistance with manuscript preparation. We thank plans participating in the National Health Plan Collaborative, particularly Aetna, for sharing selected data to help improve efforts to address disparities in care and improve overall quality.
Disclaimer
The contents of the publication are solely the responsibility of the authors and do not necessarily reflect the official views of the DHHS.
Additional information
An erratum to this article can be found at http://dx.doi.org/10.1007/s10742-009-0055-1
Cite this article
Elliott, M.N., Morrison, P.A., Fremont, A. et al. Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities. Health Serv Outcomes Res Method 9, 69–83 (2009). https://doi.org/10.1007/s10742-009-0047-1
DOI: https://doi.org/10.1007/s10742-009-0047-1