Abstract
OBJECTIVES: In many sources of health data in Canada, postal codes are the only available geographic identifiers. To enable geographic analyses, postal codes are routinely geocoded to census geography so that records can be linked to ecological (area-level) data. Despite the common use of this method, the extent of the resulting geographic misclassification errors is poorly understood. We estimated the misclassification errors that arise when postal codes are geocoded to assign census geography in Nova Scotia, Canada.
METHODS: We compared counts and match rates between postal-code-geocoded locations and actual locations of buildings in Nova Scotia at two levels of census geography: dissemination areas (DAs) and census subdivisions (CSDs). Actual locations were taken from data collected by the provincial government that record the latitude and longitude of each building. We also assessed how misclassification varied by rurality, using Statistics Canada’s classification.
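The comparison described in METHODS can be illustrated with a minimal Python sketch. It is not the authors’ procedure, and the abstract does not give exact formulas, so several assumptions are made: each building record carries its postal code and the DA that actually contains it (derived from the provincial coordinates); a hypothetical lookup `postal_to_da` assigns each postal code to a single DA, as a representative-point conversion file would; the count difference is |geocoded − actual| / actual; and a DA’s match rate is the share of buildings actually located in it that the postal-code geocode also places there.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical building record: (postal_code, actual_da).
Building = Tuple[str, str]


def da_misclassification(
    buildings: List[Building], postal_to_da: Dict[str, str]
) -> Dict[str, Dict[str, float]]:
    """Per-DA count difference and match rate under the assumed definitions.

    DAs with geocoded buildings but no actual buildings are skipped,
    because a percentage of a zero actual count is undefined.
    """
    actual_counts: Counter = Counter()
    geocoded_counts: Counter = Counter()
    matched: Counter = Counter()
    for postal_code, actual_da in buildings:
        assigned_da = postal_to_da[postal_code]  # one DA per postal code
        actual_counts[actual_da] += 1
        geocoded_counts[assigned_da] += 1
        if assigned_da == actual_da:
            matched[actual_da] += 1

    results: Dict[str, Dict[str, float]] = {}
    for da, actual in actual_counts.items():
        geocoded = geocoded_counts[da]  # Counter returns 0 if absent
        results[da] = {
            # Absolute count difference relative to the true count.
            "count_diff_pct": 100.0 * abs(geocoded - actual) / actual,
            # Share of buildings truly in this DA that the geocode also puts here.
            "match_rate_pct": 100.0 * matched[da] / actual,
        }
    return results
```

As a toy example (hypothetical codes), suppose postal code B0X 1A0 covers three buildings in DA1 and two in DA2 but the lookup maps it only to DA1, and B0X 1B0 covers one building in DA2. DA1 then has a 100% match rate yet a count difference of about 67%, while DA2 has a 33% match rate and the same count difference, which shows that the two measures capture different aspects of misclassification.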
RESULTS: Outside the two major urban areas (Halifax Metro and Sydney), where count differences were <10%, many DAs had count differences >30%. Match rates showed a similar pattern: the vast majority of non-urban DAs had match rates <40%. Even in the major urban areas, 10% of DAs had large misclassification errors. At the CSD level, misclassification errors were still too large to estimate counts or rates without further aggregation of areas.
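Aggregation reduces count differences only to the extent that misassigned buildings stay inside the larger area: a building geocoded to the wrong DA within the same aggregate no longer counts as an error once counts are summed. The Python sketch below rolls hypothetical DA-level (actual, geocoded) counts up to a coarser unit through an assumed mapping `da_to_area`, then bands each area using the <10% and >30% thresholds quoted in RESULTS. The identifiers and counts are illustrative only, not the study’s data.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical per-DA counts: da -> (actual_count, geocoded_count).
DaCounts = Dict[str, Tuple[int, int]]


def aggregate_counts(
    counts: DaCounts, da_to_area: Dict[str, str]
) -> Dict[str, Tuple[int, int]]:
    """Sum actual and geocoded counts from DAs up to a coarser area."""
    totals = defaultdict(lambda: [0, 0])
    for da, (actual, geocoded) in counts.items():
        area = da_to_area[da]
        totals[area][0] += actual
        totals[area][1] += geocoded
    return {area: (a, g) for area, (a, g) in totals.items()}


def pct_diff(actual: int, geocoded: int) -> float:
    """Absolute count difference as a percentage of the actual count (actual > 0)."""
    return 100.0 * abs(geocoded - actual) / actual


def flag(diff_pct: float) -> str:
    """Band a difference using the thresholds quoted in the abstract."""
    if diff_pct < 10:
        return "small (<10%)"
    if diff_pct > 30:
        return "large (>30%)"
    return "moderate (10-30%)"
```

With counts DA1 = (120, 180) and DA2 = (100, 45), both DAs are flagged as large (50% and 55%). If both belong to the same aggregate, the summed counts (220, 225) differ by about 2%, because the buildings overcounted in DA1 were largely the ones missing from DA2. When misassigned buildings cross the aggregate’s boundary, as the CSD-level results suggest often happens, this cancellation does not occur.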
CONCLUSION: Routine postal code geocoding should be replaced with geocoding of location information based on additional identifiers, such as civic addresses or latitude and longitude. If data holders performed this geocoding in-house before providing data to researchers, geographic analyses would gain accuracy and capacity while confidentiality remained protected.
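Assigning census geography directly from latitude and longitude, as recommended above, is typically a point-in-polygon operation against census boundary files. The Python sketch below applies the standard even-odd (ray-casting) rule to plain lists of (longitude, latitude) vertices. The square “DA” boundaries and identifiers are hypothetical. A real workflow would read published boundary files with a GIS library and would need to handle points exactly on shared edges, polygons with holes, and multipart areas, none of which this sketch addresses.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
Polygon = List[Point]        # simple polygon, vertices in order, not closed


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd rule: toggle on each edge crossed by a ray cast to the right of pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed;
        # this test also guarantees y2 != y1, so the division below is safe.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_area(pt: Point, areas: Dict[str, Polygon]) -> Optional[str]:
    """Return the first area whose boundary contains pt, or None if none does."""
    for area_id, poly in areas.items():
        if point_in_polygon(pt, poly):
            return area_id
    return None
```

For two adjacent squares, DA_WEST spanning longitudes −63.6 to −63.5 and DA_EAST spanning −63.5 to −63.4 (latitudes 44.6 to 44.7), a building at (−63.55, 44.65) is assigned to DA_WEST, one at (−63.45, 44.65) to DA_EAST, and one at (−63.0, 44.65) to neither. With thousands of areas, a spatial index would normally be used to avoid testing every polygon for every building.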
Additional information
Conflict of Interest: None to declare
About this article
Cite this article
Terashima, M., Kephart, G. Misclassification errors from postal code-based geocoding to assign census geography in Nova Scotia, Canada. Can J Public Health 107, e424–e430 (2016). https://doi.org/10.17269/CJPH.107.5459
DOI: https://doi.org/10.17269/CJPH.107.5459