Record Linkage Reconciliation of Arlington Department of Human Services Administrative Data Using Potts Models

Ian Crandell
Aaron Schroeder
Dave Higdon
Michael-dharma Irwin

Abstract

Situated at the nexus of federal, state, and local governments, the Arlington Department of Human Services (DHS) receives service utilization data from many different sources. Under its “no wrong door” policy, customers can sign up for any DHS service through any DHS department. As a practical consequence, a single person can appear as multiple records across multiple databases, with no unambiguous key connecting those records. Merging them requires a probabilistic linkage approach. Classical approaches to record linkage, such as the method of Fellegi and Sunter, consider each possible pair of records between databases and assign a link probability to each pair. A drawback of considering pairwise links alone is that the resulting links can violate transitivity: record A may be linked to B and B to C while A and C are left unlinked. To better handle such conflicts, we propose a Bayesian linkage method that considers a large set of possible pairs at once. At the heart of this approach is a Potts model representation that tracks which records are assigned to the same individual. This allows us to assign probabilities to the various reconciliations of inconsistent linkage assignments.
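The transitivity problem can be illustrated with a minimal sketch. It is not the authors' implementation: the record identifiers, the function names `intransitive_triples` and `links_from_labels`, and the toy data are all hypothetical. The first function flags triples of records in which exactly two of the three pairs are linked. The second derives links from a single label per record, the kind of "which individual does this record belong to" assignment that a Potts-style representation tracks, so the derived links are transitive by construction.

```python
from itertools import combinations


def intransitive_triples(records, links):
    """Return triples of records in which exactly two of the three pairs are linked."""
    linked = {frozenset(pair) for pair in links}
    bad = []
    for a, b, c in combinations(records, 3):
        n_linked = sum(frozenset(p) in linked for p in ((a, b), (b, c), (a, c)))
        if n_linked == 2:  # two links without the third violates transitivity
            bad.append((a, b, c))
    return bad


def links_from_labels(labels):
    """Derive pairwise links from one label per record (hypothetical label encoding)."""
    return [(a, b) for a, b in combinations(sorted(labels), 2) if labels[a] == labels[b]]


# Hypothetical toy data: pairwise decisions link r1-r2 and r2-r3 but not r1-r3.
records = ["r1", "r2", "r3", "r4"]
pairwise_links = [("r1", "r2"), ("r2", "r3")]
violations = intransitive_triples(records, pairwise_links)

# One possible reconciliation: r1, r2, r3 are one individual and r4 is another.
labels = {"r1": 0, "r2": 0, "r3": 0, "r4": 1}
consistent_links = links_from_labels(labels)
remaining = intransitive_triples(records, consistent_links)
```

In this toy case the pairwise decisions produce one violating triple, while links derived from a label assignment produce none. A Bayesian method of the kind described would place probabilities over such label assignments; how that is done is not specified here and is not shown in the sketch.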

Article Details

How to Cite
Crandell, I., Schroeder, A., Higdon, D. and Irwin, M.-D. (2018) “Record Linkage Reconciliation of Arlington Department of Human Services Administrative Data Using Potts Models”, International Journal of Population Data Science, 3(5). doi: 10.23889/ijpds.v3i5.1061.