C-A5-04: A Simple, Accurate SAS Algorithm for Electronic Abstraction of Race from Digitized Progress Notes

Donald Bachman, MS (4)

1. Kaiser Permanente Georgia
2. University of Massachusetts
3. Group Health
4. Kaiser Permanente Northwest

Abstract

Background and Aims: Individual-level race/ethnicity is important for research into the causes and consequences of health disparities. For various non-research reasons, it has rarely been collected on enrollees in integrated delivery systems. It can be found in medical record documentation, but manual abstraction from large numbers of medical records is costly. We developed a simple SAS algorithm for electronic abstraction of white and African American race from digitized progress notes and evaluated its accuracy by comparing electronically abstracted race with race recorded in other data sources.

Methods: A simple SAS algorithm, based on text search strings (e.g., white male, African American woman), scanned digitized progress notes for provider face-to-face visits from 2005 through July 2009 in the electronic medical record systems of Kaiser Permanente Georgia (KPG) and Group Health Cooperative (GHC) and abstracted white and African American race. A patient with more than one visit with abstracted race was classified using the earliest such visit. Abstracted race was linked at the individual level to survey datasets with self-reported race (2005 survey of working-age adults, 2007 survey of adults with hypertension, 2000–2005 Medicare surveys) and to mother’s race on 2000–2006 birth certificates. Race abstracted from GHC progress notes with the same algorithm over the same period was compared to self-reported race on health risk appraisals. Accuracy of the SAS algorithm was assessed by the overall proportion of enrollees whose abstracted race matched race in the other datasets, Cohen’s kappa, and McNemar’s test.
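The abstraction step described above can be illustrated with a minimal sketch. It is written in Python rather than SAS and is not the authors' algorithm: the search patterns extend the two example strings in the abstract and are assumptions, as are the function names, the rule of skipping notes that match both races, and the sample notes.

```python
import re
from datetime import date

# Hypothetical search patterns. The abstract gives only two example strings
# ("white male", "African American woman"); the rest of this list is assumed.
RACE_PATTERNS = {
    "white": re.compile(r"\bwhite\s+(?:male|female|man|woman)\b", re.IGNORECASE),
    "african_american": re.compile(
        r"\bafrican[\s-]+american\s+(?:male|female|man|woman)\b", re.IGNORECASE
    ),
}


def abstract_race(note_text):
    """Return a race label if exactly one pattern matches the note, else None.

    Treating notes that match both patterns as unclassified is an assumption.
    """
    hits = [label for label, pat in RACE_PATTERNS.items() if pat.search(note_text)]
    return hits[0] if len(hits) == 1 else None


def classify_patients(notes):
    """Classify each patient by the earliest visit that has an abstracted race.

    notes: iterable of (patient_id, visit_date, note_text) tuples.
    """
    earliest = {}
    for patient_id, visit_date, text in notes:
        race = abstract_race(text)
        if race is None:
            continue  # no race abstracted from this visit
        if patient_id not in earliest or visit_date < earliest[patient_id][0]:
            earliest[patient_id] = (visit_date, race)
    return {pid: race for pid, (_, race) in earliest.items()}


# Made-up notes for illustration only.
notes = [
    ("P1", date(2006, 3, 1), "55 year old white male with chest pain."),
    ("P1", date(2008, 5, 9), "African American male here for follow-up."),
    ("P2", date(2007, 1, 15), "Pleasant African-American woman, no complaints."),
    ("P3", date(2009, 2, 2), "Patient reports white discharge."),
]
results = classify_patients(notes)
```

In this sketch, patient P1 keeps the label from the 2006 visit even though a later note disagrees, and P3 is left unclassified because "white" is not followed by a sex or gender term.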

Results: White or African American race was electronically abstracted for 56,261 KPG and 6,427 GHC enrollees. Abstracted race matched race from the other datasets in 97–99% of enrollees. Cohen’s kappas were highly significant (p < 0.05), ranging from 0.939 ± 0.013 (N = 657 matches with hypertension survey records) to 0.994 ± 0.006 (N = 518 matches with Medicare surveys). McNemar’s tests were marginally significant for several datasets, but misclassification was not systematically biased toward white or African American race.
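For readers unfamiliar with the agreement statistics reported here, the following Python sketch computes the overall proportion matching, Cohen's kappa, and McNemar's test from a 2x2 table of abstracted versus reference race. It is not the authors' analysis code; the counts are invented for illustration, the function names are hypothetical, and McNemar's statistic is computed without a continuity correction, which is an assumption.

```python
import math


def proportion_matching(table):
    """Share of individuals on the diagonal (abstracted race equals reference race)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table):
    """Cohen's kappa; table[i][j] counts abstracted category i, reference category j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = proportion_matching(table)
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginal distributions
    p_exp = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_test(table):
    """McNemar's chi-square (1 df, no continuity correction) for a 2x2 table."""
    b, c = table[0][1], table[1][0]  # the two kinds of disagreement
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: rows = abstracted (white, African American),
# columns = self-reported (white, African American).
table = [[300, 4],
         [6, 190]]
agreement = proportion_matching(table)
kappa = cohen_kappa(table)
chi2, p_value = mcnemar_test(table)
```

Kappa measures agreement beyond chance. McNemar's test looks only at the off-diagonal cells, so it indicates whether misclassification leans toward one race rather than how often it occurs.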

Conclusions: The SAS algorithm was highly accurate in electronically abstracting white and African American race from digitized progress notes of provider visits at KPG and GHC. We are expanding the evaluation to include additional sites and additional race/ethnic categories (e.g., Asian, Hispanic).
