Yonsei Med J. 2019 Dec;60(12):1216-1222. English.
Published online Nov 21, 2019.
© Copyright: Yonsei University College of Medicine 2019
Brief Communication

The Multi-Institutional Health Screening Records Database of South Korea: Description and Evaluation of Its Characteristics

Yunha Noh,1,* Han Eol Jeong,1,* Hye-Jun Kim,1 Hanju Ko,2 Eun-Hee Nah,3 and Ju-Young Shin1
    • 1School of Pharmacy, Sungkyunkwan University, Suwon, Korea.
    • 2IT Development & Support Office, Seoul, Korea.
    • 3Health Promotion Research Institute, Korea Association of Health Promotion, Seoul, Korea.
Received August 05, 2019; Revised September 18, 2019; Accepted October 04, 2019.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This study sought to describe and evaluate the characteristics of the Health Screening Records Database (HSRD) of the Korea Association of Health Promotion as a data source for epidemiologic studies. The HSRD was compared to the National Health Insurance Service-Health Screening Cohort (NHIS-HEALS) database for 2015. Common variables between the databases were selected, and sex-based analyses were conducted. Statistical concordance was defined as the NHIS-HEALS estimate falling within the 95% confidence interval of the HSRD estimate. The HSRD and NHIS-HEALS included 946461 and 111690 participants in health screening programs, respectively. Compared to the NHIS-HEALS, the HSRD had more female participants (55.2% vs. 42.6%) but fewer older adult participants (34.4% vs. 51.2%). Virtually all variables had clinical concordance, with some having statistical concordance as well, among both general and life-transition program participants. The HSRD comprised more clinical information over a wider age range than the NHIS-HEALS, while showing clinical concordance. By providing more comprehensive clinical data, the HSRD may serve as an alternative resource for epidemiologic studies.

Keywords
Health screening records database; physical examination; database; characteristics; observational study

The Health Screening Records Database (HSRD) of the Korea Association of Health Promotion (KAHP) is South Korea's largest multi-institutional health screening records database. The KAHP, established in 1964, comprises 16 medical institutions in South Korea that professionally offer health screening programs. The KAHP accommodates both national health screening programs (general and life-transition) and other patient-oriented and personalized screening programs. The screening results from these 16 medical centers are gathered in the HSRD.1

As health screening records contain information often deemed to be valuable and not available in claims databases, such as laboratory test data, disease history, lifestyle factors, or more clinically detailed information, they possess great value for extensive use in real-world epidemiological studies.2 For example, in a previous study using HSRD data to investigate the prevalence, awareness, treatment, and control rates of dyslipidemia among adults, low-density lipoprotein cholesterol levels and questionnaires were used to determine the diagnosis and awareness of dyslipidemia, respectively.3 However, despite the numerous studies that have used data from the HSRD, its characteristics have yet to be described and examined.4, 5 Therefore, we conducted a descriptive study of the HSRD comparing it to Korea's nationwide National Health Insurance Service-Health Screening Cohort (NHIS-HEALS) database in order to describe and to evaluate its characteristics for use as a reliable real-world data source for future epidemiological studies.

This descriptive study used health screening records from the HSRD and the NHIS-HEALS database for 2015 (Supplementary Table 1, only online). The Health Promotion Research Institute and IT Development & Support Office of the KAHP integrated and standardized all health screening records and established the HSRD for research purposes. The HSRD contains records for participants in either general or life-transition health screening programs. This database comprises anonymized patient codes with data on sex, age, laboratory test results, personal and family disease history, lifestyle risk factors, and cognitive and mood function.1

The NHIS-HEALS is a 10% sample cohort randomly extracted from 5150000 nationwide health screening program participants. It is large in scale, stable, and based on qualified health screening participants 40–79 years of age as of 2002 and 2003.6 As we used data only from 2015, the minimum follow-up period was 13 years; thus, the age distribution was 53–79 years. As the NHIS is the universal single-payer national healthcare system of South Korea, coverage is provided to the entire population. The NHIS-HEALS contains variables similar to those in the HSRD, such as anonymized patient codes with data on sex, age, laboratory results, disease history, lifestyle risk factors, and cognitive and mood function.

In South Korea, two national health screening programs are available: the general and life-transition programs. The NHIS-HEALS covers only individuals who have participated in either of these two programs, whereas the HSRD covers participants in these two national programs as well as participants in other screening programs.

For concordance evaluation, the NHIS-HEALS was considered the gold standard, as it is a 10% sample cohort randomly extracted from 5150000 nationwide health screening program participants, thus providing national representativeness. Common variables present in both databases were selected, including sex, age, laboratory results, disease history, lifestyle risk factors, and cognitive and mood function (Supplementary Table 2, only online). Frequencies and proportions were calculated for categorical variables, and means and standard deviations (SDs) for continuous variables. For the HSRD, 95% confidence intervals (CIs) for each variable's proportion or mean were calculated as follows, depending on whether the variable was categorical or continuous, respectively ($\bar{X}$=sample mean, $s$=sample SD, $n$=number of samples):

$$\text{Proportion} \pm 1.96 \times \sqrt{\frac{\text{Proportion} \times (1 - \text{Proportion})}{n}}$$

$$\bar{X} \pm 1.96 \times \frac{s}{\sqrt{n}}$$

Concordance was classified as clinical or statistical. Statistical concordance was defined as a variable's NHIS-HEALS estimate falling within the 95% CI of the corresponding HSRD estimate. Variables without statistical concordance were reviewed by a group of physicians from various specialties, who used clinical reference values to determine clinical concordance. All statistical analyses were performed using Microsoft Excel (Microsoft, Redmond, WA, USA) and SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). The study protocol was approved by the Institutional Review Board of Sungkyunkwan University (SKKU 2018-04-006), and the need for obtaining informed consent from the study population was waived by the board.
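The interval formulas and the statistical concordance rule described above can be illustrated with a minimal Python sketch. It is not the authors' Excel or SAS code; the function names are hypothetical, and the example simply reuses the overall female proportions reported in this article (55.2% of 946461 HSRD participants vs. 42.6% in the NHIS-HEALS) as illustrative inputs.

```python
import math

Z_95 = 1.96  # normal critical value used in the formulas above


def proportion_ci(p, n):
    """95% CI for a proportion p estimated from n records."""
    half_width = Z_95 * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def mean_ci(mean, sd, n):
    """95% CI for a sample mean with sample SD sd and n records."""
    half_width = Z_95 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def statistically_concordant(reference_estimate, ci):
    """True when the reference (NHIS-HEALS) estimate lies inside the HSRD CI."""
    lower, upper = ci
    return lower <= reference_estimate <= upper


# Illustrative inputs taken from the article: proportion of female participants.
hsrd_female_ci = proportion_ci(0.552, 946461)
print(hsrd_female_ci)
print(statistically_concordant(0.426, hsrd_female_ci))
```

With nearly one million HSRD records, the interval is only about ±0.1 percentage points wide, which suggests why few variables can meet the statistical criterion and why clinical review was used for the remainder.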

In total, the HSRD and NHIS-HEALS included 946461 and 111690 individuals who participated in health screening programs in 2015, respectively. Compared to the NHIS-HEALS, the HSRD had more female participants (55.2% vs. 42.6%), but fewer older adults (34.4% vs. 51.2%). Unlike the NHIS-HEALS, which included only participants ≥53 years of age, the HSRD also included participants ≤49 years of age. As for region of residence, the HSRD included more male (46.8% vs. 40.3%) and female (49.5% vs. 37.9%) participants residing in urban areas. For insurance type, the HSRD had fewer employee-insured participants in both sexes (male: 74.6% vs. 85.2%; female: 73.6% vs. 76.0%) (Fig. 1).

Fig. 1
(A–C) Comparisons of the socio-demographic and regional characteristics of health screening participants in the Health Screening Records Database (HSRD) and National Health Insurance Service-Health Screening Cohort (NHIS-HEALS) databases for 2015.

Comparison of general health screening program participants showed clinical concordance for all continuous variables, except for γ-glutamyl transferase in males and systolic blood pressure (BP) and total cholesterol in females. In both databases, hypertension was the most common personal disease history, whereas the "others" category (including cancer) was the most common family disease history. Moreover, the HSRD had a higher proportion of current smokers, but a lower proportion of participants who drank or exercised at all intensities, than the NHIS-HEALS (Table 1). Similar results were found for the life-transition health screening program participants, where diastolic BP and fasting blood glucose in females did not show clinical concordance, in addition to systolic BP and total cholesterol as mentioned above. Compared to the NHIS-HEALS, the HSRD showed lower proportions of participants for all categories of personal disease history and of those who exercised often (3–7 times/week) (Table 2).

Table 1
Comparisons of the Clinical Characteristics of Individuals Who Participated in General Health Screening Programs and Are Included in the HSRD and NHIS-HEALS

Table 2
Comparisons of the Clinical Characteristics of Individuals Who Participated in Life-Transition Health Screening Programs and Are Included in the HSRD and NHIS-HEALS

Nearly all variables had clinical concordance: serum creatinine and family disease history of heart disease also had statistical concordance among participants in the general program. Analogous results were observed among participants in the life-transition program, with more variables, such as cognitive function, showing statistical concordance (Table 3).

Table 3
Variables with Clinical or Statistical Concordance between the HSRD and NHIS-HEALS

Compared to the NHIS-HEALS, the HSRD showed high clinical concordance for both general and life-transition program participants, with some even having statistical concordance, suggesting that the HSRD may serve as an appropriate data source for use in epidemiologic studies. The sociodemographic characteristics of the health screening program participants differed between the HSRD and NHIS-HEALS, in which the HSRD had more females (55.2% vs. 42.6%), but fewer older adult participants (≥60 years; 34.4% vs. 51.2%). The HSRD contains participants of all ages, including those aged <53 years, whereas the NHIS-HEALS contains only those aged ≥53 years. Moreover, the HSRD had more participants residing in urban areas (48.3% vs. 39.3%), but fewer employee-insured participants (74.0% vs. 81.3%). As for disease history, the HSRD had fewer participants with personal disease history, but had more participants for most categories of family disease history.

Because the HSRD and NHIS-HEALS have different characteristics, the suitability of each database may depend on the type of epidemiologic study to be conducted. In studying diseases with a high prevalence, such as diabetes mellitus or hypertension, both databases may be appropriate, as it would be relatively easy to acquire a sufficient number of study subjects for ample power. However, when studying more specific diseases or conditions that are less prevalent in the general population, the preferred database may differ. For example, the HSRD would be the better choice for studies of rare diseases in pediatric patients or in an age group under 40 years, as the HSRD contains a wider age range than the NHIS-HEALS. On the other hand, the NHIS-HEALS would be preferred when studying rare diseases in an older age group, because it represents the entire national population. Moreover, because the HSRD contains more clinical information than the NHIS-HEALS, which may assist in determining the severity of disease, the HSRD would be preferred when studying severe diseases. However, there may be limitations when conducting longitudinal studies using the HSRD. Follow-up loss may occur in the HSRD if a patient transfers to a medical center that is not one of the 16 medical centers of the KAHP. In the HSRD, follow-up loss is most likely to occur when patients change or quit their jobs, change their region of residence, or emigrate to another country; the frequency of follow-up loss is therefore expected to be lower than in general prospective cohorts or registries, but higher than in the NHIS-HEALS.

The wide range of clinical and lifestyle data from health screening records provides tremendous added value. Some studies have utilized these data to define disease conditions more specifically; for example, hemoglobin A1c and fasting blood glucose levels were used to define diabetes mellitus, in addition to diagnosis codes.7 Other studies have used this information to identify associations between outcomes. One study reported that albuminuria may be a biomarker for hypertension and diabetes mellitus;8 another study reported serum uric acid to be positively associated with pulmonary function.9 Moreover, lifestyle factors have been shown to be associated with gastroesophageal reflux disease, and one study linked health screening and claims data to predict hospitalization due to pneumonia.10, 11 Thus, the use of health screening records either alone or linked with claims data may increase the value of epidemiological studies.

Our study has several strengths. First, this is the first study to describe and evaluate the characteristics of the HSRD; our exploration of its sociodemographic and clinical characteristics revealed high clinical concordance between the HSRD and the nationwide NHIS-HEALS. Second, the HSRD is unrestricted with regard to participant age and health screening programs; therefore, it contains all program participants and, thus, a broader spectrum of participants. Third, the well-validated NHIS-HEALS was used for comparison, ensuring the validity of the HSRD.6 Notwithstanding, the present study has some limitations. First, the medical centers of the KAHP are located in metropolitan cities, whereas the NHIS-HEALS receives health screening records from 22785 medical institutions across Korea. Thus, individuals residing in rural areas may not be well represented in the HSRD (Supplementary Fig. 1, only online). Second, not all variables were found to have concordance; however, for these variables, various approaches to obtain concordance exist. For instance, post-stratification or benchmark weighting may be applied. Alternatively, iterative proportional fitting or inverse probability of treatment weighting with propensity scores may enhance concordance.12, 13, 14 Third, the non-random inclusion of subjects within the HSRD may have caused selection bias arising from differences in health care utilization and health status when compared with the national average. This discrepancy may be due to the characteristics of the KAHP, as it is a multi-institutional organization of hospitals specializing in health screening programs. Finally, because we compared only data for 2015 and not all health screening programs are performed annually, not all potential health screening participants were included.
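As a rough illustration of the post-stratification idea mentioned among the limitations, the following Python sketch reweights a single variable by sex so that the HSRD strata match the NHIS-HEALS sex distribution. This analysis was not performed in the present study: the sex shares are derived from the proportions reported here (55.2% vs. 42.6% female), the stratum means are hypothetical values chosen only for demonstration, and the function names are assumptions.

```python
def poststratification_weights(sample_shares, target_shares):
    """Weight for each stratum = target population share / sample share."""
    return {s: target_shares[s] / sample_shares[s] for s in sample_shares}


def weighted_mean(stratum_means, sample_shares, weights):
    """Mean of a variable after applying stratum-level weights."""
    numerator = sum(
        stratum_means[s] * sample_shares[s] * weights[s] for s in stratum_means
    )
    denominator = sum(sample_shares[s] * weights[s] for s in stratum_means)
    return numerator / denominator


# Sex distribution of each database, derived from the reported female proportions.
hsrd_shares = {"female": 0.552, "male": 0.448}
nhis_shares = {"female": 0.426, "male": 0.574}

# Hypothetical stratum means (e.g., systolic BP in mmHg); not study results.
hypothetical_means = {"female": 120.0, "male": 126.0}

weights = poststratification_weights(hsrd_shares, nhis_shares)
print(weights)
print(weighted_mean(hypothetical_means, hsrd_shares, weights))
```

In practice, strata would more plausibly combine sex with age group and region, and the reweighted estimate would then be re-examined against the NHIS-HEALS value.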

The HSRD had more clinical information for a wider age range than the NHIS-HEALS, while simultaneously showing an exceptional level of clinical concordance. The HSRD alone or by linkage with other data may serve as an alternative data source for future epidemiologic studies by providing more comprehensive information and, in turn, evidence for health promotion or disease prevention policies.

SUPPLEMENTARY MATERIALS

Supplementary Table 1

Comparisons between the HSRD and NHIS-HEALS Databases

Supplementary Table 2

Variables Included for Analysis if Present in both the HSRD and NHIS-HEALS Databases and Excluded Otherwise

Supplementary Fig. 1

Distribution and location of the 16 medical institutions of the Korea Association of Health Promotion. Location of the 16 medical institutions: Seoul (3), Busan (1), Incheon (1), Daegu (1), Ulsan (1), Gyeonggi-do (1), Gangwon-do (1), Chungcheongbuk-do+Sejong (1), Daejeon+Chungcheongnam-do (1), Jeollabuk-do (1), Gwangju+Jeollanam-do (1), Gyeongsangbuk-do (1), Gyeongsangnam-do (1) and Jeju (1).

Notes

The authors have no potential conflicts of interest to disclose.

AUTHOR CONTRIBUTIONS:

  • Conceptualization: Yunha Noh, Han Eol Jeong, Eun-Hee Nah, and Ju-Young Shin.

  • Data curation: Hanju Ko.

  • Formal analysis: Hye-Jun Kim.

  • Funding acquisition: Yunha Noh, Han Eol Jeong, and Ju-Young Shin.

  • Investigation: Yunha Noh and Han Eol Jeong.

  • Methodology: Yunha Noh and Han Eol Jeong.

  • Project administration: Eun-Hee Nah and Ju-Young Shin.

  • Resources: Hanju Ko and Eun-Hee Nah.

  • Software: Hanju Ko and Hye-Jun Kim.

  • Supervision: Ju-Young Shin and Eun-Hee Nah.

  • Validation: Yunha Noh, Han Eol Jeong, Eun-Hee Nah, and Ju-Young Shin.

  • Visualization: Yunha Noh, Han Eol Jeong, and Hye-Jun Kim.

  • Writing—original draft: Yunha Noh and Han Eol Jeong.

  • Writing—review & editing: Yunha Noh, Han Eol Jeong, Eun-Hee Nah, and Ju-Young Shin.

ACKNOWLEDGEMENTS

Access to the National Health Insurance Service-Health Screening Cohort (NHIS-HEALS) database was provided by the National Health Insurance Service of South Korea. We thank Yeon-Hee Baek and Ha-Lim Jeon at the School of Pharmacy, Sungkyunkwan University for their contribution in drafting the initial research protocol.

This study was supported by the Korea Association of Health Promotion.

References

    1. Korea Association of Health Promotion. Introduction of the Korea Association of Health Promotion [Internet]. Seoul: Korea Association of Health Promotion; c2011 [accessed on 2019 July 5].
    2. March S. Individual data linkage of survey data with claims data in Germany-an overview based on a cohort study. Int J Environ Res Public Health 2017;14:E1543.
    3. Jang S, Lee J. Prevalence and management of dyslipidemia, hypertension, diabetes among adults in Gangwon-do, Korea: the 2013-2014 KNHSP. Journal of the Korea Academia-Industrial Cooperation Society 2017;18:625–636.
    4. Lee YJ, Kang JW, Kim JY, Na EH, Kim YR, Ko KS, et al. Clustering of health risk behaviors for chronic diseases in Korean adults. Korean J Health Educ Promot 2017;34:21–31.
    5. Nah EH, Cho S, Kim S, Cho HI, Chai JY. Comparison of traditional and reverse syphilis screening algorithms in medical health checkups. Ann Lab Med 2017;37:511–515.
    6. Seong SC, Kim YY, Park SK, Khang YH, Kim HC, Park JH, et al. Cohort profile: the National Health Insurance Service-National Health Screening Cohort (NHIS-HEALS) in Korea. BMJ Open 2017;7:e016640.
    7. Ooba N, Setoguchi S, Sato T, Kubota K. Lipid-lowering drugs and risk of new-onset diabetes: a cohort study using Japanese healthcare data linked to clinical data for health screening. BMJ Open 2017;7:e015935.
    8. Gu D, Xu P, Yuan Y, Fu H. Albuminuria is suggested as a potential health screening biomarker for senior citizens and general population with hypertension or diabetes in China. Clin Lab 2016;62:2267–2269.
    9. Song JU, Hwang J, Ahn JK. Serum uric acid is positively associated with pulmonary function in Korean health screening examinees. Mod Rheumatol 2017;27:1057–1065.
    10. Matsuki N, Fujita T, Watanabe N, Sugahara A, Watanabe A, Ishida T, et al. Lifestyle factors associated with gastroesophageal reflux disease in the Japanese population. J Gastroenterol 2013;48:340–349.
    11. Uematsu H, Yamashita K, Kunisawa S, Otsubo T, Imanaka Y. Prediction of pneumonia hospitalization in adults using health check-up data. PLoS One 2017;12:e0180159.
    12. Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med 2015;34:3661–3679.
    13. El-Khorazaty JA, Koch G, Preisser J. The iterative proportional fitting algorithm for adjusted agreement in a non-inferiority diagnostic clinical trial. Pharm Stat 2014;13:173–178.
    14. Neumann A, Billionnet C. Covariate adjustment of cumulative incidence functions for competing risks data using inverse probability of treatment weighting. Comput Methods Programs Biomed 2016;129:63–70.
