Transparent Reporting on Research Using Unstructured Electronic Health Record Data to Generate ‘Real World’ Evidence of Comparative Effectiveness and Safety

  • Leading Article
  • Drug Safety

Abstract

Research that makes secondary use of administrative and clinical healthcare databases is increasingly influential for regulatory, reimbursement, and other healthcare decision-making. Consequently, there are numerous guidance documents on reporting for studies that use ‘real-world’ data captured in administrative claims and electronic health record (EHR) databases. These guidance documents are intended to improve transparency, reproducibility, and the ability to evaluate the validity and relevance of design and analysis decisions. However, existing guidance does not differentiate between structured and unstructured information contained in EHRs, registries, or other healthcare data sources. While unstructured text is convenient and readily interpretable in clinical practice, it can be difficult to use for investigation of causal questions, e.g., comparative effectiveness and safety, until data have been cleaned and algorithms applied to extract relevant information into structured fields for analysis. The goal of this paper is to increase transparency for healthcare decision makers and causal inference researchers by providing general recommendations for reporting on the steps taken to make unstructured text-based data usable for comparative effectiveness and safety research. These recommendations are designed to be used as an adjunct to existing reporting guidance. They are intended to provide sufficient context and supporting information for causal inference studies that use data fields derived by natural language processing or machine learning, so that researchers, reviewers, and decision makers can be confident in their ability to evaluate the validity and relevance of derived measures for exposures, inclusion/exclusion criteria, covariates, and outcomes for the causal question of interest.

Author information

Corresponding author

Correspondence to Shirley V. Wang.

Ethics declarations

Conflict of Interest

Dr Shirley V Wang has received salary support on investigator-initiated grants from Novartis Pharmaceuticals Corporation, Boehringer Ingelheim, and J&J to Brigham and Women’s Hospital, and was a consultant to Aetion, Inc., all for unrelated work. Dr Olga V. Patterson receives research grants from the following for-profit organizations: Amgen Inc., Anolinx LLC, AstraZeneca Pharmaceuticals LP, Genentech Inc., Genomic Health, Inc., Gilead Sciences Inc., HITEKS Solutions Inc., Merck & Co., Inc., Northrop Grumman Information Systems, Novartis International AG, PAREXEL International Corporation, and Shire PLC through the University of Utah or Western Institute for Biomedical Research. Dr Patterson also receives research funding from the following federal and non-profit organizations: Agency for Healthcare Research and Quality, Brigham and Women’s Hospital, Centers for Disease Control and Prevention, Department of Defense, Department of Veterans Affairs, Intermountain Healthcare, National Heart, Lung, and Blood Institute, National Institute on Alcohol Abuse and Alcoholism, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institute of General Medical Sciences, National Institute of Standards and Technology, National Library of Medicine, National Science Foundation, Patient Centered Outcomes Research Institute, and RAND Corporation. Dr Joshua J. Gagne has received salary support from grants from Novartis Pharmaceuticals Corporation and Eli Lilly and Company to Brigham and Women’s Hospital and is a consultant to Aetion, Inc. and to Optum, Inc., all for unrelated work. Dr Andrew Bate is an employee and shareholder of Pfizer. The views expressed in this paper are those of Dr Bate and may not necessarily reflect those of Pfizer. Dr Robert Ball is an author of US Patent 9,075,796, “Text mining for large medical text datasets and corresponding medical text classification using informative feature selection”.
Dr Li Zhou has received research funding from the Agency for Healthcare Research and Quality (AHRQ): R01HS022728 and CRICO/RMF. Dr Jeffrey S Brown, Dr Pall Jonsson, Dr Adam Wright, and Dr Wim Goettsch have no conflicts of interest that are directly relevant to the content of this article. The views expressed in this article are the personal views of the authors and may not be understood or quoted as being made on behalf of or reflecting the position of the US Food and Drug Administration or the National Institute for Health and Care Excellence.

Funding

This study was supported by funds from the Division of Pharmacoepidemiology and Pharmacoeconomics and Brigham and Women’s Hospital.

Ethical Approval

Not applicable; no data were analyzed.

Patient Consent

No patient contact or data were involved.

About this article

Cite this article

Wang, S.V., Patterson, O.V., Gagne, J.J. et al. Transparent Reporting on Research Using Unstructured Electronic Health Record Data to Generate ‘Real World’ Evidence of Comparative Effectiveness and Safety. Drug Saf 42, 1297–1309 (2019). https://doi.org/10.1007/s40264-019-00851-0

  • DOI: https://doi.org/10.1007/s40264-019-00851-0