ABSTRACT
Automated procedures are described for discovering predictive rules from electronic health records. These patient records are structured, but are not collected relative to any targeted labels or study objectives. The learning methods cycle through all features, simulating labels and converting the problem from unlabeled learning to supervised classification and regression. Each feature in turn is processed as a simulated label, and a prediction is made from the remaining features. Using a decision-rule representation for knowledge extraction, machine learning techniques are applied to a large collection of electronic health records. Many rules are readily induced with significant predictive performance. By formulating the rules as queries to a web search engine, and then counting hit frequencies, we show how medical researchers can assess and rank potential for new insight among a collection of empirically strong associations.
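The feature-cycling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' rule-induction method: it assumes purely categorical features, proposes only one-condition rules, and scores them on the same records used to induce them, whereas a real evaluation would use held-out data. The function names, the `min_support` parameter, the query format, and the four toy records are all invented for illustration.

```python
from collections import Counter, defaultdict


def induce_rules(records, target, min_support=2):
    """Treat `target` as the simulated label and propose one-condition rules
    of the form (feature == value) -> (target == label) from the other features."""
    rules = []
    features = [f for f in records[0] if f != target]
    for feature in features:
        # Tally the target values seen for each observed value of this feature.
        by_value = defaultdict(Counter)
        for rec in records:
            by_value[rec[feature]][rec[target]] += 1
        for value, label_counts in by_value.items():
            support = sum(label_counts.values())
            if support < min_support:
                continue
            label, hits = label_counts.most_common(1)[0]
            rules.append({
                "if": (feature, value),
                "then": (target, label),
                "support": support,
                "confidence": hits / support,
            })
    return rules


def cycle_through_features(records, min_support=2):
    """Process every feature in turn as a simulated label."""
    all_rules = []
    for target in records[0]:
        all_rules.extend(induce_rules(records, target, min_support))
    # Strongest associations first.
    return sorted(all_rules, key=lambda r: (-r["confidence"], -r["support"]))


def rule_to_query(rule):
    """Render a rule as a keyword query string (hypothetical format)."""
    (feature, value), (target, label) = rule["if"], rule["then"]
    return f'"{feature} {value}" "{target} {label}"'


# Toy records, invented purely for illustration (not real patient data).
records = [
    {"smoker": "yes", "cough": "yes", "sex": "f"},
    {"smoker": "yes", "cough": "yes", "sex": "m"},
    {"smoker": "no", "cough": "no", "sex": "f"},
    {"smoker": "no", "cough": "no", "sex": "m"},
]

rules = cycle_through_features(records)
queries = [rule_to_query(r) for r in rules if r["confidence"] == 1.0]
```

Each query string could then be submitted to a search engine and its hit count recorded, so that strong rules with few hits stand out as candidates for new insight. That step depends on a search API and is left out of the sketch.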
Index Terms
- Predictive rule discovery from electronic health records