DOI: 10.1145/1882992.1883104
poster

Predictive rule discovery from electronic health records

Published: 11 November 2010

ABSTRACT

Automated procedures are described for discovering predictive rules from electronic health records. These patient records are structured, but they were not collected with any target labels or study objectives in mind. The learning methods cycle through all features, simulating labels and thereby converting an unlabeled learning problem into supervised classification and regression: each feature in turn is treated as a simulated label and predicted from the remaining features. Using a decision-rule representation for knowledge extraction, machine learning techniques are applied to a large collection of electronic health records, and many rules with significant predictive performance are readily induced. By formulating the rules as queries to a web search engine and counting hit frequencies, we show how medical researchers can assess and rank the potential for new insight among a collection of empirically strong associations.
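The central move described above, treating each feature in turn as a simulated label and predicting it from the remaining features, can be sketched in a few lines. This is a minimal illustration under stated assumptions: features are coded as booleans, and rules are simple single-condition rules scored by support and confidence. It is not the authors' rule-induction method, and the names `induce_rules`, `Rule` and `describe`, the thresholds, and the toy records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record format: one patient record as boolean-coded features.
Record = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    condition: str      # feature tested in the rule body
    value: bool         # value the condition feature must take
    target: str         # feature currently playing the simulated label
    support: int        # number of records matching the condition
    confidence: float   # fraction of matching records where target is True


def induce_rules(records: List[Record], min_support: int = 2,
                 min_confidence: float = 0.8) -> List[Rule]:
    """Sketch: cycle through features, using each one as a simulated label."""
    if not records:
        return []
    features = sorted(records[0].keys())
    rules: List[Rule] = []
    for target in features:              # each feature in turn becomes the label
        for cond in features:
            if cond == target:
                continue                 # predict only from the remaining features
            for value in (True, False):
                matched = [r for r in records if r[cond] == value]
                if len(matched) < min_support:
                    continue
                positives = sum(1 for r in matched if r[target])
                conf = positives / len(matched)
                if conf >= min_confidence:
                    rules.append(Rule(cond, value, target, len(matched), conf))
    # Strongest rules first: highest confidence, then highest support.
    rules.sort(key=lambda r: (-r.confidence, -r.support))
    return rules


def describe(rule: Rule) -> str:
    """Render a rule as readable IF/THEN text."""
    return (f"IF {rule.condition} = {rule.value} THEN {rule.target} = True "
            f"(support={rule.support}, confidence={rule.confidence:.2f})")


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    toy = [
        {"diabetes": True, "on_insulin": True, "smoker": False},
        {"diabetes": True, "on_insulin": True, "smoker": True},
        {"diabetes": False, "on_insulin": False, "smoker": True},
        {"diabetes": False, "on_insulin": False, "smoker": False},
    ]
    for rule in induce_rules(toy):
        print(describe(rule))
```

On the toy records, only the two rules linking `diabetes` and `on_insulin` pass the thresholds. `smoker` is uninformative in both directions. A fuller system would induce multi-condition rules, handle numeric features as regression targets, and measure predictive performance on held-out records rather than on the same records used for induction, as this sketch does.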

Published in
IHI '10: Proceedings of the 1st ACM International Health Informatics Symposium
November 2010, 886 pages
ISBN: 9781450300308
DOI: 10.1145/1882992

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
