
A Survey of Methods for Explaining Black Box Models

Published: 22 August 2018

Abstract

In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. Black box decision systems are used in a wide variety of applications, and each approach is typically developed to solve a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.
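To make the surveyed problem concrete, the sketch below illustrates one family of approaches the survey covers: explaining a black box by fitting a transparent global surrogate. It is a minimal illustration under our own assumptions (Python with scikit-learn, synthetic data), not code from the survey: a shallow decision tree is trained to mimic an opaque random forest, and its fidelity to the black box is measured on the predictions it imitates.

    # Hypothetical sketch (not from the survey): a global surrogate explanation.
    # A transparent model is trained to mimic an opaque one.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

    # The "black box": accurate, but its internal logic is hidden from the user.
    black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Query the black box for labels, then fit an interpretable mimic on them.
    y_bb = black_box.predict(X)
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

    # Fidelity: how faithfully the transparent surrogate reproduces the black box.
    print("fidelity:", accuracy_score(y_bb, surrogate.predict(X)))
    print(export_text(surrogate))  # human-readable decision rules

The accuracy/interpretability trade-off mentioned above is visible even in this toy setting: capping the tree depth keeps the explanation readable at the cost of some fidelity to the black box.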




      Published in

      ACM Computing Surveys, Volume 51, Issue 5
      September 2019, 791 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3271482
      Editor: Sartaj Sahni

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 August 2018
      • Revised: 1 June 2018
      • Accepted: 1 June 2018
      • Received: 1 January 2018


      Qualifiers

      • survey
      • Research
      • Refereed
