
A Survey of Methods for Explaining Black Box Models

Published: 22 August 2018

Abstract

In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. Black box decision systems are used in a wide variety of applications, and each approach is typically developed to solve a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.
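To make the surveyed problem concrete, the sketch below illustrates one family of approaches the survey covers: explaining a black box by fitting a transparent global surrogate. It is a minimal illustration under our own assumptions (Python with scikit-learn, synthetic data), not code from the survey: a shallow decision tree is trained to mimic an opaque random forest, and its fidelity to the black box is measured on the predictions it imitates.

    # Hypothetical sketch (not from the survey): a global surrogate explanation.
    # A transparent model is trained to mimic an opaque one.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

    # The "black box": accurate, but its internal logic is hidden from the user.
    black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Query the black box for labels, then fit an interpretable mimic on them.
    y_bb = black_box.predict(X)
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

    # Fidelity: how faithfully the transparent surrogate reproduces the black box.
    print("fidelity:", accuracy_score(y_bb, surrogate.predict(X)))
    print(export_text(surrogate))  # human-readable decision rules

The accuracy/interpretability trade-off mentioned above is visible even in this toy setting: capping the tree depth keeps the explanation readable at the cost of some fidelity to the black box.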




      Published in

      ACM Computing Surveys, Volume 51, Issue 5
      September 2019, 791 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3271482
      Editor: Sartaj Sahni

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 August 2018
      • Revised: 1 June 2018
      • Accepted: 1 June 2018
      • Received: 1 January 2018


      Qualifiers

      • survey
      • Research
      • Refereed
