Abstract
Many machine learning applications involve predictive data-analytic modeling using black-box techniques. A common problem in such studies is understanding and interpreting the estimated nonlinear high-dimensional models. Whereas human users naturally favor simple interpretable models, such models may not be practically feasible with modern adaptive methods such as Support Vector Machines (SVMs), Multilayer Perceptron Networks (MLPs), AdaBoost, etc. This chapter provides a brief survey of current techniques for visualization and interpretation of SVM-based classification models, and then highlights potential problems with such methods. We argue that, under the VC-theoretical framework, model interpretation cannot be achieved via technical analysis of predictive data-analytic models alone. That is, any meaningful interpretation should incorporate application domain knowledge from outside the data analysis. We also describe a simple graphical technique for visualization of SVM classification models.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Cherkassky, V., Dhar, S. (2015). Interpretation of Black-Box Predictive Models. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds) Measures of Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-21852-6_19
DOI: https://doi.org/10.1007/978-3-319-21852-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21851-9
Online ISBN: 978-3-319-21852-6
eBook Packages: Computer Science, Computer Science (R0)