Evaluating generalization through interval-based neural network inversion

  • Original Article
Neural Computing and Applications

Abstract

Typically, the generalization ability of a neural network is measured with the well-known method of cross-validation, which statistically estimates the classification error of a network architecture and thus assesses its generalization ability. However, for a number of reasons, cross-validation does not constitute an efficient, unbiased estimator of generalization, and it cannot be used to assess the generalization of a neural network after training. In this paper, we introduce a new method for evaluating generalization, based on a deterministic approach that reveals and exploits the network’s domain of validity. This is the area of the input space containing all the points for which a class-specific network output provides values higher than a certainty threshold. The proposed approach is a set-membership technique that defines the network’s domain of validity by inverting its output activity on the input space. For a trained neural network, the result of this inversion is a set of hyper-boxes that constitutes a reliable and \(\varepsilon\)-accurate computation of the domain of validity. Suitably defined metrics on the volume of the domain of validity provide a deterministic estimate of the generalization ability of the trained network, one that is not affected by random test-set selection as cross-validation is. The effectiveness of the proposed generalization measures is demonstrated on illustrative examples using artificial and real datasets and shallow feed-forward neural networks such as multi-layer perceptrons.
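The volume metrics are not defined here, so the following is only a hypothetical illustration and not the authors' definition. One simple volume-based quantity is the fraction of the input box covered by the domain of validity. SIVIA returns inner boxes, which are certainly valid, and boundary boxes, which are undecided at resolution \(\varepsilon\). Their interiors are pairwise disjoint, so summing box volumes gives a lower and an upper bound on that fraction. The function names `box_volume` and `validity_volume_bounds` are assumptions made for this sketch.

```python
from math import prod

def box_volume(box):
    """Volume of an axis-aligned box given as [(lo, hi), ...]."""
    return prod(hi - lo for lo, hi in box)

def validity_volume_bounds(inner_boxes, boundary_boxes, input_box):
    """Lower/upper bounds on vol(domain of validity) / vol(input box).

    Assumes the boxes have pairwise disjoint interiors, as in a SIVIA
    subpaving, so summing their volumes does not double count.
    """
    total = box_volume(input_box)
    inner = sum(box_volume(b) for b in inner_boxes)
    boundary = sum(box_volume(b) for b in boundary_boxes)
    return inner / total, (inner + boundary) / total

# Hypothetical subpaving of the unit square.
inner = [[(0.0, 0.5), (0.0, 0.5)], [(0.5, 1.0), (0.0, 0.5)]]
boundary = [[(0.5, 1.0), (0.5, 0.75)]]
print(validity_volume_bounds(inner, boundary, [(0.0, 1.0), (0.0, 1.0)]))
```

For this toy subpaving the bounds are 0.5 and 0.625. The gap between them is the boundary volume, which shrinks as \(\varepsilon\) decreases.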

Abbreviations

HPD: Highest posterior density
INTLAB: INTerval LABoratory
IA: Interval analysis
MLP: Multi-layer perceptron
OTS: Off training set
PDF: Probability density function
SCS: Set computations with subpavings
SIVIA: Set inversion via interval analysis

References

  1. Adam SP, Karras DA, Magoulas GD, Vrahatis MN (2015) Reliable estimation of a neural network’s domain of validity through interval analysis based inversion. In: 2015 international joint conference on neural networks (IJCNN), pp 1–8. https://doi.org/10.1109/IJCNN.2015.7280794

  2. Adam SP, Likas AC, Vrahatis MN (2017) Interval analysis based neural network inversion: a means for evaluating generalization. In: Boracchi G, Iliadis L, Jayne C, Likas A (eds) Engineering applications of neural networks. Springer International Publishing, Berlin, pp 314–326

  3. Adam SP, Magoulas GD, Karras DA, Vrahatis MN (2016) Bounding the search space for global optimization of neural networks learning error: an interval analysis approach. J Mach Learn Res 17(169):1–40. http://jmlr.org/papers/v17/14-350.html

  4. Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, Oxford

  5. Courrieu P (1994) Three algorithms for estimating the domain of validity of feedforward neural networks. Neural Netw 7(1):169–174

  6. Eberhart R, Dobbins R (1991) Designing neural network explanation facilities using genetic algorithms. In: 1991 IEEE international joint conference on neural networks, vol 2, pp 1758–1763

  7. Hampshire II JB, Pearlmutter BA (1991) Equivalence proofs for multilayer perceptron classifiers and the Bayesian discriminant function. In: Proceedings of the 1990 connectionist models summer school, vol 1, pp 159–172

  8. Hassoun MH (1995) Fundamentals of artificial neural networks. MIT Press, Cambridge

  9. Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice-Hall, Upper Saddle River, NJ

  10. Hernández-Espinosa C, Fernández-Redondo M, Ortiz-Gómez M (2003) Inversion of a neural network via interval arithmetic for rule extraction. In: Kaynak O, Alpaydin E, Oja E, Xu L (eds) Artificial neural networks and neural information processing: ICANN/ICONIP 2003. Lecture notes in computer science, vol 2714. Springer, Berlin, Heidelberg, pp 670–677

  11. Jaulin L, Kieffer M, Didrit O, Walter E (2001) Applied interval analysis with examples in parameter and state estimation, robust control and robotics. Springer, London

  12. Jaulin L, Walter E (1993) Set inversion via interval analysis for nonlinear bounded-error estimation. Automatica 29(4):1053–1064

  13. Jensen C, Reed R, Marks R, El-Sharkawi M, Jung JB, Miyamoto R, Anderson G, Eggen C (1999) Inversion of feedforward neural networks: algorithms and applications. Proc IEEE 87(9):1536–1549

  14. Kamimura R (2017) Mutual information maximization for improving and interpreting multi-layered neural networks. In: 2017 IEEE symposium series on computational intelligence (SSCI), pp 1–7

  15. Karystinos GN, Pados DA (2000) On overfitting, generalization, and randomly expanded training sets. IEEE Trans Neural Netw 11(5):1050–1057

  16. Kearfott RB (1996) Interval computations: introduction, uses, and resources. Euromath Bull 2(1):95–112

  17. Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann Math Stat 23:462–466

  18. Kindermann J, Linden A (1990) Inversion of neural networks by gradient descent. Parallel Comput 14(3):277–286

  19. Likas A (2001) Probability density estimation using artificial neural networks. Comput Phys Commun 135(2):167–175

  20. Liu Y (1995) Unbiased estimate of generalization error and model selection in neural network. Neural Netw 8(2):215–219

  21. Lu BL, Kita H, Nishikawa Y (1999) Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans Neural Netw 10(6):1271–1290

  22. Novak R, Bahri Y, Abolafia DA, Pennington J, Sohl-Dickstein J (2018) Sensitivity and generalization in neural networks: an empirical study. In: International conference on learning representations. https://openreview.net/forum?id=HJC2SzZCW

  23. Reed R, Marks R (1995) An evolutionary algorithm for function inversion and boundary marking. In: IEEE international conference on evolutionary computation, 1995, vol 2, pp 794–797

  24. Richard M, Lippmann R (1991) Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Comput 3(4):461–483. https://doi.org/10.1162/neco.1991.3.4.461

  25. Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22:400–407

  26. Rump SM (1999) INTLAB - INTerval LABoratory. In: Csendes T (ed) Developments in reliable computing. Kluwer Academic, Dordrecht, Netherlands, pp 77–104

  27. Saad EW, Wunsch DC II (2007) Neural network explanation using inversion. Neural Netw 20(1):78–93

  28. Theodoridis S, Pikrakis A, Koutroumbas K, Kavouras D (2010) Introduction to pattern recognition: a MATLAB approach. Academic Press, Burlington, MA

  29. Thrun SB (1993) Extracting provably correct rules from artificial neural networks. Technical Report IAI–TR–93–5, Institut für Informatik III, Bonn, Germany

  30. Tornil-Sin S, Puig V, Escobet T (2010) Set computations with subpavings in MATLAB: the SCS toolbox. In: 2010 IEEE international symposium on computer-aided control system design (CACSD), pp 1403–1408

  31. Wolpert DH (1990) A mathematical theory of generalization: part I. Complex Syst 4(2):151–200

  32. Wolpert DH (1990) A mathematical theory of generalization: part II. Complex Syst 4(2):201–249

  33. Wolpert DH (1992) On the connection between in-sample testing and generalization error. Complex Syst 6(1):47–94

  34. Wolpert DH (1996) The existence of a priori distinctions between learning algorithms. Neural Comput 8(7):1391–1420. https://doi.org/10.1162/neco.1996.8.7.1391

  35. Wolpert DH (1996) The lack of a priori distinctions between learning algorithms. Neural Comput 8(7):1341–1390. https://doi.org/10.1162/neco.1996.8.7.1341

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable suggestions and comments on an earlier version of the manuscript, which helped to significantly improve this paper.

Author information

Corresponding author

Correspondence to Stavros P. Adam.

Ethics declarations

Conflict of interest:

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

To illustrate the impact of the \(\beta\)-cut on the domain of validity, first consider the two-dimensional classification dataset shown in Fig. 6a, whose two classes form nine groups. A 2–10–1 MLP with logistic sigmoid activation functions has been trained on this dataset, and the contour plot of its output is shown in Fig. 6b. In this figure, the white regions (output greater than \(1-\beta\)) correspond to patterns that the MLP classifies into class 1 (red points), while the black regions (output lower than \(\beta\)) correspond to patterns classified into class 2 (blue points). The gray zone depicts the classification ambiguity for patterns near the class boundaries, where the MLP output takes values in the interval \([\beta ,1-\beta ]\).
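The three-way decision rule described above fits in a small function. This sketch assumes a single-output network whose output lies in (0, 1). The name `beta_cut` and the default \(\beta = 0.1\) are illustrative choices. Returning `None` stands for the ambiguous gray zone.

```python
def beta_cut(y, beta=0.1):
    """Map the output y of a single-output MLP to a class decision.

    Returns 1 if y > 1 - beta, 2 if y < beta, and None (ambiguous)
    when y lies in [beta, 1 - beta].
    """
    if not 0.0 < beta < 0.5:
        raise ValueError("beta must lie in (0, 0.5)")
    if y > 1.0 - beta:
        return 1
    if y < beta:
        return 2
    return None

# Hypothetical outputs: confident class 1, ambiguous, confident class 2.
print([beta_cut(y) for y in (0.95, 0.5, 0.03)])
```

Raising \(\beta\) widens both confident bands and narrows the ambiguous one. For example, an output of 0.85 is ambiguous for \(\beta = 0.1\) but is assigned to class 1 for \(\beta = 0.2\).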

The impact of this \(\beta\)-cut classification decision is better depicted in Figs. 7 and 8. In each of these figures, the red area is the domain of validity that corresponds to a specific output interval \([1-\beta ,1]\) of the MLP trained on the above two-dimensional problem. Each area is determined by using SIVIA to invert the output interval \([1-\beta ,1]\) for class 1 onto the input space.
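The following is a minimal SIVIA-style sketch. It assumes a hypothetical 2–2–1 logistic MLP with made-up weights, not the network of Fig. 6.

- **Output enclosure:** the network output over a box is enclosed by naive interval arithmetic. Each affine layer is evaluated endpoint-wise according to the sign of each weight, and the increasing sigmoid is then applied to the endpoints.
- **Bisection:** the input box is bisected recursively.
  - Boxes whose output enclosure lies inside \([1-\beta ,1]\) are kept as inner boxes.
  - Boxes whose enclosure misses that interval are discarded.
  - Undecided boxes are bisected along their widest side until that width is at most \(\varepsilon\).

Unlike a rigorous implementation (for example, one built on INTLAB), this sketch uses ordinary floating-point arithmetic without outward rounding. The weights and function names are assumptions.

```python
import math

# Hypothetical 2-2-1 logistic MLP: (weights, bias) per hidden unit, then the output unit.
HIDDEN = [([6.0, 6.0], -6.0), ([6.0, -6.0], 0.0)]
OUTPUT = ([5.0, 2.0], -3.0)

def sigmoid(x):
    # Logistic function written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affine_interval(weights, bias, box):
    """Enclose bias + sum_i w_i * x_i over a box given as [(lo, hi), ...]."""
    lo = hi = bias
    for w, (a, b) in zip(weights, box):
        lo += w * a if w >= 0 else w * b
        hi += w * b if w >= 0 else w * a
    return lo, hi

def mlp_output_interval(box):
    """Natural interval extension of the MLP output over a box."""
    hidden_box = []
    for w, b in HIDDEN:
        lo, hi = affine_interval(w, b, box)
        hidden_box.append((sigmoid(lo), sigmoid(hi)))  # sigmoid is increasing
    lo, hi = affine_interval(*OUTPUT, hidden_box)
    return sigmoid(lo), sigmoid(hi)

def sivia(box, target, f, eps):
    """Return (inner, boundary) box lists for {x in box : f(x) in target}."""
    t_lo, t_hi = target
    inner, boundary, stack = [], [], [box]
    while stack:
        b = stack.pop()
        y_lo, y_hi = f(b)
        if t_lo <= y_lo and y_hi <= t_hi:
            inner.append(b)            # whole box lies in the domain of validity
        elif y_hi < t_lo or y_lo > t_hi:
            continue                   # whole box lies provably outside it
        else:
            widths = [hi - lo for lo, hi in b]
            k = max(range(len(b)), key=widths.__getitem__)
            if widths[k] <= eps:
                boundary.append(b)     # undecided at resolution eps
            else:
                lo, hi = b[k]
                mid = 0.5 * (lo + hi)
                stack.append(b[:k] + [(lo, mid)] + b[k + 1:])
                stack.append(b[:k] + [(mid, hi)] + b[k + 1:])
    return inner, boundary

beta = 0.1
inner, boundary = sivia([(0.0, 1.0), (0.0, 1.0)], (1.0 - beta, 1.0),
                        mlp_output_interval, eps=0.05)
print(len(inner), "inner boxes,", len(boundary), "boundary boxes")
```

`inner` approximates the class-1 domain of validity from inside. Up to rounding, the union of `inner` and `boundary` encloses it. Rerunning with a different `beta` shows how the covered area grows or shrinks.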

The value of \(\beta\) clearly extends or restricts the input-space area that the MLP assigns to class 1. This is easily verified by inspecting Fig. 7a, b, while for higher-dimensional problems it can be confirmed by comparing the volumes of the respective domains of validity. This shows the importance of choosing an appropriate value for \(\beta\). Here, \(\beta\) needs to be 0.1 if the correct classification decision is to be taken for a significant part of the input space. As shown in Fig. 8a, b, the appropriate value of \(\beta\) also depends on the number of training epochs and on the error threshold chosen to train the MLP.
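For a single logistic unit, the effect of \(\beta\) on the volume of the domain of validity can be worked out in closed form. This makes the monotone relationship easy to check. The sketch assumes a hypothetical one-dimensional model \(y = \sigma(wx + b)\) with \(w > 0\) on an input interval \([l, u]\). Since \(\sigma\) is increasing, \(y \geq 1-\beta\) holds exactly when \(x \geq x^* = (\operatorname{logit}(1-\beta) - b)/w\). The domain of validity is therefore \([\max(l, x^*), u]\) whenever that interval is non-empty. The weights and function names are illustrative assumptions.

```python
import math

def logit(p):
    """Inverse of the logistic sigmoid, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def validity_length(w, b, beta, lo, hi):
    """Length of {x in [lo, hi] : sigmoid(w*x + b) >= 1 - beta}, assuming w > 0."""
    if w <= 0:
        raise ValueError("this sketch assumes w > 0")
    x_star = (logit(1.0 - beta) - b) / w
    return max(0.0, hi - max(lo, x_star))

# Hypothetical unit with w = 10, b = -5 on [0, 1].
for beta in (0.05, 0.1, 0.2):
    print(beta, validity_length(10.0, -5.0, beta, 0.0, 1.0))
```

The three lengths are roughly 0.21, 0.28 and 0.36, so a larger \(\beta\) yields a larger domain of validity. In higher dimensions no closed form is available in general, which is where interval-based inversion and box volumes come in.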

Fig. 6

The artificial dataset and the contour plot of an MLP trained on this dataset

Fig. 7

The domain of validity for an MLP trained for 500 epochs with MSE \(\leqslant 10^{-3}\)

Fig. 8

The domain of validity for an MLP trained for 5000 epochs with MSE \(\leqslant 10^{-5}\)

Appendix B

In the extreme case of a pattern producing more than one valid output (i.e., it is assigned to more than one class), the current implementation for computing the domain of validity treats this pattern as misclassified. Strictly speaking, the pattern is correctly classified with respect to its proper class, while with respect to each of the other classes it is misclassified. A previous approach to determining the domain of validity considered such a pattern unclassified. However, both approaches yield the same result in terms of the proposed metrics, since unclassified and misclassified patterns have the same status in these metrics.
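The two conventions for patterns with more than one valid output can be contrasted in a few lines. This sketch assumes that each pattern comes with one output per class and that a single certainty threshold applies. The function names, and the labelling of patterns with no valid output as "unclassified", are assumptions not taken from the paper. The sketch shows why the choice does not change an accuracy-style rate: both labels count as "not correct".

```python
def pattern_status(outputs, true_class, threshold, multi_valid="misclassified"):
    """Status of one pattern given its class-specific outputs.

    An output is valid if it exceeds the certainty threshold. A pattern with
    several valid outputs gets the label `multi_valid`: "misclassified"
    (current implementation) or "unclassified" (previous approach).
    """
    valid = [k for k, y in enumerate(outputs) if y > threshold]
    if not valid:
        return "unclassified"  # assumption: no class claims the pattern
    if len(valid) > 1:
        return multi_valid
    return "correct" if valid[0] == true_class else "misclassified"

def correct_rate(patterns, threshold, multi_valid):
    """Fraction of (outputs, true_class) pairs labelled "correct"."""
    statuses = [pattern_status(o, c, threshold, multi_valid) for o, c in patterns]
    return statuses.count("correct") / len(statuses)

# Hypothetical two-class outputs paired with true class indices.
patterns = [([0.95, 0.02], 0), ([0.93, 0.91], 0), ([0.40, 0.50], 1), ([0.10, 0.97], 0)]
for policy in ("misclassified", "unclassified"):
    print(policy, correct_rate(patterns, 0.9, policy))
```

Both policies print 0.25 here. Only the label of the second pattern changes, and it is never counted as correct.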

Fig. 9

Indicative examples of different domains of validity

In many cases, a training algorithm produces either an under-trained or an over-trained network. Under-training has many possible causes (insufficient training, a small training set, an inappropriate network architecture, etc.). As a consequence, as shown in Fig. 9b, the domain of validity either covers only small regions of the input space or includes a large region that is incorrectly classified. For instance, the validity domain of a 2–4–2 MLP, shown in Fig. 9a, covers the input space more regularly, whereas that of the 2–2–2 MLP, shown in Fig. 9b, covers only a narrow strip of the input space. In general, the validity domain of an under-trained network is composed of a small number of large regions with regularly shaped boundaries.

Besides under-training, another issue affecting generalization is network over-training. Typically, an over-trained network fails to classify unseen patterns correctly because it has learned the training data “exactly” and hence cannot generalize well. In this case, the decision boundaries formed by the network delimit the regions of the input space as tightly as possible, and the network fails to interpolate even between closely neighboring groups. The domain of validity then consists of smaller regions, and its volume diminishes. An indicative example of such a validity domain is given in Fig. 9c. Consequently, for a well-trained network, the lower the volume of its domain of validity, the poorer the generalization achieved by the network owing to over-training.

Another unfortunate consequence of over-training is that MLPs, especially those with many nodes in the hidden layer(s), tend to fit outliers, noisy input patterns and patterns with noisy class labels; see Fig. 10. In these cases, the network has the flexibility to form decision boundaries that isolate the outlying or misplaced patterns. In doing so, the network defines isolated regions in the input space, such as isles or lobes. These regions enclose not only the outlying or misplaced patterns themselves but also substantial parts of the input space for which there is no information about the class or classes they belong to. In general, the validity domain of an over-trained network contains small regions with irregularly shaped boundaries.

Hence, these cases constitute different aspects of over-training that need to be taken into account when the volume of the domain of validity is used as a metric of the network’s generalization performance.
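The qualitative descriptions above suggest looking at how the boxes of a computed domain of validity group into connected regions. Under-training gives few large regions with regular boundaries, and over-training gives small regions with irregular boundaries. The sketch below is a hypothetical diagnostic, not one of the paper's metrics. It assumes that the domain of validity is given as a list of closed boxes, for example SIVIA's inner and boundary boxes. Two boxes count as connected when they intersect, including touching at a face, edge or corner, and connected boxes are merged with a union-find structure.

```python
from math import prod

def boxes_touch(a, b):
    """True if closed boxes a and b intersect (shared faces, edges, corners count)."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def region_volumes(boxes):
    """Group boxes into connected regions; return region volumes, largest first."""
    parent = list(range(len(boxes)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_touch(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    volumes = {}
    for i, box in enumerate(boxes):
        root = find(i)
        volumes[root] = volumes.get(root, 0.0) + prod(hi - lo for lo, hi in box)
    return sorted(volumes.values(), reverse=True)

# Hypothetical domain of validity: two adjacent boxes plus one isolated "isle".
boxes = [[(0.0, 0.5), (0.0, 0.5)], [(0.5, 1.0), (0.0, 0.5)], [(0.8, 0.9), (0.8, 0.9)]]
print(region_volumes(boxes))
```

The result lists one region of volume 0.5 and one small isolated region of about 0.01. The number of regions and how the total volume is spread across them can then be inspected. The text gives no numeric thresholds for separating under-trained from over-trained networks, so none are implied here.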

Fig. 10

Example of over-training for a 2–20–2 MLP trained with noisy data

Cite this article

Adam, S.P., Likas, A.C. & Vrahatis, M.N. Evaluating generalization through interval-based neural network inversion. Neural Comput & Applic 31, 9241–9260 (2019). https://doi.org/10.1007/s00521-019-04129-5

  • DOI: https://doi.org/10.1007/s00521-019-04129-5
