A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models

https://doi.org/10.1016/j.eswa.2005.01.006

Abstract

Numerous articles comparing the performance of statistical and Neural Networks (NNs) models are available in the literature; however, very few include Classification and Regression Tree (CART) models in their comparisons. We perform a three-way comparison of prediction accuracy involving nonlinear regression, NNs and CART models using a continuous dependent variable and a set of dichotomous and categorical predictor variables. A large dataset on smokers is used to fit these models, and several measures of prediction accuracy are used to compare their performance. The prediction outcomes are discussed, and the outcomes of this research are compared with the results of similar studies.

Introduction

Classical statistical methods have been applied in industry for years. Recently, Neural Networks (NNs) methods have become tools of choice for a wide variety of applications across many disciplines. It has been recognized in the literature that regression and neural network methods have become competing model-building methods (Smith & Mason, 1997). For a large class of pattern-recognition problems, NNs are the preferred technique (Setyawati, Sahirman, & Creese, 2002). NNs methods have also been used in the areas of prediction and classification (Warner & Misra, 1996).

Since NNs were developed as generalizations of mathematical models of human cognition based on biological neurons, they are regarded as information processing systems that share certain performance characteristics with human neural biology. These characteristics include the ability to store knowledge and make it available for use whenever necessary, a propensity to identify patterns even in the presence of noise, and an aptitude for taking past experience into account when making inferences and judgments about new situations.

Statistical methods such as regression analysis, multivariate analysis, Bayesian theory, pattern recognition and least-squares approximation models have been applied to a wide range of decisions in many disciplines (Buntine & Weigend, 1991). These models are attractive to decision makers because of their established methodology, long history of application, availability of software and deep-rooted acceptance among practitioners and academicians alike. NNs are data dependent and, therefore, their performance improves with sample size. Statistical methods such as regression perform better for extremely small sample sizes, and also when theory or experience indicates an underlying relationship between the dependent and predictor variables (Warner & Misra, 1996). Classification and Regression Tree (CART) models use tree-building algorithms, which apply a set of if-then (split) conditions that permit prediction or classification of cases. A CART model that predicts the value of a continuous variable from a set of continuous and/or categorical predictor variables is referred to as a regression-type model; a classification-type CART model predicts the value of a categorical variable from such predictors. One noticeable advantage of decision-tree-based models such as CART is that they are scalable to large problems and can also handle smaller data sets than NNs models (Marcham, Mathieu, & Wray, 2000).
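The if-then split logic of a regression-type CART model can be sketched in a few lines. This is an illustrative fragment with synthetic binary predictors, not the CART software or smoking data used in the study; it finds the single best split by minimizing the within-leaf sum of squared errors, which is the criterion a full CART tree applies recursively.

```python
import numpy as np

# Synthetic data: four dichotomous predictors, one continuous response.
# Only predictor 3 actually drives the response (an assumption for the demo).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))
y = 1.5 + 5.0 * X[:, 3] + rng.normal(0, 0.5, 200)

def best_split(X, y):
    """Pick the binary predictor whose if-then split minimizes squared error."""
    best = None
    for j in range(X.shape[1]):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[1]:
            best = (j, sse, left.mean(), right.mean())
    return best

j, _, mean0, mean1 = best_split(X, y)
print(j, round(mean0, 1), round(mean1, 1))  # split variable and leaf predictions
```

Each leaf predicts the mean response of the cases that reach it, which is how a regression-type tree produces a continuous prediction from categorical inputs.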

Despite the apparent substantive and applied advantages of statistical models, Neural Networks (NNs) methods have also gained popularity in recent years (Ripley, 1994). These methods are particularly valuable when the functional relationship between the independent and dependent variables is unknown and ample training and test data are available. NNs models also have a high tolerance for noise and complexity in the data. Moreover, software packages that deploy neural network algorithms, such as SPSS Clementine, SAS Enterprise Miner and BrainMaker, have become extremely sophisticated and user-friendly in recent years.

Our research objective was to compare the predictive abilities of multiple regression, the NNs method and the CART model using a set of data on smokers that includes mostly categorical variables. Comparisons of the predictive abilities of statistical and NNs models are plentiful in the literature. It is also widely recognized that the effectiveness of any model depends largely on the characteristics of the data used to fit it. Goss and Vozikis (2002) compared NNs methods with Binary Logit Regression (BLR) and concluded that the NNs model's prediction accuracy was better than that of the BLR model; Shang, Lin, and Goetz (2000) reached a similar conclusion. Feng and Wang (2002) compared nonlinear regression with NNs methods in a reverse engineering application using all non-categorical variables. Both models provided comparably satisfactory predictions; however, the regression model performed slightly better in model construction and model verification. Brown, Corruble, and Pittard (1993) showed that NNs do better than CART models on multimodal classification problems where data sets are large with few attributes; they also concluded that the CART model did better than the NNs model with smaller data sets and large numbers of irrelevant attributes. For nonlinear data sets, NNs and CART models outperform linear discriminant analysis (Curram & Mingers, 1994). In our research, a three-way comparison involving nonlinear regression, NNs and CART models is performed. The prediction errors of these three models are compared where the dependent variable is continuous and the predictor variables are all categorical.

The rest of the paper is organized as follows: Section 2 provides a literature review on comparative analyses of NNs and statistical models. Section 3 provides a brief description of the organization of the data and the research model. Section 4 briefly discusses the NNs, regression and CART models and presents the test hypotheses. In Section 5, we examine the results of the three models and provide analysis. Based on the analysis in Section 5, conclusions are drawn and presented in Section 6.

Section snippets

Classical statistical tools

Some of the widely used traditional statistical tools applied for prediction and diagnosis in many disciplines are discriminant analysis (Press & Wilson, 1978; Flury & Riedwyl, 1990), logistic regression (Press & Wilson, 1978; Hosmer & Lemeshow, 1989; Studenmund, 1992), the Bayesian approach (Duda & Hart, 1973; Buntine & Weigend, 1991), and multiple regression (Snedecor & Cochran, 1980; Neter et al., 1985; Myers, 1990; Menard, 1993). These models have been proven to be very effective,

Organization of data

In this study we used a set of data on the smoking habits of people. The data set contained 35 variables and 3652 records. From the 35 available variables, we initially chose 10 considered to be most intuitively related to illness and ran a correlation analysis. Based on the results of the correlation analysis, the following variables (presented in Table 1), considered to be significant contributors to the prediction of the dependent variable (Days in bed due to illness), are
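The correlation screening step described here can be sketched as follows. The data are synthetic stand-ins for the smoking dataset, and the variable names V1..V5 are hypothetical; the point is only the mechanics of ranking candidate predictors by the magnitude of their correlation with the response.

```python
import numpy as np

# Hypothetical candidate predictors (dichotomous) and a continuous response.
# Only V2 and V5 are constructed to influence y (a demo assumption).
rng = np.random.default_rng(1)
n = 300
candidates = {f"V{j}": rng.integers(0, 2, n) for j in range(1, 6)}
y = 2.0 * candidates["V2"] + 4.0 * candidates["V5"] + rng.normal(0, 1, n)

# Rank predictors by |Pearson correlation| with the response, keep the top 2.
corr = {name: abs(np.corrcoef(x, y)[0, 1]) for name, x in candidates.items()}
selected = sorted(corr, key=corr.get, reverse=True)[:2]
print(sorted(selected))
```

In the study itself, the surviving variables were then carried into the regression, NNs and CART models rather than used directly.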

Neural network model

We chose the NNs method because it handles the nonlinearity associated with the data well. NNs methods imitate the structure of a biological neural network. Processing elements (PEs) are the neurons of a neural network: each neuron receives one or more inputs, processes those inputs, and generates a single output. The main components of information processing in NNs are: inputs, weights, a summation function (the weighted average of all input data going into a processing element (PE),
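The single processing element described above can be sketched in a few lines. The weights, inputs and sigmoid transfer function below are illustrative assumptions, not parameters from the study's network:

```python
import math

def processing_element(inputs, weights, bias=0.0):
    """One PE: weighted sum of inputs passed through a sigmoid transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias  # summation function
    return 1.0 / (1.0 + math.exp(-s))                       # single output

# Hypothetical inputs and weights for one neuron.
out = processing_element([1.0, 0.0, 1.0], [0.5, -0.3, 0.2])
print(round(out, 3))
```

A network chains many such PEs in layers, and training adjusts the weights so the final outputs approximate the observed dependent variable.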

Regression

A stepwise regression procedure was conducted using SPSS. In the process, some of the variables and nonlinear interaction terms were eliminated by the procedure for lack of significant contribution to the prediction of the dependent variable, Y. Multicollinearity among the independent variables was also a factor in the final selection of the model. The final nonlinear regression model is as follows:

Y = 1.474 + 3.536X2 + 5.856X4 - 1.734X1X2 + 1.505X1X3 - 2.563X2X3 - 3.438X3X4
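The fitted model above can be evaluated directly for any coding of the predictors; the input values in the example call below are hypothetical:

```python
def predict_days_in_bed(x1, x2, x3, x4):
    """Evaluate the final stepwise regression model for coded predictors X1..X4."""
    return (1.474 + 3.536 * x2 + 5.856 * x4
            - 1.734 * x1 * x2 + 1.505 * x1 * x3
            - 2.563 * x2 * x3 - 3.438 * x3 * x4)

# Example: X1 = 1, X2 = 1, X3 = 0, X4 = 0
print(round(predict_days_in_bed(1, 1, 0, 0), 3))  # 1.474 + 3.536 - 1.734 = 3.276
```

Note that the interaction terms (X1X2, X2X3, ...) are what make the model nonlinear in the predictors, even though it is linear in the coefficients.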

The following table

Conclusion

In this research we perform a three-way comparison of prediction accuracy involving nonlinear regression, NNs and CART models. The prediction errors of the three models are compared where the dependent variable is continuous and the predictor variables are all categorical. As mentioned before, many comparative studies have been done in the past; however, very few involved the CART model.

NNs and CART models, in our study, produced better prediction accuracy than non-linear regression

References (81)

  • E.P. Zhou et al.

    Improving error compensation via a fuzzy-neural hybrid model

    Journal of Manufacturing Systems

    (1999)
  • A. Ainslie et al.

    Data-mining and choice classic models/neural networks

    Decisions Marketing

    (1996)
  • W.G. Baxt

    Use of an artificial neural network for data analysis in clinical decision-making: The diagnosis of acute coronary occlusion

    Neural Computation

    (1990)
  • L. Breiman

    Stacked regressions

    Machine Learning

    (1996)
  • L. Breiman et al.

Classification and regression trees, Wadsworth, Belmont, CA

    (1984)
  • W.L. Buntine et al.

    Bayesian Back-propagation

    Complex Systems

    (1991)
  • H.A. Chipman et al.

    Bayesian CART model search

    Journal of the American Statistical Association

    (1998)
  • D.W. Coit et al.

    Static neural network process models: Considerations and case studies

    International Journal of Production Research

    (1998)
  • S.P. Curram et al.

    Neural networks, decision tree induction and discriminant analysis: An empirical comparison

    Journal of the Operational Research Society

    (1994)
  • R.O. Duda et al.

    Pattern classification and scene analysis

    (1973)
  • L. Fausett

    Fundamentals of neural networks: Architecture, algorithms and applications

    (1994)
  • C.-X. Feng et al.

An experimental study of the effect of digitizing parameters on digitizing uncertainty with a CMM

    International Journal of Production Research

    (2002)
  • C.-X. Feng et al.

    Digitizing uncertainty modeling for reverse engineering applications: Regression versus neural networks

    Journal of Intelligent Manufacturing

    (2002)
  • B. Flury et al.

    Multivariate statistics: A practical approach

    (1990)
  • J.A. Freeman

    Simulating neural networks with mathematica

    (1994)
  • H. Fujita et al.

Application of artificial neural network to computer-aided diagnosis of coronary artery disease in myocardial SPECT bull's-eye images

    Journal of Nuclear Medicine

    (1992)
  • E.P. Goss et al.

    Improving health care organizational management through neural network learning

    Health Care Management Science

    (2002)
  • R. Groth

    Data mining: Building competitive advantage

    (2000)
  • J. Hertz et al.

    Introduction to the theory of neural computation, Santa Fe Institute Studies in the Sciences of Complexity (vol. 1)

    (1991)
  • D.W. Hosmer et al.

    Applied logistic regression

    (1989)
  • R.D. Hurrion

An example of simulation optimization using a neural network metamodel: Finding the optimal number of Kanbans in a manufacturing system

    Journal of Operational Research Society

    (1992)
  • Hutchinson, J.M. (1994). A Radial Basis Function Approach to Financial Time Series Analysis, PhD dissertation,...
  • S. Kaparthi et al.

    Performance of selected part-machine grouping techniques for data sets of wide ranging sizes and imperfections

    Decision Sciences

    (1994)
  • T. Kimoto et al.

    Stock market predictions with modular neural networks

  • A. Kumar et al.

    An empirical comparison of neural networks and logistic regression models

    Marketing Letters

    (1995)
  • Larkey, L., Croft, B. (1996). Combining classifiers in text categorization. Proceedings of SIGIR-96, 19th ACM...
  • J. Lawrence

    Introduction to neural networks: Design, theory, and applications

    (1994)
  • M. LeBlanc et al.

    Combining estimates in regression and classification

    Journal of the American Statistical Association

    (1996)
  • T.H. Lee et al.

    Forecasting creditworthiness: Logistic vs. artificial neural net

    The Journal of Business Forecasting Methods and Systems

    (2000)
  • T.P. Liang et al.

    Integrating neural networks and semi-Markov processes for automated knowledge acquisition: An application to real time scheduling

    Decision Sciences

    (1992)