An application of Bayesian network for predicting object-oriented software maintainability

https://doi.org/10.1016/j.infsof.2005.03.002

Abstract

As the number of object-oriented software systems increases, it becomes more important for organizations to maintain those systems effectively. However, currently only a small number of maintainability prediction models are available for object-oriented systems. This paper presents a Bayesian network maintainability prediction model for an object-oriented software system. The model is constructed using object-oriented metric data in Li and Henry's datasets, which were collected from two different object-oriented systems. Prediction accuracy of the model is evaluated and compared with commonly used regression-based models. The results suggest that the Bayesian network model can predict maintainability more accurately than the regression-based models for one system, and almost as accurately as the best regression-based model for the other system.

Introduction

Arguably, many object-oriented (OO) software systems are currently in use. It is also arguable that the growing popularity of OO programming languages such as Java, together with the increasing number of software development tools supporting the Unified Modelling Language (UML), encourages more OO systems to be developed now and in the future. Hence it is important that those systems are maintained effectively and efficiently. A software maintainability prediction model enables organizations to predict the maintainability of a software system and helps them manage maintenance resources. In addition, if an accurate maintainability prediction model is available for a software system, a defensive design can be adopted, which would minimize, or at least reduce, future maintenance effort for the system. Maintainability of a software system can be measured in different ways. In this paper, maintainability is measured as the number of changes made to the code during a maintenance period. Alternatively, maintainability may be measured as the effort required to make those changes; when maintainability is measured as effort, the predictive model is called a maintenance effort prediction model. Unfortunately, the literature currently contains very few software maintainability prediction models, including maintenance effort prediction models.

Programming an OO software system is different from programming a non-OO system because of concepts that are specific to the OO paradigm, for example, objects, inheritance and encapsulation. This difference limits the applicability of well-known non-OO software effort prediction models, such as COCOMO [3], to OO software effort prediction, and likewise limits the applicability of non-OO software metrics, such as Function Points [1], to measuring the characteristics of OO software systems [23]. Hence a number of new software metrics were proposed specifically for OO systems, and some of them were used to predict the maintainability of OO systems. Examples of such OO metrics are the Chidamber and Kemerer (C&K) metrics and the Li and Henry (L&H) metrics [10], [25]. It was shown that the L&H metrics were correlated with the number of changes made to the code of the OO software system [25]. It was also shown that multiple linear regression models consisting of the C&K, L&H and other OO metrics were able to predict software maintenance effort for some OO systems [17].
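To make the form of such a regression model concrete, the Python sketch below fits a multiple linear regression of a response (for example, the number of changes) on several metric values by ordinary least squares, solving the normal equations directly. It is a minimal illustration under stated assumptions, not the models of the cited studies or of this paper: the helper names `fit_ols` and `predict_ols` are hypothetical, no variable selection is performed, and the predictors are assumed not to be collinear.

```python
def fit_ols(X, y):
    """Return [b0, b1, ..., bp] minimising the squared error of y ~ b0 + b1*x1 + ... + bp*xp."""
    # Design matrix with a leading column of ones for the intercept b0.
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, p = len(A), len(A[0])
    # Normal equations (A^T A) b = A^T y.
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    v = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting; assumes A^T A is non-singular.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
                v[r] -= f * v[col]
    # M is now diagonal, so each coefficient is a single division.
    return [v[i] / M[i][i] for i in range(p)]


def predict_ols(coef, row):
    """Predicted response for one case's metric values."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))
```

Fitting data generated exactly from y = 1 + 2·x1 + 3·x2 recovers coefficients close to [1, 2, 3]; in practice, a statistics library would normally be preferred over hand-written elimination.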

This paper constructs an OO software maintainability prediction model using a technique known as a Bayesian network [14], [20], [22]. This technique allows a user to construct a predictive model based on Bayesian probability theory [12]. Applications of Bayesian networks to software engineering are currently limited to a small number of studies of development effort prediction [2], [11], [31], [34] and defect prediction [16], [28]. However, Bayesian networks could also be a promising new technique for OO software maintainability prediction, owing to their ability to represent uncertainty explicitly using probabilities, to combine existing human expert knowledge with empirical data, and to update the model when new information becomes available. Hence this paper investigates what prediction accuracy a Bayesian network OO software maintainability prediction model can achieve. In this paper, the term prediction accuracy means how well a predictive model constructed from known data can predict the outcomes for unknown data. The Bayesian network model's prediction accuracy is evaluated using accuracy measures that are commonly found in the software effort prediction literature [15], [24]: absolute residuals, the magnitude of relative error (MRE) and pred measures. The Bayesian network model's prediction accuracy is then compared with that of regression-based models, namely, a regression tree [4] model and two different types of multiple linear regression models.

The remainder of this paper is structured as follows. Section 2 describes the OO software datasets and the sampling method used. Section 3 describes the Bayesian network OO software maintainability prediction model. This is followed by Section 4, which describes the regression tree model and the multiple linear regression models. Section 5 describes the prediction accuracy measures used. Section 6 evaluates the Bayesian network model's prediction accuracy using those accuracy measures and compares it with that of the regression tree model and the multiple linear regression models. Finally, Section 7 presents conclusions and discusses directions for future studies.


Characteristics of datasets

This paper uses OO software datasets published by Li and Henry [25]. The datasets consist of five C&K metrics: DIT, NOC, RFC, LCOM and WMC, and four L&H metrics: MPC, DAC, NOM and SIZE2, as well as SIZE1, which is a traditional lines of code size metric. Those metric data were collected from a total of 110 classes in two OO software systems: User Interface Management System (UIMS) and Quality Evaluation System (QUES). The code was written in Classical-Ada™. The UIMS and QUES datasets contain 39

Bayesian network

A Bayesian network (also known as Bayes net, causal probabilistic network, Bayesian belief network, or simply belief network) is a directed acyclic graph (DAG) whose nodes represent events in a domain [22]. These events are connected with directed links, which represent an association or a causal relationship between them. When a link represents an association, the direction is defined according to the order of time in which the events happen, that is, the link starts from the preceding event.
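A Bayesian network encodes a joint distribution as the product of each node's probability conditioned on its parents, P(X1, ..., Xn) = P(X1 | parents(X1)) × ... × P(Xn | parents(Xn)), so any posterior can be obtained by summing that product over the unobserved nodes. The Python sketch below illustrates this for a toy three-node network in which two hypothetical discretized metric nodes, `Size` and `Coupling`, are parents of a `Change` node. The node names, states and probability tables are invented for illustration and are not the network constructed in this paper; exact inference by enumeration is used only because the network is tiny.

```python
from itertools import product

STATES = ("low", "high")
VARIABLES = ("Size", "Coupling", "Change")

# Hypothetical prior tables for the two root nodes (not taken from the paper).
P_SIZE = {"low": 0.6, "high": 0.4}
P_COUPLING = {"low": 0.7, "high": 0.3}

# Hypothetical conditional table: P(Change = "high" | Size, Coupling).
P_CHANGE_HIGH = {
    ("low", "low"): 0.1,
    ("low", "high"): 0.4,
    ("high", "low"): 0.5,
    ("high", "high"): 0.8,
}


def joint(size, coupling, change):
    """P(Size, Coupling, Change) = P(Size) * P(Coupling) * P(Change | Size, Coupling)."""
    p_high = P_CHANGE_HIGH[(size, coupling)]
    p_change = p_high if change == "high" else 1.0 - p_high
    return P_SIZE[size] * P_COUPLING[coupling] * p_change


def posterior(query, evidence):
    """Exact P(query | evidence) by enumerating every full assignment of the three nodes."""
    totals = {state: 0.0 for state in STATES}
    for values in product(STATES, repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        # Keep only assignments consistent with the observed evidence.
        if all(assignment[name] == value for name, value in evidence.items()):
            totals[assignment[query]] += joint(*values)
    norm = sum(totals.values())
    return {state: p / norm for state, p in totals.items()}
```

For example, `posterior("Change", {"Size": "high"})` gives P(Change = high | Size = high) = 0.7 × 0.5 + 0.3 × 0.8 = 0.59, while observing `Change = "high"` raises the belief that `Size` is high from 0.4 to about 0.67, showing how evidence can be propagated against the direction of a link as well as along it.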

Regression tree model

A regression tree is a tree-structured regression technique that recursively partitions the data space of a given dataset into a number of regions, on each of which a constant estimate of the response variable is given according to a chosen regression method [4]. Fig. 2 shows an example regression tree. In Fig. 2, four sequential binary splits partition all cases in the dataset into five terminal nodes T1,…,T5, which are shown as five squares. Each terminal node consists of only the
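The recursive partitioning described above can be sketched in a few lines of Python. This is a minimal CART-style sketch under stated assumptions: each split is the single-metric threshold that most reduces the sum of squared errors, each terminal node predicts the mean response of its training cases, and the stopping parameters `max_depth` and `min_leaf` are hypothetical choices rather than the settings of the regression tree tool used in this paper.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations of the responses from their mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, ys, depth=0, max_depth=3, min_leaf=2):
    """Recursively split the cases; every terminal node predicts a constant (the mean)."""
    best = None
    if depth < max_depth and len(ys) >= 2 * min_leaf:
        for j in range(len(rows[0])):
            for t in sorted({r[j] for r in rows}):
                left = [i for i, r in enumerate(rows) if r[j] <= t]
                right = [i for i, r in enumerate(rows) if r[j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    # Stop if no admissible split exists or the best split does not reduce the error.
    if best is None or best[0] >= sse(ys):
        return {"leaf": mean(ys)}
    _, j, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [ys[i] for i in idx],
                          depth + 1, max_depth, min_leaf)

    return {"feature": j, "threshold": t, "left": child(left), "right": child(right)}


def predict(tree, row):
    """Follow the binary splits from the root down to a terminal node."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

With `rows = [[1], [2], [3], [10], [11], [12]]` and `ys = [5, 6, 7, 50, 52, 54]`, the root split falls at threshold 3, so `predict(tree, [2])` returns 6 and `predict(tree, [11])` returns 52.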

Prediction accuracy measures

This paper evaluates and compares the OO software maintainability prediction models quantitatively, using the following prediction accuracy measures: absolute residual (Ab.Res.), the magnitude of relative error (MRE) and pred measures.

The Ab.Res. is the absolute value of the residual, given by:

$$\text{Ab.Res.} = \lvert \text{actual value} - \text{predicted value} \rvert$$

In this paper, the sum of the absolute residuals (Sum Ab.Res.), the median of the absolute residuals (Med.Ab.Res.) and the standard deviation of the absolute residuals (SD
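These measures are straightforward to compute. The Python sketch below assumes the standard definitions used in the software effort prediction literature: MRE = |actual value − predicted value| / actual value, MMRE is the mean MRE over the test cases, and pred(l) is the proportion of test cases whose MRE is at most l. The use of the sample standard deviation for the SD of the absolute residuals and the helper name `accuracy_summary` are assumptions for illustration.

```python
from statistics import mean, median, stdev


def absolute_residuals(actual, predicted):
    """Ab.Res. = |actual value - predicted value| for each test case."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def mre(actual, predicted):
    """Magnitude of relative error; undefined when an actual value is 0."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def pred(level, actual, predicted):
    """Proportion of test cases whose MRE does not exceed `level`."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


def accuracy_summary(actual, predicted):
    """Summary measures over one test subset (needs at least two cases for the SD)."""
    residuals = absolute_residuals(actual, predicted)
    return {
        "Sum Ab.Res.": sum(residuals),
        "Med.Ab.Res.": median(residuals),
        "SD Ab.Res.": stdev(residuals),  # sample standard deviation (assumption)
        "MMRE": mean(mre(actual, predicted)),
        "pred(0.25)": pred(0.25, actual, predicted),
        "pred(0.30)": pred(0.30, actual, predicted),
    }
```

For actual values [10, 20, 40, 50] and predictions [12, 15, 40, 80], the absolute residuals are [2, 5, 0, 30] (sum 37, median 3.5), the MREs are [0.2, 0.25, 0.0, 0.6], so MMRE = 0.2625 and pred(0.25) = 0.75.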

Results from UIMS dataset

Table 4 shows the values of the prediction accuracy measures achieved by each of the maintainability prediction models for the UIMS dataset. The values in this table are the mean of the values obtained from the 10 different test subsets, which were created using the sampling method described in Section 2.
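Averaging over test subsets can be sketched as follows. The sampling method itself is the one described in Section 2 and is not reproduced here; this Python sketch assumes, for illustration only, that the 10 test subsets are the disjoint folds of a shuffled k-fold partition, with each fold's complement serving as training data. The function names and the fixed random seed are hypothetical.

```python
import random
from statistics import mean


def k_fold_test_subsets(n_cases, k=10, seed=0):
    """Hypothetical sampling: shuffle case indices and deal them into k disjoint test subsets."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Round-robin dealing gives folds whose sizes differ by at most one.
    return [sorted(indices[i::k]) for i in range(k)]


def average_measures(per_subset):
    """Mean of each accuracy measure over the per-subset results."""
    return {name: mean(result[name] for result in per_subset) for name in per_subset[0]}
```

Under this assumption, each model would be trained on the cases outside a test subset and scored on that subset, and `average_measures` would then combine the per-subset scores into the kind of mean values reported in Table 4.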

Table 4 shows that the Bayesian network model has achieved the MMRE value of 0.972, the pred(0.25) value of 0.446 and the pred(0.30) value of 0.469. Although these values do not satisfy the

Conclusions

A Bayesian network OO software maintainability prediction model is constructed using the OO software metric data in Li and Henry's datasets. The prediction accuracy of the model is evaluated and compared with that of the regression tree model and the multiple linear regression models using the following prediction accuracy measures: the absolute residuals, MRE and pred measures. The results show that the Bayesian network model can predict the maintainability of the OO software systems. For the UIMS dataset, the

Acknowledgements

The authors would like to acknowledge many valuable suggestions made by J. Harraway, Department of Mathematics and Statistics, University of Otago, New Zealand, with regard to the multiple linear regression models presented in this paper.

References (34)

  • L.C. Briand, K.E. Emam, D. Surmann, I. Wieczorek, K.D. Maxwell, An assessment and comparison of common software cost...
  • L.C. Briand, T. Langley, I. Wieczorek, A replicated assessment and comparison of common software cost estimation...
  • L.C. Briand, J. Wüst, The impact of design properties on development cost in object-oriented systems, in: Proceedings...
  • L.C. Briand et al., Modeling development effort in object-oriented systems using design properties, IEEE Transactions on Software Engineering (2001).
  • W. Buntine, A guide to the literature on learning probabilistic networks from data, IEEE Transactions on Knowledge and Data Engineering (1996).
  • S.R. Chidamber et al., A metrics suite for object-oriented design, IEEE Transactions on Software Engineering (1994).
  • S. Chulani et al., Bayesian analysis of empirical software engineering cost models, IEEE Transactions on Software Engineering (1999).