ABSTRACT
Software metric models relate the various metrics of software projects to one another. Their purpose is to predict some of these metrics for future projects from the other metrics of those projects. Constructing such a model means deriving these relationships, usually from a data sample of the relevant metrics for past software projects. Such a sample often contains a few extreme projects whose metrics are related in ways that deviate substantially from the relationships holding for the remaining "mainstream" bulk of projects in the sample. These "outlier" projects exert undue influence during model construction, so that the derived relationships do not faithfully reflect the true "mainstream" relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we propose a methodology that identifies and eliminates such outliers prior to model construction. The methodology uses least median of squares (LMS) regression to uncover the outliers and is applicable irrespective of the subsequent model construction approach. We also applied the methodology in a case study, and the results show that it improved the prediction accuracy of most of the models experimented with in the study. We therefore recommend the methodology for future software metric model construction. This paper documents the methodology and the case study.
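As a rough illustration of the idea of LMS-based outlier screening, the following Python sketch fits a single-predictor line by minimizing the median of the squared residuals and then flags projects with large standardized residuals. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the exhaustive search over lines through pairs of points, the robust scale estimate with its finite-sample correction, the cutoff of 2.5, the function names `lms_line` and `flag_outliers`, and the size/effort figures are all assumptions introduced here for illustration.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate least-median-of-squares fit of y = a + b*x.

    Tries the line through every pair of points and keeps the one whose
    squared residuals have the smallest median (an assumed search strategy).
    Returns (a, b, median_squared_residual).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line: slope undefined, skip this pair
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        med = median((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or med < best[0]:
            best = (med, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2], best[0]


def flag_outliers(xs, ys, cutoff=2.5):
    """Return indices of points whose standardized LMS residual exceeds cutoff."""
    n, p = len(xs), 2  # p = number of fitted coefficients (intercept, slope)
    if n <= p:
        raise ValueError("need more than two points")
    a, b, med = lms_line(xs, ys)
    # Robust scale estimate with a finite-sample correction (assumed rule).
    scale = 1.4826 * (1 + 5 / (n - p)) * med ** 0.5
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    if scale == 0:
        # Degenerate case: most points lie exactly on the line.
        return [i for i, r in enumerate(residuals) if r != 0]
    return [i for i, r in enumerate(residuals) if abs(r) / scale > cutoff]


# Hypothetical sample: project size (xs) against effort (ys), roughly
# effort = 2 * size, with one extreme project at index 6.
xs = [10, 20, 30, 40, 50, 60, 70, 80]
ys = [21, 39, 62, 80, 101, 119, 400, 161]
flagged = flag_outliers(xs, ys)
print("flagged project indices:", flagged)
```

Projects flagged in this way would be removed from the sample before any model construction technique is applied to the remaining data. The pairwise search is quadratic in the number of projects, which is acceptable for small samples; larger samples or several predictors would call for a random-subsampling variant.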
Index Terms
- Outlier elimination in construction of software metric models