DOI: 10.1145/1244002.1244319
Article

Outlier elimination in construction of software metric models

Published: 11 March 2007

ABSTRACT

Software metric models relate the various software metrics of software projects. Their purpose is to predict some of these metrics for future projects given the other metrics for those projects. Constructing such a model means deriving these relationships, usually from a data sample of the relevant metrics for past software projects. Such a sample almost inevitably contains a few extreme projects whose metrics are related in ways that deviate substantially from the relationships among the metrics of the remaining "mainstream" bulk of projects. These "outlier" projects exert undue influence during model construction, so the derived relationships do not faithfully reflect the true "mainstream" relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we propose a methodology to identify and eliminate such outliers prior to model construction. Our methodology uses least median of squares (LMS) regression to uncover the outliers and is applicable irrespective of the subsequent model construction approach. We also apply the methodology in a case study, and the results show that it improves the prediction accuracy of most of the models experimented with in the study. We therefore recommend the methodology for future software metric model construction. This paper documents the methodology and the case study.
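
The abstract does not spell out the procedure, so the following is only a minimal sketch of how LMS-based outlier screening might look for a single predictor, not the authors' implementation. It assumes a straight-line model, approximates the LMS fit by trying the line through every pair of projects, and flags projects with the conventional robust-statistics rule (scale factor 1.4826 with a small-sample correction, cutoff 2.5). The function names `lms_line` and `flag_outliers`, the cutoff, and the size/effort data are all hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate the LMS line y = intercept + slope * x.

    Tries the line through every pair of points and keeps the one with the
    smallest median squared residual. Returns (criterion, intercept, slope).
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot serve as a regression fit
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        crit = median((y - (intercept + slope * x)) ** 2 for x, y in points)
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best


def flag_outliers(xs, ys, cutoff=2.5):
    """Return one boolean per project; True marks a suspected outlier.

    Assumption: residuals are standardized by a robust scale estimate and
    compared against `cutoff`; the paper's actual rule may differ.
    """
    n, p = len(xs), 2  # p = number of fitted coefficients
    if n <= p:
        raise ValueError("need more than two projects")
    crit, intercept, slope = lms_line(xs, ys)
    scale = 1.4826 * (1 + 5 / (n - p)) * crit ** 0.5
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    if scale == 0:
        # Degenerate case: more than half the points lie exactly on the line.
        return [r != 0 for r in residuals]
    return [abs(r) / scale > cutoff for r in residuals]


# Hypothetical sample: project size (x) and effort (y); index 6 is extreme.
sizes = [10, 20, 30, 40, 50, 60, 70, 80]
efforts = [21, 39, 62, 80, 101, 119, 400, 161]
flags = flag_outliers(sizes, efforts)
clean = [(x, y) for x, y, bad in zip(sizes, efforts, flags) if not bad]
print(flags)       # only index 6 is True
print(len(clean))  # 7
```

The cleaned sample `clean` can then be passed to whichever model construction approach is preferred, which mirrors the abstract's point that the screening step is independent of the subsequent modelling technique. The exhaustive pair search costs on the order of n³ operations, so a larger sample would call for random subsampling of pairs instead.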


Published in

SAC '07: Proceedings of the 2007 ACM Symposium on Applied Computing
March 2007
1688 pages
ISBN: 1595934804
DOI: 10.1145/1244002

Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,650 of 6,669 submissions (25%)
