
Outlier elimination in construction of software metric models

Authors:
Victor K. Y. Chan, Macao Polytechnic Institute, Rua de Luis Gonzaga Gomes, Macau
W. Eric Wong, University of Texas at Dallas, Richardson TX

SAC '07: Proceedings of the 2007 ACM symposium on Applied computing, March 2007, Pages 1484–1488
https://doi.org/10.1145/1244002.1244319

Published: 11 March 2007

ABSTRACT

Software metric models relate various software metrics of software projects. Their purpose is to predict some of these metrics for future projects given the other metrics for those projects. Model construction derives these relationships, usually from data samples of the relevant metrics for past software projects. Such a data sample often contains a few extreme projects whose metrics are related in ways that deviate substantially from the relationships among the metrics of the remaining "mainstream" bulk of projects. These "outlier" projects exert undue influence on the derivation during model construction, so that the derived relationships do not faithfully reflect the true "mainstream" relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we propose a methodology that identifies and eliminates such outliers prior to model construction. Our methodology uses least median of squares (LMS) regression to uncover the outliers and is applicable irrespective of the subsequent model construction approach. We also applied the methodology in a case study, and the results show that it improved the prediction accuracy of most of the models experimented with in the study. We therefore recommend the methodology for future software metric model construction. This paper documents the methodology and the case study.
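To make the idea of LMS-based outlier screening concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their procedure in detail, so everything below is an assumption. The function names `lms_fit` and `flag_outliers` are hypothetical, the fit is approximated by random elemental subsets rather than computed exactly, and the robust scale factor and the 2.5 cutoff follow the convention commonly used with least median of squares regression. The data are synthetic and purely illustrative.

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate a least-median-of-squares fit via random elemental subsets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = A.shape[1]
    best_beta, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(A[idx], y[idx])  # exact fit through p points
        except np.linalg.LinAlgError:
            continue  # skip degenerate (singular) subsets
        med = np.median((y - A @ beta) ** 2)
        if med < best_med:
            best_beta, best_med = beta, med
    return best_beta, best_med

def flag_outliers(X, y, cutoff=2.5, **kwargs):
    """Return a boolean mask marking points with large standardized LMS residuals."""
    beta, med = lms_fit(X, y, **kwargs)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    # Robust scale estimate derived from the minimized median squared residual.
    scale = 1.4826 * (1 + 5 / (n - p)) * np.sqrt(med)
    return np.abs(y - A @ beta) > cutoff * scale

# Synthetic example: 17 "mainstream" projects on a line, 3 planted outliers.
n = 20
x = np.linspace(1.0, 20.0, n)
y = 3.0 + 2.0 * x + 0.1 * np.sin(np.arange(n))  # small deterministic perturbation
y[-3:] += 40.0                                   # extreme projects
mask = flag_outliers(x, y)
print("flagged indices:", np.flatnonzero(mask))

# Eliminate flagged projects before any subsequent model construction.
x_clean, y_clean = x[~mask], y[~mask]
```

Because the screening step only returns a mask, the cleaned sample can be passed to whatever model construction approach follows, which mirrors the abstract's point that the outlier elimination is independent of the subsequent modeling technique.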


Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Factorization methods
        Canonical correlation analysis
2. Mathematics of computing
  1. Probability and statistics

Published in
SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
March 2007
1688 pages
ISBN:1595934804
DOI:10.1145/1244002
Conference Chairs:
Yookun Cho, Seoul National University, Seoul, Korea
Roger L. Wainwright, University of Tulsa, Tulsa, Oklahoma
Hisham M. Haddad, Kennesaw State University, Kennesaw, Georgia
Sung Y. Shin, South Dakota State University, Brookings, South Dakota
Program Chair:
Yong Wan Koo, The University of Suwon, Gyeonggi-do, Korea
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Least of Median Squares (LMS)
models
outliers
software metrics

Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%