
Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C#

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Code smells are poorly designed code structures indicating that the code may need to be refactored. Recognizing code smells in practice is complex, and researchers strive to develop automatic code smell detectors. An obstacle to developing these solutions is the datasets’ limitations. Manually labeled datasets were collected to investigate the developers’ perceptions of code smells. They are characterized by a high label disagreement that hurts the performance of Machine Learning (ML) models trained using them. Furthermore, all large, manually labeled datasets are developed for Java. We recently created a novel dataset for C# to alleviate these issues. This paper evaluates ML code smell detection approaches on our novel dataset. We consider two feature representations to train ML models: (1) code metrics and (2) CodeT5 embeddings. This study is the first to consider the CodeT5 state-of-the-art neural source code embedding for code smell detection in C#. To prove the effectiveness of ML, we consider multiple metrics-based heuristics as alternatives. In our experiments, the best-performing approach was the ML classifier trained on code metrics (F-measure of 0.87 for Long Method and 0.91 for Large Class detection). However, the performance improvement over CodeT5 features is negligible if we consider the advantages of automatically inferring features. Finally, our ML model surpassed less experienced annotators and nearly matched the most experienced annotator, suggesting it can assist less experienced developers under tight deadlines. To the best of our knowledge, this is the first study to compare the performance of automatic smell detectors against human performance.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


Data Availability

Our dataset is publicly available at https://github.com/Clean-CaDET/ML-code-smell-CSharp.

Code Availability

Our replication package is available at https://github.com/Clean-CaDET/ML-code-smell-CSharp.

Notes

  1. The CodeT5 model is pre-trained on eight programming languages: Python, Java, JavaScript, PHP, Ruby, Go, C, and C#.

  2. Our replication package is available at https://github.com/Clean-CaDET/ML-code-smell-CSharp.

  3. We use the implementations from the Scikit-learn [59], XGBoost [60], and CatBoost [61] libraries.

  4. Metrics implementations are available at https://github.com/Clean-CaDET/platform/blob/c4acff95ec00ff6c25fa62dde4818c1f40e39d39/CodeModel/CaDETModel/CodeItems/CaDETMetrics.cs and their extracted values can be found in our replication package https://github.com/Clean-CaDET/ML-code-smell-CSharp.

  5. Compton et al. [47] report that mean aggregation performed best on a set of source-code classification problems. Long Methods and Large Classes typically have many lines of code. Thus, a possible explanation for why summation aggregation performed better on our task is that summing the embedding vectors may have helped reflect each snippet’s code length in its representation.
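The effect described in this note can be illustrated with a small sketch. The four-dimensional embeddings and token counts below are toy values chosen for illustration (real CodeT5 vectors are much larger); the point is only that mean aggregation is invariant to snippet length while summation is not:

```python
def aggregate(token_embeddings, method="sum"):
    """Collapse per-token embedding vectors (one list per token) into a
    single fixed-size vector representing the whole code snippet."""
    dim = len(token_embeddings[0])
    totals = [sum(vector[i] for vector in token_embeddings) for i in range(dim)]
    if method == "mean":
        return [total / len(token_embeddings) for total in totals]
    return totals

# Two snippets whose tokens have identical embeddings but different lengths:
short_snippet = [[1.0] * 4] * 10    # 10 tokens
long_snippet = [[1.0] * 4] * 200    # 200 tokens

# Mean aggregation discards length: both snippets map to the same vector.
assert aggregate(short_snippet, "mean") == aggregate(long_snippet, "mean")

# Sum aggregation scales with token count, so the long snippet stays
# distinguishable, which matters when the smell correlates with code size.
assert aggregate(long_snippet, "sum")[0] > aggregate(short_snippet, "sum")[0]
```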

References

  1. Fowler M (2018) Refactoring: improving the design of existing code. Addison-Wesley Professional

  2. Sharma T, Spinellis D (2018) A survey on software smells. Journal of Systems and Software 138:158–173

  3. Khomh F, Di Penta M, Guéhéneuc Y, Antoniol G (2012) An exploratory study of the impact of antipatterns on class change- and fault-proneness. Empirical Software Engineering 17(3):243–275

  4. Martin R (2009) Clean code: a handbook of agile software craftsmanship. Pearson Education

  5. Hozano M, Garcia A, Fonseca B, Costa E (2018) Are you smelling it? Investigating how similar developers detect code smells. Information and Software Technology 93:130–146

  6. Azeem M, Palomba F, Shi L, Wang Q (2019) Machine learning techniques for code smell detection: a systematic literature review and meta-analysis. Information and Software Technology 108:115–138

  7. Lewowski T, Madeyski L (2022) Code smells detection using artificial intelligence techniques: a business-driven systematic review. In: Developments in Information & Knowledge Management for Business Applications, pp 285–319

  8. Menshawy R, Yousef A, Salem A (2021) Code smells and detection techniques: a survey. In: International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC)

  9. AbuHassan A, Alshayeb M, Ghouti L (2021) Software smell detection techniques: a systematic literature review. Journal of Software: Evolution and Process 33(3):2320

  10. Kovačević A, Slivka J, Vidaković D, Grujić K, Luburić N, Prokić S, Sladić G (2022) Automatic detection of Long Method and God Class code smells through neural source code embeddings. Expert Systems with Applications 204:117607

  11. Madeyski L, Lewowski T (2020) MLCQ: industry-relevant code smell data set. In: Proceedings of the Evaluation and Assessment in Software Engineering

  12. Lewowski T, Madeyski L (2021) How far are we from reproducible research on code smell detection? A systematic literature review. Information and Software Technology 106783

  13. Slivka J, Luburić N, Prokić S, Grujić KG, Kovačević A, Sladić G, Vidaković D (2023) Towards a systematic approach to manual annotation of code smells. Science of Computer Programming 230:102999

  14. Tahir A, Dietrich J, Counsell S, Licorish S, Yamashita A (2020) A large scale study on how developers discuss code smells and anti-patterns in Stack Exchange sites. Information and Software Technology 125:106333

  15. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  16. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT

  17. Allamanis M, Barr E, Devanbu P, Sutton C (2018) A survey of machine learning for big code and naturalness. ACM Computing Surveys 51(4):1–37

  18. Wang Y, Wang W, Joty S, Hoi S (2021) CodeT5: identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

  19. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu P (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21:1–67

  20. Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D, Li G (2021) CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664

  21. Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. In: International Conference on Machine Learning, PMLR

  22. Sharma T, Efstathiou V, Louridas P, Spinellis D (2019) On the feasibility of transfer-learning code smells using deep learning. arXiv preprint arXiv:1904.03031

  23. Sharma T, Mishra P, Tiwari R (2016) Designite: a software design quality assessment tool. In: Proceedings of the 1st International Workshop on Bringing Architectural Design Thinking into Developers’ Daily Activities

  24. Velioğlu S, Selçuk Y (2017) An automated code smell and anti-pattern detection approach. In: 2017 IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA)

  25. Tahmid A, Tawhid M, Ahmed S, Sakib K (2017) Code sniffer: a risk based smell detection framework to enhance code quality using static code analysis. International Journal of Software Engineering, Technology and Applications 2(1):41–63

  26. ReSharper: the Visual Studio extension for .NET developers by JetBrains. https://www.jetbrains.com/resharper/. Accessed 16 Dec 2021

  27. NDepend: improve your .NET code quality. https://www.ndepend.com/. Accessed 16 Dec 2021

  28. SonarQube: your teammate for code quality and code security. https://www.sonarqube.org/. Accessed 7 Mar 2022

  29. Brown W, Malveau R, McCormick H, Mowbray T (1998) AntiPatterns: refactoring software, architectures, and projects in crisis. John Wiley & Sons

  30. Bafandeh Mayvan B, Rasoolzadegan A, Javan Jafari A (2020) Bad smell detection using quality metrics and refactoring opportunities. Journal of Software: Evolution and Process 32(8):2255

  31. Sharma T, Kechagia M, Georgiou S, Tiwari R, Sarro F (2021) A survey on machine learning techniques for source code analysis. arXiv preprint arXiv:2110.09610

  32. Liu H, Xu Z, Zou Y (2018) Deep learning based feature envy detection. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering

  33. Liu H, Jin J, Xu Z, Bu Y, Zou Y, Zhang L (2019) Deep learning based code smell detection. IEEE Transactions on Software Engineering

  34. Hadj-Kacem M, Bouassida N (2019) Improving the identification of code smells by combining structural and semantic information. In: International Conference on Neural Information Processing

  35. Palomba F, Di Nucci D, Tufano M, Bavota G, Oliveto R, Poshyvanyk D, De Lucia A (2015) Landfill: an open dataset of code smells with public evaluation. In: 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories

  36. Guo X, Shi C, Jiang H (2019) Deep semantic-based feature envy identification. In: Proceedings of the 11th Asia-Pacific Symposium on Internetware

  37. Fontana F, Mäntylä M, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empirical Software Engineering 21(3):1143–1191

  38. Di Nucci D, Palomba F, Tamburri D, Serebrenik A, De Lucia A (2018) Detecting code smells using machine learning techniques: are we there yet? In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

  39. Zhang Y, Dong C (2021) MARS: detecting brain class/method code smell based on metric-attention mechanism and residual network. Journal of Software: Evolution and Process e2403

  40. Rasool G, Arshad Z (2017) A lightweight approach for detection of code smells. Arabian Journal for Science and Engineering 42(2):483–506

  41. Lemaitre G, Nogueira F, Aridas C (2017) Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research 18:1–5

  42. Liu H, Ma Z, Shao W, Niu Z (2011) Schedule of bad smell detection and resolution: a new way to save effort. IEEE Transactions on Software Engineering 38(1):220–235

  43. Padilha J, Pereira J, Figueiredo E, Almeida J, Garcia A, Sant’Anna C. On the effectiveness of concern metrics to detect code smells: an empirical study. In: International Conference on Advanced Information Systems Engineering

  44. Prokić S, Grujić K, Luburić N, Slivka J, Kovačević A, Vidaković D, Sladić G (2021) Clean code and design educational tool. In: 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO)

  45. Alon U, Zilberstein M, Levy O, Yahav E (2019) code2vec: learning distributed representations of code. Proceedings of the ACM on Programming Languages 3(POPL)

  46. Hussain Y, Huang Z, Zhou Y, Wang S (2020) Deep transfer learning for source code modeling. International Journal of Software Engineering and Knowledge Engineering 30(5):649–668

  47. Compton R, Frank E, Patros P, Koay A (2020) Embedding Java classes with code2vec: improvements from variable obfuscation. In: Proceedings of the 17th International Conference on Mining Software Repositories

  48. Pecorelli F, Di Nucci D, De Roover C, De Lucia A (2020) A large empirical assessment of the role of data balancing in machine-learning-based code smell detection. Journal of Systems and Software 169:110693

  49. Ng A (2019) Machine learning yearning: technical strategy for AI engineers in the era of deep learning. https://www.mlyearning.org. Accessed 21 Oct 2022

  50. GitHub Copilot: your AI pair programmer. https://copilot.github.com/. Accessed 7 Mar 2022

  51. Trifu A, Marinescu R (2005) Diagnosing design problems in object oriented systems. In: 12th Working Conference on Reverse Engineering (WCRE’05)

  52. Macia I, Garcia J, Popescu D, Garcia A, Medvidovic N, von Staa A. Are automatically-detected code anomalies relevant to architectural modularity? An exploratory analysis of evolving systems. In: Proceedings of the 11th annual international conference

  53. Souza P, Sousa B, Ferreira K, Bigonha M (2017) Applying software metric thresholds for detection of bad smells. In: Proceedings of the 11th Brazilian Symposium on Software Components, Architectures, and Reuse

  54. Kiefer C, Bernstein A, Tappolet J (2007) Mining software repositories with iSPARQL and a software evolution ontology. In: Fourth International Workshop on Mining Software Repositories (MSR’07: ICSE Workshops 2007)

  55. Danphitsanuphan P, Suwantada T (2012) Code smell detecting tool and code smell-structure bug relationship. In: 2012 Spring Congress on Engineering and Technology

  56. Fard A, Mesbah A (2013) JSNose: detecting JavaScript code smells. In: 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM)

  57. Moha N, Guéhéneuc Y, Duchien L, Le Meur A (2009) DECOR: a method for the specification and detection of code and design smells. IEEE Transactions on Software Engineering 36(1):20–36

  58. Lerthathairat P, Prompoon N (2011) An approach for source code classification to enhance maintainability. In: 2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE)

  59. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12:2825–2830

  60. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

  61. Prokhorenkova L, Gusev G, Vorobev A, Dorogush A, Gulin A (2018) CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 31


Acknowledgements

This research was supported by the Science Fund of the Republic of Serbia, Grant No 6521051, AI-Clean CaDET and the Ministry of Science, Technological Development and Innovation through project no. 451-03-47/2023-01/200156 “Innovative scientific and artistic research from the FTS (activity) domain”.

Author information

Corresponding author

Correspondence to Jelena Slivka.

Ethics declarations

Conflict of interest

The authors report there are no competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Performance of heuristic-based detectors

This section presents the performance that existing metric-based heuristic detectors achieve on our novel C# dataset. We chose the metric-based heuristics following the same approach as in our previous paper [10], where we compared multiple heuristic-based detectors on the MLCQ dataset [11]. That is, we started from the list of available code smell detection heuristics summarized in a recent systematic literature review by Bafandeh Mayvan et al. [30]. We selected the heuristics for which we could find metric definitions that aligned with the implementation of metrics in our platform [44].

Tables 9 and 10 present the metric-based heuristics’ performance on the Large Class and Long Method detection tasks, respectively. For easy comparison of the heuristic-based detectors on the test set, we summarize the performances from Tables 9 and 10 in Figs. 6 and 7, respectively.

Table 9 Performance of heuristic-based approaches for Large Class detection: P—precision, R—recall, F—F-measure of the minority (smell) class
Table 10 Performance of heuristic-based approaches for Long Method detection: P—precision, R—recall, F—F-measure of the minority (smell) class
Fig. 6

Performance of heuristic-based approaches for Large Class detection on the test set. We sort the performances from left to right according to the F-measure, our primary performance metric

Fig. 7

Performance of heuristic-based approaches for Long Method detection on the test set. We sort the performances from left to right according to the F-measure, our primary performance metric

The application of metric-based heuristics does not require model training. Thus, we report their performance on the training set, the test set, and the whole dataset. By comparing performances on the test set and the whole dataset, we can verify that the code snippets randomly chosen for the test set are not particularly “easy” or “hard” for heuristic-based detectors compared to the rest of the dataset. Tables 9 and 10 show that this is not the case: the performance of the heuristic-based detectors on the test set is similar to their performance on the whole dataset.

In a realistic scenario, we would choose the best-performing heuristic according to its performance on the training set (we use the minority F-measure as our primary performance measure). Thus, the best-performing heuristic for Large Class detection is \(LC_9\) (Table 9), and the best-performing heuristic for Long Method detection is \(LM_3\) (Table 10). We compare these two best-performing heuristics with ML-based detectors in our paper. Interestingly, heuristics \(LC_9\) and \(LM_3\) proposed by Liu et al. [32] achieved the best performance on both our C# dataset and the MLCQ Java dataset, as we showed in [10].
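The selection procedure described above can be sketched in a few lines. Note that the heuristic rules, metric names, and thresholds below are hypothetical placeholders for illustration, not the published \(LC_9\)/\(LM_3\) definitions:

```python
# Hypothetical metric-threshold heuristics for Long Method detection;
# the metric names ("loc", "cyclo") and thresholds are illustrative only.
HEURISTICS = {
    "LM_a": lambda m: m["loc"] > 30,
    "LM_b": lambda m: m["loc"] > 50 and m["cyclo"] > 10,
}

def minority_f_measure(labels, predictions):
    """F-measure of the positive (smelly) minority class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_best_heuristic(train_set):
    """Pick the heuristic with the best minority F-measure on the
    training set; train_set is a list of (metrics_dict, is_smelly)."""
    labels = [is_smelly for _, is_smelly in train_set]
    scores = {
        name: minority_f_measure(labels, [rule(m) for m, _ in train_set])
        for name, rule in HEURISTICS.items()
    }
    return max(scores, key=scores.get)

# Toy training set of (metric values, ground-truth smell label) pairs.
train = [
    ({"loc": 40, "cyclo": 5}, True),
    ({"loc": 20, "cyclo": 2}, False),
    ({"loc": 60, "cyclo": 12}, True),
    ({"loc": 35, "cyclo": 3}, False),
]
best = select_best_heuristic(train)
```

Only the winning heuristic would then be evaluated once on the held-out test set, mirroring how a trained ML model is selected and assessed.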

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kovačević, A., Luburić, N., Slivka, J. et al. Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C#. Neural Comput & Applic 36, 9203–9220 (2024). https://doi.org/10.1007/s00521-024-09551-y

  • DOI: https://doi.org/10.1007/s00521-024-09551-y
