ABSTRACT
Design patterns (DPs) provide reusable, general solutions to frequently encountered problems. Patterns are important for maintaining the structure and quality of software products, particularly in large, distributed systems such as automotive software. Modern language models (such as Code2Vec or Word2Vec) show a deep understanding of programs, which has been shown to help in tasks such as program repair and program comprehension, and they therefore show promise for design pattern recognition (DPR) in industrial contexts. The models are trained in a self-supervised manner on a large unlabelled code base, which allows them to quantify abstract concepts such as programming styles, coding guidelines, and, to some extent, the semantics of programs. This study demonstrates how two language models, Code2Vec and Word2Vec, trained on two public automotive repositories, can separate programs containing specific DPs. The results show that Code2Vec and Word2Vec achieve average F1-scores of 0.781 and 0.690, respectively, on open-source Java programs, showing promise for DPR in practice.
Index Terms
- Comparing Word-Based and AST-Based Models for Design Pattern Recognition