DOI: 10.1145/3617555.3617873
research-article
Open Access

Comparing Word-Based and AST-Based Models for Design Pattern Recognition

Published: 08 December 2023

ABSTRACT

Design patterns (DPs) provide reusable, general solutions to frequently encountered problems. Patterns help maintain the structure and quality of software products, particularly in large, distributed systems such as automotive software. Modern language models such as Code2Vec and Word2Vec exhibit a deep understanding of programs, which has been shown to help in tasks such as program repair and program comprehension; they therefore show promise for design pattern recognition (DPR) in industrial contexts. These models are trained in a self-supervised manner on a large unlabelled code base, which allows them to quantify abstract concepts such as programming styles, coding guidelines, and, to some extent, the semantics of programs. This study demonstrates how two language models, Code2Vec and Word2Vec, trained on two public automotive repositories, can separate programs containing specific DPs. Code2Vec and Word2Vec achieve average F1-scores of 0.781 and 0.690, respectively, on open-source Java programs, showing promise for DPR in practice.
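
For readers unfamiliar with the metric, the following is a minimal sketch of how an average F1-score over several design patterns could be computed. It assumes an unweighted (macro) average across patterns, which the abstract does not specify. The pattern names and the true-positive, false-positive, and false-negative counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of the per-pattern F1-scores (an assumed averaging scheme)."""
    if not counts:
        raise ValueError("no patterns given")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts per pattern, invented for illustration only.
example_counts = {
    "Singleton": (8, 2, 2),
    "Observer": (6, 3, 1),
    "Factory": (5, 5, 5),
}

if __name__ == "__main__":
    for name, c in example_counts.items():
        print(f"{name}: F1 = {f1_score(*c):.3f}")
    print(f"average F1 = {macro_f1(example_counts):.3f}")
```

A weighted average (by the number of programs per pattern) would be an equally plausible reading of "average F1-score"; the two coincide only when every pattern has the same number of instances.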

    • Published in

      PROMISE 2023: Proceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering
      December 2023
      68 pages
      ISBN: 9798400703751
      DOI: 10.1145/3617555

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 64 of 125 submissions, 51%
