Smart contract generation model based on code annotation and AST-LSTM tuning

The Journal of Supercomputing

Abstract

As smart contracts are adopted across many fields, their number, variety, and complexity are growing rapidly. Smart contract development, however, relies on specialized programming languages and strict security requirements that conventional software engineers find difficult to adapt to quickly, so generating smart contracts efficiently from application requirements remains an important open problem for the field's further development. The authors propose a smart contract generation method based on abstract syntax tree (AST) long short-term memory (LSTM) code representation and code-annotation tuning of a large language model. An AST-LSTM model, which combines the abstract syntax tree with a tree-structured LSTM (Tree-LSTM), vectorizes the code, Sentence-BERT vectorizes the annotations, and the two representations are fused by weighting to build a smart contract clustering model that accurately groups functionally similar contracts. An AST-LSTM+Transformer model then detects defects in the clustered code and links the associated annotation information to construct a prompt dataset with diverse prompt features. Finally, the Llama2-7B model, fine-tuned with LoRA and P-Tuning v2, generates smart contracts tailored to specific requirements. Comparative experiments against existing methods were conducted using BLEU, a standard metric for assessing translation quality, together with Mythril, VaaS, and other code security analysis tools. The results show that the average BLEU score of the code generated by the proposed method improves by about 25% and code security improves by about 9%, which can substantially accelerate the development of smart contracts with high security requirements.
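The abstract's first stage, weighted fusion of AST-LSTM code vectors and Sentence-BERT annotation vectors followed by clustering of functionally similar contracts, can be illustrated with a minimal sketch. The AST-LSTM encoder is the paper's own contribution and is stubbed here; the fusion weight `alpha`, the Sentence-BERT checkpoint, and the cluster count are illustrative assumptions, not the authors' reported configuration.

```python
# Sketch: fuse code and annotation embeddings, then cluster similar contracts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize
from sentence_transformers import SentenceTransformer

def ast_lstm_encode(contract_sources):
    """Placeholder for the paper's AST-LSTM encoder (parse to AST, run Tree-LSTM)."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(contract_sources), 128))  # dummy 128-d code vectors

def fuse_and_cluster(contract_sources, contract_comments, n_clusters=8, alpha=0.6):
    # Vectorize code via the (stubbed) AST-LSTM and comments via Sentence-BERT.
    code_vecs = normalize(ast_lstm_encode(contract_sources))
    sbert = SentenceTransformer("all-MiniLM-L6-v2")   # assumed checkpoint
    anno_vecs = normalize(sbert.encode(contract_comments))
    # Weighted fusion: concatenate the two L2-normalized views, scaled by their weights.
    fused = np.hstack([alpha * code_vecs, (1.0 - alpha) * anno_vecs])
    # k-means groups functionally similar contracts for later prompt construction.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(fused)
```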

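For the generation stage, the abstract describes fine-tuning Llama2-7B with LoRA and P-Tuning v2 on the prompt dataset built from clustered, defect-checked contracts. The sketch below shows only the LoRA side using the `peft` library; the rank, scaling, target modules, and model identifier are assumptions, and P-Tuning v2 would be configured analogously rather than as shown here.

```python
# Sketch: attach LoRA adapters to Llama2-7B before instruction-style fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"          # gated checkpoint; access is assumed
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,  # illustrative hyperparameters
    target_modules=["q_proj", "v_proj"],    # adapt only the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()          # only the low-rank adapter weights train

# Training would then run a standard supervised fine-tuning loop over the
# prompt-feature dataset of requirement prompts paired with contract code.
```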

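On the evaluation side, the abstract reports BLEU scores for the generated code alongside security checks with Mythril and VaaS. A minimal sketch of the BLEU measurement is shown below; treating Solidity source as whitespace-tokenized text and the example strings are assumptions, and the security tools run externally and are not shown.

```python
# Sketch: corpus-level BLEU between generated contracts and reference implementations.
import sacrebleu

generated = ["function transfer(address to, uint256 amount) public returns (bool) { }"]
references = ["function transfer(address to, uint256 value) public returns (bool) { }"]

bleu = sacrebleu.corpus_bleu(generated, [references])
print(f"BLEU = {bleu.score:.2f}")
```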

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China under Grant 71972102 and the Universities Natural Science Research Project of Jiangsu Province under Grant 20KJA520002.

Author information

Contributions

Defeng Hu, Yong Chen, and Chao Xu contributed to writing the first draft and carried out the experiments; Yong Chen and Chao Xu were involved in the conception of the project; Nannan Chen and Defeng Hu contributed to data acquisition and analysis; and Jianbo Liu provided the experimental environment and equipment. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Chen Yong, Hu Defeng or Xu Chao.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Ethical approval

This article contains no studies with human or animal participants conducted by any of the authors.

Human or animal rights

This study did not involve human participants or animals.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yong, C., Defeng, H., Chao, X. et al. Smart contract generation model based on code annotation and AST-LSTM tuning. J Supercomput 81, 731 (2025). https://doi.org/10.1007/s11227-025-07186-x


  • DOI: https://doi.org/10.1007/s11227-025-07186-x
