Abstract
With the wide application of smart contracts in many fields, the number, types, and complexity of smart contracts are increasing rapidly. However, smart contract development involves its own programming languages and security requirements, which conventional software developers find difficult to adapt to quickly; how to develop smart contracts efficiently according to application requirements is therefore an important problem for the field's further growth. This paper proposes a smart contract generation method based on abstract syntax tree (AST) long short-term memory (LSTM) representation and a large language model tuned with code annotations. The method uses an AST-LSTM model, which combines the abstract syntax tree with a tree-structured long short-term memory network (Tree-LSTM), to vectorize code, uses Sentence-BERT to vectorize annotations, performs a weighted analysis of the two, and builds a smart contract clustering model that accurately groups functionally similar contracts. An AST-LSTM+Transformer model then detects defects in the clustered code and links the relevant annotation information to construct a prompt dataset with diverse prompt features. Finally, the Llama2-7B model is fine-tuned with LoRA and P-Tuning v2 to generate smart contracts for specific requirements. We conducted comparative experiments against existing methods using BLEU, an auxiliary metric for bilingual translation quality, and code security detection tools such as Mythril and VaaS. The results show that the average BLEU score of the code generated by our method improves by about 25% and code security improves by about 9%, which can greatly accelerate the development of smart contracts with high security requirements.
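As a rough illustration of the representation-fusion and clustering step summarized above, the Python sketch below encodes annotations with Sentence-BERT, combines them with a code vector through a tunable weight, and groups the fused vectors with k-means. It is a minimal sketch under assumed names and values, not the authors' implementation: encode_code is a placeholder standing in for the paper's AST-LSTM encoder, and the Sentence-BERT model name, the weight alpha, and the cluster count are illustrative choices.

# Illustrative sketch only: fuses a code vector (stand-in for the AST-LSTM encoder)
# with a Sentence-BERT annotation vector via a weighting factor, then clusters the
# fused representations with k-means. Names and values are hypothetical.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.cluster import KMeans

sbert = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in annotation encoder (384-dim)

def encode_code(source: str, dim: int = 384) -> np.ndarray:
    """Placeholder for the AST-LSTM code encoder; returns a deterministic unit vector."""
    rng = np.random.default_rng(abs(hash(source)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def fuse(code_vec: np.ndarray, comment_vec: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """Weighted combination of code and annotation embeddings (alpha is a tunable weight)."""
    comment_vec = comment_vec / np.linalg.norm(comment_vec)
    return alpha * code_vec + (1.0 - alpha) * comment_vec

contracts = [
    ("function transfer(address to, uint256 value) public returns (bool) { ... }",
     "Transfers tokens from the caller to another address."),
    ("function mint(address to, uint256 amount) external onlyOwner { ... }",
     "Mints new tokens to the given address."),
]

comment_vecs = sbert.encode([comment for _, comment in contracts])
fused = np.stack([fuse(encode_code(src), cv)
                  for (src, _), cv in zip(contracts, comment_vecs)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(fused)
print(labels)  # cluster assignment per contract

In the paper's pipeline, the clusters produced at this stage would then feed the defect detection and prompt construction steps before fine-tuning.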
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 71972102 and the Universities Natural Science Research Project of Jiangsu Province under Grant 20KJA520002.
Author information
Contributions
Defeng Hu, Yong Chen, and Chao Xu contributed to writing the first draft and carried out the experiments; Yong Chen and Chao Xu contributed to the conception of the project; Nannan Chen and Defeng Hu contributed to data acquisition and analysis; and Jianbo Liu provided the experimental environment and equipment. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article contains no studies with human or animal participants conducted by any of the authors.
Human or animal rights
This study did not involve human participants or animals.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yong, C., Defeng, H., Chao, X. et al. Smart contract generation model based on code annotation and AST-LSTM tuning. J Supercomput 81, 731 (2025). https://doi.org/10.1007/s11227-025-07186-x