Abstract
With the wide application of smart contracts in many fields, the number, types, and complexity of smart contracts are increasing rapidly. However, smart contract development involves its own programming languages and security requirements, which conventional software developers find difficult to adapt to quickly; how to develop smart contracts efficiently according to application requirements is therefore an important problem for the field's further growth. This paper proposes a smart contract generation method based on abstract syntax tree (AST) long short-term memory (LSTM) representation and a large language model tuned with code annotations. The method uses an AST-LSTM model, which combines the abstract syntax tree with a tree-structured long short-term memory network (Tree-LSTM), to vectorize code, uses Sentence-BERT to vectorize annotations, performs a weighted analysis of the two, and builds a smart contract clustering model that accurately groups functionally similar contracts. An AST-LSTM+Transformer model then detects defects in the clustered code and links the relevant annotation information to construct a prompt dataset with diverse prompt features. Finally, the Llama2-7B model is fine-tuned with LoRA and P-Tuning v2 to generate smart contracts for specific requirements. We conducted comparative experiments against existing methods using BLEU, an auxiliary metric for bilingual translation quality, and code security detection tools such as Mythril and VaaS. The results show that the average BLEU score of the code generated by our method improves by about 25% and code security improves by about 9%, which can greatly accelerate the development of smart contracts with high security requirements.
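As a rough illustration of the representation-fusion and clustering step summarized above, the Python sketch below encodes annotations with Sentence-BERT, combines them with a code vector through a tunable weight, and groups the fused vectors with k-means. It is a minimal sketch under assumed names and values, not the authors' implementation: encode_code is a placeholder standing in for the paper's AST-LSTM encoder, and the Sentence-BERT model name, the weight alpha, and the cluster count are illustrative choices.

# Illustrative sketch only: fuses a code vector (stand-in for the AST-LSTM encoder)
# with a Sentence-BERT annotation vector via a weighting factor, then clusters the
# fused representations with k-means. Names and values are hypothetical.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
from sklearn.cluster import KMeans

sbert = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in annotation encoder (384-dim)

def encode_code(source: str, dim: int = 384) -> np.ndarray:
    """Placeholder for the AST-LSTM code encoder; returns a deterministic unit vector."""
    rng = np.random.default_rng(abs(hash(source)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def fuse(code_vec: np.ndarray, comment_vec: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    """Weighted combination of code and annotation embeddings (alpha is a tunable weight)."""
    comment_vec = comment_vec / np.linalg.norm(comment_vec)
    return alpha * code_vec + (1.0 - alpha) * comment_vec

contracts = [
    ("function transfer(address to, uint256 value) public returns (bool) { ... }",
     "Transfers tokens from the caller to another address."),
    ("function mint(address to, uint256 amount) external onlyOwner { ... }",
     "Mints new tokens to the given address."),
]

comment_vecs = sbert.encode([comment for _, comment in contracts])
fused = np.stack([fuse(encode_code(src), cv)
                  for (src, _), cv in zip(contracts, comment_vecs)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(fused)
print(labels)  # cluster assignment per contract

In the paper's pipeline, the clusters produced at this stage would then feed the defect detection and prompt construction steps before fine-tuning.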
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 71972102 and the Universities Natural Science Research Project of Jiangsu Province under Grant 20KJA520002.
Author information
Contributions
Defeng Hu, Yong Chen, and Chao Xu contributed to writing the first draft and carried out the experiments; Yong Chen and Chao Xu contributed to the conception of the project; Nannan Chen and Defeng Hu contributed to data acquisition and analysis; and Jianbo Liu provided the experimental environment and equipment. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethical approval
This article contains no studies with human or animal participants conducted by any of the authors.
Human or animal rights
This study did not involve human participants or animals.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yong, C., Defeng, H., Chao, X. et al. Smart contract generation model based on code annotation and AST-LSTM tuning. J Supercomput 81, 731 (2025). https://doi.org/10.1007/s11227-025-07186-x