JStrack: Enriching Malicious JavaScript Detection Based on AST Graph Analysis and Attention Mechanism

Rozi, Muhammad Fakhrur; Ban, Tao; Ozawa, Seiichi; Kim, Sangwook; Takahashi, Takeshi; Inoue, Daisuke

doi:10.1007/978-3-030-92270-2_57


Conference paper
First Online: 07 December 2021


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Abstract

Malicious JavaScript is one of the most common tools attackers use to exploit vulnerabilities in web applications. It carries potential risks such as spreading malware, enabling phishing, or collecting sensitive information. Although many types of malicious JavaScript are difficult to detect, generalizing the signature of malicious scripts can help catch more complex scripts that use obfuscation techniques. This paper aims to detect malicious JavaScript based on structure and attribute analysis of abstract syntax trees (ASTs), which capture the generalized semantic meaning of the source code. We apply a graph convolutional neural network (GCN) to process the AST features and obtain a graph representation via neural message passing with neighborhood aggregation. An attention layer enriches our method by tracking the parts of a script that may contain signatures of malicious intent. We comprehensively evaluate the proposed approach on a real-world dataset for detecting malicious websites. The proposed method demonstrates promising performance in terms of detection accuracy and robustness against obfuscated samples.
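The abstract outlines three stages: turn a script's AST into a graph, run GCN message passing with neighborhood aggregation over it, and pool node embeddings with attention into one graph-level representation for classification. The following is a minimal, untrained sketch of that pipeline in pure Python, written under our own assumptions rather than taken from the paper: ASTs are ESTree-style dicts (the format emitted by JavaScript parsers such as Esprima or Acorn), node features are one-hot encodings over a hypothetical vocabulary `NODE_TYPES`, parent-child edges are treated as undirected, each layer uses the standard symmetric-normalized GCN rule $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A+I)\hat{D}^{-1/2} H W)$, and attention is a single query vector followed by a softmax. The function names (`ast_to_graph`, `gcn_layer`, `attention_pool`, `score_script`) are hypothetical, and the weights are random, so the output is not a meaningful maliciousness score and this is not the authors' JStrack implementation.

```python
from __future__ import annotations

import math
import random
from typing import Any

# Hypothetical node-type vocabulary; a real detector would cover every ESTree node type.
NODE_TYPES = ["Program", "ExpressionStatement", "CallExpression",
              "Identifier", "Literal", "MemberExpression", "OTHER"]


def ast_to_graph(ast: dict[str, Any]) -> tuple[list[str], list[tuple[int, int]]]:
    """Flatten an ESTree-style dict AST into node types and parent->child edges."""
    types: list[str] = []
    edges: list[tuple[int, int]] = []

    def visit(node: dict[str, Any], parent: int | None) -> None:
        idx = len(types)
        types.append(node["type"])
        if parent is not None:
            edges.append((parent, idx))
        # Any dict (or list of dicts) carrying a "type" key is treated as a child node.
        for value in node.values():
            for child in (value if isinstance(value, list) else [value]):
                if isinstance(child, dict) and "type" in child:
                    visit(child, idx)

    visit(ast, None)
    return types, edges


def one_hot(types: list[str]) -> list[list[float]]:
    """Encode each node's type as a one-hot row; unknown types map to "OTHER"."""
    rows = []
    for t in types:
        row = [0.0] * len(NODE_TYPES)
        row[NODE_TYPES.index(t if t in NODE_TYPES else "OTHER")] = 1.0
        rows.append(row)
    return rows


def normalized_adjacency(n: int, edges: list[tuple[int, int]]) -> list[list[float]]:
    """Compute D^-1/2 (A + I) D^-1/2 with AST edges treated as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One round of message passing: aggregate neighbors, transform, apply ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def attention_pool(h, q):
    """Softmax attention over nodes; returns the pooled vector and per-node weights."""
    scores = [sum(hi * qi for hi, qi in zip(row, q)) for row in h]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    alpha = [e / sum(exps) for e in exps]
    pooled = [sum(alpha[i] * h[i][j] for i in range(len(h))) for j in range(len(h[0]))]
    return pooled, alpha


def score_script(ast: dict[str, Any], hidden: int = 8, seed: int = 0):
    """Two GCN layers, attention pooling, and a sigmoid output (random, untrained weights)."""
    rng = random.Random(seed)
    types, edges = ast_to_graph(ast)
    a_hat = normalized_adjacency(len(types), edges)
    w1 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in NODE_TYPES]
    w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    h = gcn_layer(a_hat, gcn_layer(a_hat, one_hot(types), w1), w2)
    pooled, alpha = attention_pool(h, [rng.gauss(0, 0.5) for _ in range(hidden)])
    logit = sum(p * rng.gauss(0, 0.5) for p in pooled)
    return 1.0 / (1.0 + math.exp(-logit)), alpha, types


# ESTree-style AST for: eval(unescape("%61"))
example = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "eval"},
            "arguments": [{
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "unescape"},
                "arguments": [{"type": "Literal", "value": "%61"}],
            }],
        },
    }],
}

if __name__ == "__main__":
    prob, alpha, types = score_script(example)
    for t, a in zip(types, alpha):
        print(f"{t:20s} attention={a:.3f}")
    print("score:", prob)
```

In a trained model, the per-node weights `alpha` are what would let the detector point back at the AST regions driving a decision, such as the nested `eval`/`unescape` calls in this toy example; here they only demonstrate the mechanics.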



Acknowledgements

This research was partially supported by the Ministry of Education, Science, Sports, and Culture, Grant-in-Aid for Scientific Research (B) 21H03444.

Author information

Authors and Affiliations

National Institute of Information and Communications Technology, Koganei, Tokyo, Japan
Muhammad Fakhrur Rozi, Tao Ban, Takeshi Takahashi & Daisuke Inoue
Kobe University, Kobe, Hyogo, Japan
Muhammad Fakhrur Rozi, Seiichi Ozawa & Sangwook Kim


Corresponding author

Correspondence to Muhammad Fakhrur Rozi.

Editor information

Editors and Affiliations

Sampoerna University, Jakarta, Indonesia
Teddy Mantoro
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee
Sampoerna University, Jakarta, Indonesia
Media Anugerah Ayu
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Universitas Indonesia, Depok, Indonesia
Achmad Nizar Hidayanto


About this paper

Cite this paper

Rozi, M.F., Ban, T., Ozawa, S., Kim, S., Takahashi, T., Inoue, D. (2021). JStrack: Enriching Malicious JavaScript Detection Based on AST Graph Analysis and Attention Mechanism. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_57


DOI: https://doi.org/10.1007/978-3-030-92270-2_57
Published: 07 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92269-6
Online ISBN: 978-3-030-92270-2
eBook Packages: Computer Science, Computer Science (R0)
