JStrack: Enriching Malicious JavaScript Detection Based on AST Graph Analysis and Attention Mechanism

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 13109)

Abstract

Malicious JavaScript is one of the most common tools attackers use to exploit vulnerabilities in web applications. It can carry risks such as spreading malware, phishing, and collecting sensitive information. Although many types of malicious JavaScript are difficult to detect, generalizing the signature of malicious scripts can help catch more complex ones that use obfuscation techniques. This paper aims to detect malicious JavaScript through structural and attribute analysis of abstract syntax trees (ASTs), which capture the generalized semantic meaning of the source code. We apply a graph convolutional neural network (GCN) to the AST features and obtain a graph representation via neural message passing with neighborhood aggregation. An attention layer enriches the method by tracking the parts of a script that are most likely to carry the signature of malicious intent. We comprehensively evaluate the proposed approach on a real-world dataset for detecting malicious websites. The proposed method shows promising detection accuracy and robustness against obfuscated samples.
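The pipeline described in the abstract (AST, then graph, then GCN message passing, then attention) can be illustrated with a small pure-Python sketch. This is not the authors' implementation. Everything below is an assumption chosen for illustration: the hand-written ESTree-style AST (a real pipeline would obtain one from an ESTree-compliant parser such as Acorn or Esprima), the one-hot node-type features, the hypothetical helpers `ast_to_graph`, `gcn_layer` and `attention_readout`, the symmetric normalization with self-loops popularized by Kipf and Welling, the single query vector used for attention pooling, and the random, untrained weights.

```python
import math
import random

# Hypothetical ESTree-style AST for the obfuscated one-liner
#     eval(unescape("%61%6c%65%72%74"))
# written by hand here; a real pipeline would get it from a JavaScript parser.
AST = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {"type": "Identifier", "name": "eval"},
            "arguments": [{
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "unescape"},
                "arguments": [{"type": "Literal", "value": "%61%6c%65%72%74"}],
            }],
        },
    }],
}


def ast_to_graph(root):
    """Flatten an ESTree-style dict into a node-type list and parent-child edges."""
    types, edges = [], []

    def visit(node, parent):
        idx = len(types)
        types.append(node["type"])
        if parent is not None:
            edges.append((parent, idx))
        for value in node.values():
            # Child nodes appear either as a single dict or inside a list.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict) and "type" in child:
                    visit(child, idx)

    visit(root, None)
    return types, edges


def gcn_layer(h, edges, weight):
    """One GCN step: symmetric-normalized sum over neighbours and self, linear map, ReLU."""
    n = len(h)
    neigh = [[v] for v in range(n)]  # start with self-loops
    for a, b in edges:  # AST edges are treated as undirected
        neigh[a].append(b)
        neigh[b].append(a)
    deg = [len(nb) for nb in neigh]  # degrees include the self-loop
    out = []
    for v in range(n):
        agg = [0.0] * len(h[0])
        for u in neigh[v]:
            c = 1.0 / math.sqrt(deg[u] * deg[v])
            for k, x in enumerate(h[u]):
                agg[k] += c * x
        # Multiply the aggregated vector by the weight matrix, then apply ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][j] for k in range(len(agg))))
                    for j in range(len(weight[0]))])
    return out


def attention_readout(h, query):
    """Softmax attention pooling: returns the graph vector and per-node weights."""
    scores = [sum(a * b for a, b in zip(hv, query)) for hv in h]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    graph_vec = [sum(alpha[v] * h[v][k] for v in range(len(h)))
                 for k in range(len(h[0]))]
    return graph_vec, alpha


if __name__ == "__main__":
    rng = random.Random(0)  # untrained, randomly initialised weights
    types, edges = ast_to_graph(AST)
    vocab = sorted(set(types))
    x = [[1.0 if t == v else 0.0 for v in vocab] for t in types]  # one-hot node types
    dim = 8
    w1 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in vocab]
    w2 = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    h = gcn_layer(gcn_layer(x, edges, w1), edges, w2)  # two rounds of message passing
    query = [rng.uniform(-1, 1) for _ in range(dim)]
    graph_vec, alpha = attention_readout(h, query)
    for t, a in sorted(zip(types, alpha), key=lambda p: -p[1]):
        print(f"{a:.3f}  {t}")
```

Running the script prints each AST node type with its attention weight, highest first. Because the weights are a softmax over nodes, they sum to one and can be read as a ranking of which parts of the script dominate the graph vector, which is the property an attention layer offers for tracking pertinent code. With the random weights used here the ranking itself is meaningless; it only becomes informative after training on labeled benign and malicious samples.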

Acknowledgements

This research was partially supported by the Ministry of Education, Science, Sports, and Culture, Grant-in-Aid for Scientific Research (B) 21H03444.

Author information

Corresponding author

Correspondence to Muhammad Fakhrur Rozi.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rozi, M.F., Ban, T., Ozawa, S., Kim, S., Takahashi, T., Inoue, D. (2021). JStrack: Enriching Malicious JavaScript Detection Based on AST Graph Analysis and Attention Mechanism. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Lecture Notes in Computer Science, vol 13109. Springer, Cham. https://doi.org/10.1007/978-3-030-92270-2_57

  • DOI: https://doi.org/10.1007/978-3-030-92270-2_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92269-6

  • Online ISBN: 978-3-030-92270-2

  • eBook Packages: Computer Science, Computer Science (R0)
