ABSTRACT
Deep learning-based vulnerability detection frees human experts from the tedious task of defining features and allows for better detection capabilities. The common practice is to convert program code into a vector representation for neural network model training. Since the length of the vector representation varies across program code, finding the optimal vector length is critical to ensuring detection accuracy. This paper proposes an adaptive search optimization algorithm for finding the optimal vector length. It sorts all the vector lengths obtained by word2vec and outputs the length at the point where the growth of the sorted lengths changes from slow to fast. We evaluate our algorithm on three publicly available datasets against state-of-the-art algorithms. The results show that, without significantly increasing the time overhead, our algorithm chooses an appropriate vector length more accurately than setting a value empirically or arbitrarily. The results also show that while a larger vector length usually produces higher detection accuracy, the accuracy improvement often does not suffice to compensate for the extra time overhead incurred.
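The selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact criterion for the slow-to-fast change, so the code below swaps in a common knee heuristic as an assumption: it sorts the lengths and picks the point that lies farthest below the straight line joining the shortest and longest length. The function name `knee_length` and the sample lengths are hypothetical and not taken from the paper.

```python
from typing import Sequence


def knee_length(lengths: Sequence[int]) -> int:
    """Pick a vector length at the knee of the sorted-length curve.

    Assumption: the knee is the point farthest below the chord joining
    the first and last sorted lengths. This is a stand-in heuristic,
    not necessarily the criterion used in the paper.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")

    ys = sorted(lengths)
    n = len(ys)
    if n < 3:
        # Too few points to have a knee; fall back to the largest length.
        return ys[-1]

    y_first, y_last = ys[0], ys[-1]
    # Default to the largest length when the curve never dips below the chord.
    best_index, best_gap = n - 1, 0.0
    for i, y in enumerate(ys):
        # Height of the chord at position i.
        chord_y = y_first + (y_last - y_first) * i / (n - 1)
        gap = chord_y - y
        if gap > best_gap:
            best_index, best_gap = i, gap
    return ys[best_index]


# Hypothetical token-sequence lengths: most are short, a few are very long.
sample_lengths = [10, 12, 11, 13, 14, 15, 16, 40, 80, 160]
print(knee_length(sample_lengths))  # 16
```

In this sample the chosen length covers most code samples without padding every vector out to the longest outlier, which mirrors the accuracy versus time-overhead trade-off the abstract describes.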
Index Terms
- An adaptive search optimization algorithm for improving the detection capability of software vulnerability