research-article

SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits

Authors:
Wei You (Indiana University, Bloomington, Bloomington, IN, USA)
Peiyuan Zong (Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China)
Kai Chen (Institute of Information Engineering, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China)
XiaoFeng Wang (Indiana University, Bloomington, Bloomington, IN, USA)
Xiaojing Liao (William and Mary, Williamsburg, VA, USA)
Pan Bian (Renmin University of China, Beijing, China)
Bin Liang (Renmin University of China, Beijing, China)

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, October 2017, Pages 2139–2154. https://doi.org/10.1145/3133956.3134085

Published: 30 October 2017

ABSTRACT

Patches and related information about software vulnerabilities are often made public to facilitate timely fixes. Unfortunately, the slow pace of system updates (30 days on average) often gives attackers enough time to recover hidden bugs and attack unpatched systems. Making things worse is the potential to automatically generate exploits for input-validation flaws by reverse-engineering patches, even though such vulnerabilities are relatively rare (e.g., 5% of all Linux kernel vulnerabilities in the last few years). Less understood, however, are the implications of other bug-related information (e.g., bug descriptions in CVE), particularly whether such information can facilitate exploit generation, even for other vulnerability types that have never been automatically attacked.

In this paper, we seek to use such information to generate proof-of-concept (PoC) exploits for vulnerability types that have never been automatically attacked. Unlike an input-validation flaw, which is often patched by adding missing sanitization checks, fixing other vulnerability types is more complicated and usually involves replacing a whole chunk of code. Without an understanding of the changed code, automatic exploit generation becomes less likely. To address this challenge, we present SemFuzz, a novel technique that leverages vulnerability-related text (e.g., CVE reports and Linux git logs) to guide automatic generation of PoC exploits. Such an end-to-end approach is made possible by natural-language processing (NLP) based information extraction and a semantics-based fuzzing process guided by such information. Running over 112 Linux kernel flaws reported in the past five years, SemFuzz successfully triggered 18 of them, and further discovered one zero-day vulnerability and one undisclosed vulnerability. These flaws include use-after-free, memory corruption, and information leak, indicating that more complicated flaws can also be automatically attacked. This finding calls into question the way vulnerability-related information is shared today.
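To make the idea of text-guided fuzzing concrete, the following is a minimal sketch of pulling fuzzing hints out of a vulnerability description. It is not the NLP pipeline used by SemFuzz: it substitutes plain regular expressions and small hand-written vocabularies for real parsing, and the description, the function name `foo_bar_setsockopt`, the path `net/foo/bar.c`, and the helper `extract_guidance` are all hypothetical, invented for illustration.

```python
import re

# Hypothetical, hand-picked vocabularies (assumptions for this sketch only).
KNOWN_SYSCALLS = {"socket", "setsockopt", "bind", "close", "ioctl", "mmap"}
VULN_TYPES = ("use-after-free", "race condition", "memory corruption",
              "information leak")

# Invented CVE-style description; it does not correspond to a real report.
DESCRIPTION = (
    "Race condition in the foo_bar_setsockopt function in net/foo/bar.c "
    "allows local users to gain privileges via crafted setsockopt and "
    "close system calls."
)


def extract_guidance(text):
    """Collect simple hints that a fuzzer could use to focus its search."""
    lowered = text.lower()

    # Kernel source paths such as net/foo/bar.c or include/foo/bar.h.
    files = re.findall(r"\b(?:[\w-]+/)+[\w-]+\.[ch]\b", text)

    # Identifiers that appear directly before the word "function".
    functions = re.findall(r"\b([A-Za-z_]\w*)\s+function\b", text)

    # Words matching a known system call, in order of first appearance.
    syscalls = []
    for word in re.findall(r"[a-z_0-9]+", lowered):
        if word in KNOWN_SYSCALLS and word not in syscalls:
            syscalls.append(word)

    # Vulnerability classes mentioned anywhere in the text.
    vuln_types = [v for v in VULN_TYPES if v in lowered]

    return {
        "files": files,
        "functions": functions,
        "syscalls": syscalls,
        "vuln_types": vuln_types,
    }


if __name__ == "__main__":
    for key, value in extract_guidance(DESCRIPTION).items():
        print(f"{key}: {value}")
```

In a setup along these lines, the extracted system calls could serve as the initial call sequence to mutate, and the function name as a target whose reachability is used to rank inputs. How SemFuzz actually performs these steps is described in the full paper, not in this abstract.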

Index Terms

1. Security and privacy
  1. Software and application security
    1. Software security engineering

Published in
CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017, 2682 pages
ISBN: 9781450349468
DOI: 10.1145/3133956
General Chair: Bhavani Thuraisingham (The University of Texas at Dallas, USA)
Program Chairs: David Evans (University of Virginia), Tal Malkin (Columbia University), Dongyan Xu (Purdue University)
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
exploit generation
fuzzing
patch
semantics
vulnerability
Conference

Acceptance Rates
CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%