research-article

Statistical Deobfuscation of Android Applications

Authors:
Benjamin Bichsel, ETH Zurich, Zurich, Switzerland
Veselin Raychev, ETH Zurich, Zurich, Switzerland
Petar Tsankov, ETH Zurich, Zurich, Switzerland
Martin Vechev, ETH Zurich, Zurich, Switzerland

CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, October 2016, Pages 343–355, https://doi.org/10.1145/2976749.2978422

Published: 24 October 2016


ABSTRACT

This work presents a new approach for deobfuscating Android APKs based on probabilistic learning of large code bases (termed "Big Code"). The key idea is to learn a probabilistic model over thousands of non-obfuscated Android applications and to use this probabilistic model to deobfuscate new, unseen Android APKs. The concrete focus of the paper is on reversing layout obfuscation, a popular transformation which renames key program elements such as classes, packages, and methods, thus making it difficult to understand what the program does. Concretely, the paper: (i) phrases the layout deobfuscation problem of Android APKs as structured prediction in a probabilistic graphical model, (ii) instantiates this model with a rich set of features and constraints that capture the Android setting, ensuring both semantic equivalence and high prediction accuracy, and (iii) shows how to leverage powerful inference and learning algorithms to achieve overall precision and scalability of the probabilistic predictions.
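The key idea of structured prediction is that names for obfuscated elements are chosen jointly, so that one element's name can influence another's under shared constraints. A toy example can make this concrete. The sketch below is not DeGuard's implementation. The training triples, the `calls` relation, the count-based "learned" weights, the candidate names, and the brute-force search are all illustrative assumptions. The abstract refers to more powerful learning and inference algorithms, which this sketch does not attempt to reproduce. Unobfuscated Android API names such as `setContentView` serve as known context, and a distinct-names-per-scope rule stands in for the semantic-equivalence constraints the abstract mentions.

```python
from collections import Counter
from itertools import product

# Hypothetical training triples (name, relation, known API name) mined from
# non-obfuscated apps; each triple is one observed co-occurrence.
TRAINING = [
    ("initLayout", "calls", "setContentView"),
    ("initLayout", "calls", "setContentView"),
    ("openDetails", "calls", "setContentView"),
    ("openDetails", "calls", "startActivity"),
    ("sendSms", "calls", "sendTextMessage"),
]


def learn_weights(triples):
    """Toy 'learning' (assumption): a feature's weight is its co-occurrence count."""
    return Counter(triples)


def score(assignment, edges, weights):
    """Sum of pairwise feature weights; unseen features contribute 0."""
    return sum(weights[(assignment[u], rel, known)] for u, rel, known in edges)


def is_legal(assignment, scopes):
    """Constraint: elements sharing a scope must receive distinct names."""
    return all(len({assignment[e] for e in scope}) == len(scope) for scope in scopes)


def map_inference(unknowns, candidates, edges, scopes, weights):
    """Exact MAP by brute force over all legal assignments; tiny inputs only."""
    best, best_score = None, float("-inf")
    for names in product(sorted(candidates), repeat=len(unknowns)):
        assignment = dict(zip(unknowns, names))
        if not is_legal(assignment, scopes):
            continue
        s = score(assignment, edges, weights)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score


weights = learn_weights(TRAINING)
candidates = {name for name, _, _ in TRAINING}
# Two obfuscated methods "a" and "b" declared in the same class (one scope).
edges = [
    ("a", "calls", "setContentView"),
    ("b", "calls", "setContentView"),
    ("b", "calls", "startActivity"),
]
assignment, total = map_inference(["a", "b"], candidates, edges, [["a", "b"]], weights)
print(assignment, total)  # {'a': 'initLayout', 'b': 'openDetails'} 4
```

Predicting each method in isolation would let `b` tie between `initLayout` and `openDetails`. The joint objective together with the scope constraint resolves the tie, because `a` scores higher with `initLayout` than `b` does. Exhaustive search grows exponentially with the number of unknowns, which is why a real system needs scalable inference.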

We implemented our approach in a tool called DeGuard and used it to: (i) reverse the layout obfuscation performed by the popular ProGuard system on benign, open-source applications, (ii) predict third-party libraries imported by benign APKs (also obfuscated by ProGuard), and (iii) rename obfuscated program elements of Android malware. The experimental results indicate that DeGuard is practically effective: it recovers 79.1% of the program element names obfuscated with ProGuard, it predicts third-party libraries with an accuracy of 91.3%, and it reveals string decoders and classes that handle sensitive data in Android malware.
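The 79.1% figure is a ratio over obfuscated program elements. The following is a minimal sketch of such a recovery metric. It assumes that an element counts as recovered only when the predicted name exactly matches the original, since the abstract does not spell out the matching criterion. The element identifiers and names are hypothetical.

```python
def recovery_rate(original, predicted):
    """Fraction of obfuscated elements whose predicted name equals the original.

    Both arguments map an element id to a name. Exact string equality is
    assumed as the matching criterion; a missing prediction counts as a miss.
    """
    if not original:
        raise ValueError("no obfuscated elements to evaluate")
    hits = sum(1 for elem, name in original.items() if predicted.get(elem) == name)
    return hits / len(original)


# Hypothetical ground truth (pre-obfuscation names) and model predictions.
original = {"a.b": "NetworkUtils", "a.b#a": "download", "a.c": "SmsHelper", "a.c#a": "encode"}
predicted = {"a.b": "NetworkUtils", "a.b#a": "download", "a.c": "MessageHelper", "a.c#a": "encode"}
print(f"{recovery_rate(original, predicted):.1%}")  # 75.0%
```

A stricter or looser criterion, such as accepting case-insensitive or synonym matches, would change the reported number, so the criterion should always be stated alongside the percentage.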


Index Terms

Statistical Deobfuscation of Android Applications
1. Security and privacy
  1. Software and application security
    1. Software reverse engineering
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software reverse engineering


Published in
CCS '16: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security
October 2016
1924 pages
ISBN: 9781450341394
DOI: 10.1145/2976749
General Chairs: Edgar Weippl (SBA Research, Austria), Stefan Katzenbeisser (TU Darmstadt, CYSEC, Germany)
Program Chairs: Christopher Kruegel (University of California, Santa Barbara, USA), Andrew Myers (Cornell University, USA), Shai Halevi (IBM Research, USA)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
malware inspection
program deobfuscation
reverse engineering
Acceptance Rates
CCS '16 paper acceptance rate: 137 of 831 submissions, 16%
Overall acceptance rate: 1,261 of 6,999 submissions, 18%

Article Metrics
- Total Citations: 71
- Total Downloads: 1,390
- Downloads (last 12 months): 87
- Downloads (last 6 weeks): 13