A Domain is only as Good as its Buddies: Detecting Stealthy Malicious Domains via Graph Inference

Authors:
Issa M. Khalil, Qatar Computing Research Institute, Doha, Qatar
Bei Guan, Qatar Computing Research Institute, Doha, Qatar
Mohamed Nabeel, Qatar Computing Research Institute, Doha, Qatar
Ting Yu, Qatar Computing Research Institute, Doha, Qatar

CODASPY '18: Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy, March 2018, Pages 330–341. https://doi.org/10.1145/3176258.3176329

Published: 13 March 2018

ABSTRACT

Inference-based techniques are one of the major approaches to analyzing DNS data and detecting malicious domains. The key idea is to first define associations between domains based on features extracted from DNS data. An inference algorithm is then deployed to infer potentially malicious domains based on their direct or indirect associations with known malicious ones. The way associations are defined is key to the effectiveness of an inference technique. An association scheme should be both accurate (i.e., avoid falsely associating domains with no meaningful connections) and have good coverage (i.e., identify all associations between domains with meaningful connections). Because DNS data provides only a limited scope of information, designing an association scheme that achieves both high accuracy and good coverage is a challenge.
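
To make the two-step idea above concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the simplest possible association (two domains share a resolved IP) and a hypothetical inference rule that scores a domain by `decay ** hops` from the nearest known malicious seed. The function names, the `decay` parameter, and the sample data are all illustrative assumptions.

```python
from collections import defaultdict, deque

def build_associations(resolutions):
    """Link two domains if they share at least one resolved IP (assumed rule)."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)
    graph = defaultdict(set)
    for domains in by_ip.values():
        for d in domains:
            graph[d] |= domains - {d}
    return graph

def infer_scores(graph, seeds, decay=0.5):
    """Multi-source BFS: score = decay ** (hops to the nearest malicious seed)."""
    scores = {s: 1.0 for s in seeds}
    queue = deque(seeds)
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in scores:  # first visit is the shortest path
                scores[nxt] = scores[cur] * decay
                queue.append(nxt)
    return scores

# Hypothetical (domain, IP) resolution records.
resolutions = [
    ("bad.example", "198.51.100.1"),
    ("a.example", "198.51.100.1"),
    ("a.example", "198.51.100.2"),
    ("b.example", "198.51.100.2"),
    ("c.example", "203.0.113.9"),
]
graph = build_associations(resolutions)
scores = infer_scores(graph, ["bad.example"])
```

Here `a.example` is directly associated with the seed and `b.example` only indirectly, so it receives a lower score; `c.example` has no association and is never scored.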

In this paper, we propose a new approach to identifying domains controlled by the same entity. Our key idea is an in-depth analysis of active DNS data to accurately separate public IPs from dedicated ones, which enables us to build high-quality associations between domains. Our scheme avoids the pitfall of naive approaches that rely on the weak "co-IP" relationship between domains (i.e., two domains resolve to the same IP), which results in low detection accuracy; at the same time, it identifies many meaningful connections between domains that existing state-of-the-art approaches discard. Our experimental results show that the proposed approach not only significantly improves domain coverage compared to existing approaches but also achieves better detection accuracy.
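
The difference between naive co-IP association and association restricted to dedicated IPs can be illustrated with a short sketch. The abstract does not describe how public and dedicated IPs are actually told apart, so the threshold rule below (`max_domains`) is a purely hypothetical stand-in, as are the function names and sample records.

```python
from collections import defaultdict

def split_ips(resolutions, max_domains=2):
    """Hypothetical heuristic: an IP serving more than max_domains domains is 'public'."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)
    dedicated = {ip: ds for ip, ds in by_ip.items() if len(ds) <= max_domains}
    public = {ip: ds for ip, ds in by_ip.items() if len(ds) > max_domains}
    return dedicated, public

def associate(ip_to_domains):
    """Return every unordered pair of domains that share an IP in the given map."""
    pairs = set()
    for domains in ip_to_domains.values():
        ordered = sorted(domains)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs

# Hypothetical records: one shared-hosting IP and one dedicated IP.
resolutions = [
    ("shop.example", "192.0.2.10"),
    ("blog.example", "192.0.2.10"),
    ("news.example", "192.0.2.10"),
    ("evil.example", "192.0.2.10"),
    ("evil.example", "198.51.100.7"),
    ("evil-cdn.example", "198.51.100.7"),
]
dedicated, public = split_ips(resolutions)
naive = associate({**dedicated, **public})  # plain co-IP: every shared IP counts
filtered = associate(dedicated)             # only dedicated IPs create associations
```

In this toy data the naive scheme ties `evil.example` to three unrelated tenants of the shared host, while the filtered scheme keeps only the pair that plausibly belongs to one operator.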

Existing path-based inference algorithms are specifically designed for DNS data analysis. They are effective but computationally expensive. To further demonstrate the strength of our domain association scheme and to improve inference efficiency, we construct a new domain-IP graph that works well with the generic belief propagation algorithm. Through comprehensive experiments, we show that this approach offers significant efficiency and scalability improvements with only a minor impact on detection accuracy, which suggests that such a combination could offer a good tradeoff for malicious domain detection in practice.
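
As a rough illustration of running generic belief propagation over a domain-IP graph, the sketch below implements sum-product message passing for binary labels (benign, malicious) on an undirected graph whose edges connect domains to IPs. The abstract gives no details of the graph construction or parameters, so the priors, the `homophily` edge potential, the iteration count, and the sample graph are all assumptions made for this example.

```python
def belief_propagation(edges, priors, homophily=0.8, iterations=10):
    """edges: (domain, ip) pairs; priors: node -> assumed P(malicious)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Node potentials as (benign, malicious); nodes without a prior get 0.5.
    phi = {n: (1 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in neighbors}
    # Edge potential: adjacent nodes share a label with weight `homophily`.
    psi = ((homophily, 1 - homophily), (1 - homophily, homophily))
    msgs = {(u, v): (0.5, 0.5) for u in neighbors for v in neighbors[u]}
    for _ in range(iterations):
        new = {}
        for (u, v) in msgs:
            out = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    prod = phi[u][xu] * psi[xu][xv]
                    for w in neighbors[u]:
                        if w != v:  # exclude the recipient's own message
                            prod *= msgs[(w, u)][xu]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(u, v)] = (out[0] / z, out[1] / z)
        msgs = new
    beliefs = {}
    for n in neighbors:
        b0, b1 = phi[n]
        for w in neighbors[n]:
            b0 *= msgs[(w, n)][0]
            b1 *= msgs[(w, n)][1]
        beliefs[n] = b1 / (b0 + b1)  # normalized belief that n is malicious
    return beliefs

# Hypothetical graph: one known-bad and one known-good domain, two IPs.
edges = [
    ("bad.example", "198.51.100.7"),
    ("unknown.example", "198.51.100.7"),
    ("good.example", "203.0.113.9"),
    ("other.example", "203.0.113.9"),
]
priors = {"bad.example": 0.99, "good.example": 0.01}
beliefs = belief_propagation(edges, priors)
```

Each iteration touches every edge a fixed number of times, so the cost grows with the number of edges rather than with the number of paths between domains, which is one plausible reading of the efficiency claim above.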

Index Terms

1. Networks
  1. Network properties
    1. Network security
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
  2. Network security

Published in
CODASPY '18: Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy
March 2018
401 pages
ISBN: 9781450356329
DOI: 10.1145/3176258
General Chairs: Ziming Zhao (Arizona State University, USA), Gail-Joon Ahn (Arizona State University, USA & Samsung Research, Korea)
Program Chairs: Ram Krishnan (University of Texas at San Antonio, USA), Gabriel Ghinita (University of Massachusetts Boston, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
belief propagation
dns data analysis
inference algorithms
malicious domains detection
Qualifiers
- research-article
Acceptance Rates
CODASPY '18 Paper Acceptance Rate: 23 of 110 submissions, 21%
Overall Acceptance Rate: 149 of 789 submissions, 19%