ABSTRACT
Inference-based techniques are among the major approaches to analyzing DNS data and detecting malicious domains. Their key idea is to first define associations between domains based on features extracted from DNS data, and then deploy an inference algorithm that flags potentially malicious domains based on their direct or indirect associations with known malicious ones. How associations are defined is key to the effectiveness of an inference technique. Ideally, an association scheme is both accurate (i.e., it avoids falsely associating domains that have no meaningful connection) and has good coverage (i.e., it identifies all associations between domains that are meaningfully connected). Because DNS data carries only a limited scope of information, designing an association scheme that achieves both high accuracy and good coverage is challenging.
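To make the two-step structure (define associations, then infer) concrete, here is a minimal Python sketch under simplifying assumptions. It builds associations with the naive co-IP rule and then scores domains by their hop distance to known-malicious seeds using a hypothetical decay factor, so direct associations weigh more than indirect ones. This is a toy illustration, not the path-based inference algorithm of any specific system; `co_ip_associations`, `infer_scores`, `decay`, and `max_hops` are illustrative names, and the domains and IPs use reserved example ranges.

```python
from collections import defaultdict, deque
from itertools import combinations


def co_ip_associations(resolutions):
    """Build an undirected domain-domain association graph.

    resolutions: iterable of (domain, ip) pairs observed in DNS data.
    Two domains are associated if they resolve to at least one common IP
    (the naive "co-IP" rule).
    """
    domains_by_ip = defaultdict(set)
    for domain, ip in resolutions:
        domains_by_ip[ip].add(domain)
    graph = defaultdict(set)
    for domains in domains_by_ip.values():
        for a, b in combinations(sorted(domains), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def infer_scores(graph, seeds, decay=0.5, max_hops=3):
    """Score domains by proximity to known-malicious seeds (hypothetical scheme).

    Multi-source BFS: a domain at shortest hop distance h from any seed
    receives decay**h; domains farther than max_hops are not scored.
    """
    scores = {s: 1.0 for s in seeds}
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nbr in graph[node]:
            if nbr not in scores:
                scores[nbr] = decay ** (hops + 1)
                frontier.append((nbr, hops + 1))
    return scores


resolutions = [
    ("bad.example", "198.51.100.7"),
    ("evil.example", "198.51.100.7"),
    ("evil.example", "203.0.113.9"),
    ("unknown.example", "203.0.113.9"),
    ("benign.example", "192.0.2.1"),
]
graph = co_ip_associations(resolutions)
print(infer_scores(graph, {"bad.example"}))
# {'bad.example': 1.0, 'evil.example': 0.5, 'unknown.example': 0.25}
```

Because `unknown.example` reaches the seed only through `evil.example`, it receives a lower score than the directly associated domain, while `benign.example` shares no IP with any seed and is never scored.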
In this paper, we propose a new approach to identify domains controlled by the same entity. Our key idea is an in-depth analysis of active DNS data to accurately separate public IPs from dedicated ones, which enables us to build high-quality associations between domains. Our scheme avoids the pitfall of naive approaches that rely on the weak "co-IP" relationship between domains (i.e., two domains resolving to the same IP), which results in low detection accuracy; at the same time, it identifies many meaningful connections between domains that existing state-of-the-art approaches discard. Our experimental results show that the proposed approach not only significantly improves domain coverage compared to existing approaches but also achieves better detection accuracy.
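The abstract does not specify how public and dedicated IPs are told apart, so the following sketch assumes a hypothetical two-part heuristic purely for illustration: an IP is treated as public if it falls inside a known shared-hosting or cloud range, or if more than `max_domains` domains resolve to it; every other IP is treated as dedicated, and domains are associated only through dedicated IPs. The `PUBLIC_RANGES` list, the threshold, and both function names are assumptions, and the threshold of 2 only keeps the toy example small.

```python
import ipaddress
from collections import defaultdict
from itertools import combinations

# Hypothetical "public" ranges (TEST-NET-3 here, illustrative only); a real
# system might instead load provider-published cloud or hosting ranges.
PUBLIC_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def classify_ips(resolutions, public_ranges=PUBLIC_RANGES, max_domains=2):
    """Split resolved IPs into public and dedicated (hypothetical heuristic)."""
    domains_by_ip = defaultdict(set)
    for domain, ip in resolutions:
        domains_by_ip[ip].add(domain)
    dedicated = set()
    for ip, domains in domains_by_ip.items():
        addr = ipaddress.ip_address(ip)
        in_public_range = any(addr in net for net in public_ranges)
        # Dedicated = outside known public ranges and not shared by many domains.
        if not in_public_range and len(domains) <= max_domains:
            dedicated.add(ip)
    return domains_by_ip, dedicated


def dedicated_ip_associations(resolutions, **kwargs):
    """Associate two domains only if they share a dedicated IP."""
    domains_by_ip, dedicated = classify_ips(resolutions, **kwargs)
    graph = defaultdict(set)
    for ip in dedicated:
        for a, b in combinations(sorted(domains_by_ip[ip]), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


resolutions = [
    ("a.example", "198.51.100.7"), ("b.example", "198.51.100.7"),  # dedicated
    ("c.example", "203.0.113.5"), ("d.example", "203.0.113.5"),    # public range
    ("e.example", "192.0.2.10"), ("f.example", "192.0.2.10"),
    ("g.example", "192.0.2.10"),                                    # too many domains
]
graph = dedicated_ip_associations(resolutions)
print(dict(graph))
```

Only `a.example` and `b.example` end up associated here; the naive co-IP rule would also have linked the domains sharing `203.0.113.5` and `192.0.2.10`.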
Existing path-based inference algorithms are specifically designed for DNS data analysis. They are effective but computationally expensive. To further demonstrate the strength of our domain association scheme and to improve inference efficiency, we construct a new domain-IP graph that works well with the generic belief propagation algorithm. Through comprehensive experiments, we show that this approach offers significant improvements in efficiency and scalability with only a minor impact on detection accuracy, which suggests that such a combination could offer a good tradeoff for malicious domain detection in practice.
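To illustrate what running generic belief propagation on a domain-IP graph involves, a minimal sketch is sum-product loopy BP over binary labels (benign/malicious) on a bipartite graph whose edges connect each domain to the IPs it resolves to. The homophily edge potential with parameter `eps`, the default prior of 0.5 for unlabeled nodes, and the seed prior of 0.99 are assumptions chosen for illustration; they are not the paper's graph construction or parameter settings.

```python
from collections import defaultdict


def loopy_bp(edges, priors, eps=0.1, iters=20):
    """Sum-product loopy BP with states 0 = benign, 1 = malicious.

    edges: (u, v) pairs, e.g. domain-IP edges of a bipartite graph.
    priors: node -> P(malicious); unlisted nodes get 0.5 (assumption).
    eps: probability that neighbours take different labels (assumption).
    Returns node -> belief P(malicious).
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    psi = [[1 - eps, eps], [eps, 1 - eps]]  # psi[x_u][x_v]
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    phi = {n: [1 - priors.get(n, 0.5), priors.get(n, 0.5)] for n in nbrs}
    msgs = {(u, v): [0.5, 0.5] for u in nbrs for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for u, v in msgs:
            # Prior of u times all incoming messages except the one from v.
            h = list(phi[u])
            for w in nbrs[u]:
                if w != v:
                    h = [h[k] * msgs[(w, u)][k] for k in range(2)]
            # m[x_v] = sum over x_u of psi[x_u][x_v] * h[x_u], then normalize.
            m = [sum(psi[xu][xv] * h[xu] for xu in range(2)) for xv in range(2)]
            z = m[0] + m[1]
            new[(u, v)] = [m[0] / z, m[1] / z]
        msgs = new
    beliefs = {}
    for n in nbrs:
        b = list(phi[n])
        for w in nbrs[n]:
            b = [b[k] * msgs[(w, n)][k] for k in range(2)]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


edges = [
    ("bad.example", "198.51.100.7"),
    ("evil.example", "198.51.100.7"),
    ("evil.example", "203.0.113.9"),
    ("unknown.example", "203.0.113.9"),
    ("benign.example", "192.0.2.1"),
]
beliefs = loopy_bp(edges, priors={"bad.example": 0.99})
for node in sorted(beliefs):
    print(node, round(beliefs[node], 3))
```

This toy graph is a tree, so BP converges to exact marginals: `evil.example` (two hops from the seed) receives a higher malicious belief than `unknown.example` (four hops), and the disconnected `benign.example` stays at its 0.5 prior. On real domain-IP graphs with cycles, loopy BP is an approximation and convergence is not guaranteed.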
Index Terms
- A Domain is only as Good as its Buddies: Detecting Stealthy Malicious Domains via Graph Inference