DOI: 10.1145/3176258.3176329
research-article
Best Paper

A Domain is only as Good as its Buddies: Detecting Stealthy Malicious Domains via Graph Inference

Published: 13 March 2018

ABSTRACT

Inference-based techniques are among the major approaches to analyzing DNS data and detecting malicious domains. The key idea of inference techniques is to first define associations between domains based on features extracted from DNS data, and then deploy an inference algorithm that identifies potentially malicious domains based on their direct or indirect associations with known malicious ones. How associations are defined is key to the effectiveness of an inference technique. A good association scheme should be both accurate (i.e., avoid falsely associating domains that have no meaningful connection) and have good coverage (i.e., identify all associations between domains that are meaningfully connected). Because DNS data provides only a limited scope of information, designing an association scheme that achieves both high accuracy and good coverage is challenging.
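The association-then-inference pattern can be illustrated with a toy sketch. It assumes the associations are already given as domain pairs and scores each unlabeled domain by its hop distance to the nearest known-malicious seed. The function `infer_scores`, the per-hop `decay`, and the `max_hops` cutoff are hypothetical choices made for illustration; this is not the paper's inference algorithm.

```python
from collections import deque


def infer_scores(associations, seeds, decay=0.5, max_hops=3):
    """Toy path-based inference (hypothetical scoring rule).

    A domain reachable from a known-malicious seed in d hops (1 <= d <= max_hops)
    receives score decay**d; unreachable domains and the seeds are omitted.
    """
    # Build an undirected adjacency list from the association pairs.
    graph = {}
    for a, b in associations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Multi-source BFS from all seeds at once gives the distance to the nearest seed.
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)

    return {d: decay ** k for d, k in dist.items() if k > 0}
```

A single spurious association in such a graph can pull a benign domain within reach of a seed, which is why the accuracy of the association scheme matters as much as the inference step.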

In this paper, we propose a new approach to identifying domains controlled by the same entity. Our key idea is an in-depth analysis of active DNS data that accurately separates public IPs from dedicated ones, which enables us to build high-quality associations between domains. Our scheme avoids the pitfall of naive approaches that rely on the weak "co-IP" relationship between domains (i.e., two domains resolve to the same IP), which results in low detection accuracy, while also identifying many meaningful connections between domains that existing state-of-the-art approaches discard. Our experimental results show that the proposed approach not only significantly improves domain coverage compared to existing approaches but also achieves better detection accuracy.
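The abstract does not spell out how public and dedicated IPs are separated, so the sketch below substitutes two simple stand-in heuristics: an IP is treated as public if it falls in a listed shared-hosting or cloud prefix, or if it hosts more than a fixed number of domains. The prefix list (here the documentation range 203.0.113.0/24), the threshold, and the names `is_public` and `associate` are all hypothetical. Domains are associated only when they share an IP judged to be dedicated.

```python
import ipaddress
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for published cloud / shared-hosting ranges.
PUBLIC_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]
# Hypothetical threshold: more domains than this on one IP suggests shared hosting.
MAX_DOMAINS_PER_DEDICATED_IP = 3


def is_public(ip, domains_on_ip):
    """Heuristic guess at whether an IP is shared (public) rather than dedicated."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in PUBLIC_PREFIXES):
        return True
    return len(domains_on_ip) > MAX_DOMAINS_PER_DEDICATED_IP


def associate(resolutions):
    """Build domain-domain associations from (domain, ip) resolution pairs,
    linking domains only through IPs judged to be dedicated."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)

    edges = set()
    for ip, domains in by_ip.items():
        if is_public(ip, domains):
            continue  # co-hosting on a shared IP says little about common ownership
        for a, b in combinations(sorted(domains), 2):
            edges.add((a, b))
    return edges
```

A naive co-IP scheme would instead emit every domain pair on every IP, including the shared ones, which is the accuracy pitfall described above.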

Existing path-based inference algorithms are designed specifically for DNS data analysis. They are effective but computationally expensive. To further demonstrate the strength of our domain association scheme and to improve inference efficiency, we construct a new domain-IP graph that works well with the generic belief propagation algorithm. Through comprehensive experiments, we show that this approach significantly improves efficiency and scalability with only a minor impact on detection accuracy, which suggests that this combination could offer a good tradeoff for malicious domain detection in practice.
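For reference, generic (loopy) belief propagation on a bipartite domain-IP graph can be sketched as follows. The two-state model, the homophily edge potential controlled by `eps`, the neutral 0.5 prior for unlabeled nodes, and the fixed iteration count are illustrative assumptions, not the paper's parameters or graph construction. Each message is the standard sum-product update, normalized to avoid numerical underflow.

```python
def belief_propagation(edges, priors, eps=0.1, iters=10):
    """Sum-product loopy BP with two states: 0 = benign, 1 = malicious.

    edges:  iterable of (domain, ip) pairs forming a bipartite graph.
    priors: node -> prior P(malicious); nodes not listed get a neutral 0.5
            (an assumption for this sketch).
    Returns node -> posterior belief P(malicious).
    """
    nbrs = {}
    for d, ip in edges:
        nbrs.setdefault(d, set()).add(ip)
        nbrs.setdefault(ip, set()).add(d)

    # Node potentials phi[n] = (P(benign), P(malicious)).
    phi = {n: (1 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nbrs}
    # Edge potential psi[x_i][x_j]: neighbours tend to share a label (homophily).
    psi = ((1 - eps, eps), (eps, 1 - eps))

    # msgs[(i, j)] is the message from node i to node j, indexed by x_j.
    msgs = {(i, j): (1.0, 1.0) for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    prod = phi[i][xi] * psi[xi][xj]
                    # Combine messages into i from every neighbour except j.
                    for k in nbrs[i]:
                        if k != j:
                            prod *= msgs[(k, i)][xi]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(i, j)] = (out[0] / z, out[1] / z)
        msgs = new

    # Belief = node potential times all incoming messages, normalized.
    beliefs = {}
    for i in nbrs:
        b = [phi[i][0], phi[i][1]]
        for k in nbrs[i]:
            for x in (0, 1):
                b[x] *= msgs[(k, i)][x]
        beliefs[i] = b[1] / (b[0] + b[1])
    return beliefs
```

Each iteration touches every directed edge once, so the cost grows linearly with graph size per iteration, which is the kind of efficiency property that makes a generic algorithm like this attractive compared with enumerating paths.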

Published in

CODASPY '18: Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy
March 2018
401 pages
ISBN: 9781450356329
DOI: 10.1145/3176258

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CODASPY '18 paper acceptance rate: 23 of 110 submissions (21%). Overall acceptance rate: 149 of 789 submissions (19%).
