Abstract
We address the problem of learning a detector of malicious behavior in network traffic. The malicious behavior is detected based on the analysis of network proxy logs that capture malware communication between client and server computers. The conceptual problem in using the standard supervised learning methods is the lack of sufficiently representative training set containing examples of malicious and legitimate communication. Annotation of individual proxy logs is an expensive process involving security experts and does not scale with constantly evolving malware. However, weak supervision can be achieved on the level of properly defined bags of proxy logs by leveraging internet domain black lists, security reports, and sandboxing analysis. We demonstrate that an accurate detector can be obtained from the collected security intelligence data by using a Multiple Instance Learning algorithm tailored to the Neyman-Pearson problem. We provide a thorough experimental evaluation on a large corpus of network communications collected from various company network environments.
Chapter PDF
Similar content being viewed by others
References
Cisco 2014 annual security report. http://www.cisco.com/web/offers/lp/2014-annual-security-report/index.html
List of 1 million top web sites. http://www.alexa.com
VirusTotal service. https://www.virustotal.com
Abu-Nimeh, S., Nappa, D., Wang, X., Nair, S.: A comparison of machine learning techniques for phishing detection. In: Proceedings of the Anti-phishing Working Groups 2nd Annual eCrime Researchers Summit, eCrime 2007, pp. 60–69. ACM, New York (2007)
Andrews, S., Tsochantaridis, I., Hofmann, T.: Support vector machines for multiple-instance learning. In: Proc. of Neural Information Processing Systems (2002)
Castillo, C., Donato, D., Gionis, A., Murdock, V., Silvestri, F.: Know your neighbors: Web spam detection using the web topology. In: Proceedings of SIGIR. ACM, Amsterdam, July 2007
Farnham, G., Leune, K.: Tools and standards for cyber threat intelligence projects. Technical report, SANS Institute InfoSec Reading Room, vol.10 (2013)
Franc, V., Sonnenburg, S.: Optimized cutting plane algorithm for support vector machines. In: McCallum, A., Roweis, S. (eds.) Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pp. 320–327. ACM, New York (2008)
Gu, G., Zhang, J., Lee, W.: BotSniffer: detecting botnet command and control channels in network traffic. In: Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS 2008), February 2008
Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Identifying suspicious URLs: an application of large-scale online learning. In: Danyluk, A.P., Bottou, L., Littman, M.L. (eds.) Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009. ACM International Conference Proceeding Series, vol. 382, pp. 681–688. ACM (2009)
Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of http-based malware and signature generation using malicious network traces. In: Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI 2010, pp. 26–26. USENIX Association, Berkeley (2010)
Schlesinger, M.I., Hlaváč, V.: Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, Dordrecht (2002)
Shamir, O., Zhang, T.: Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. In: Proc. of International Conference on Machine Learning (2012)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer Verlag (1995)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Franc, V., Sofka, M., Bartos, K. (2015). Learning Detector of Malicious Network Traffic from Weak Labels. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-23461-8_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer ScienceComputer Science (R0)