Learning Detector of Malicious Network Traffic from Weak Labels

Franc, Vojtech; Sofka, Michal; Bartos, Karel

doi:10.1007/978-3-319-23461-8_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9286)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We address the problem of learning a detector of malicious behavior in network traffic. Malicious behavior is detected by analyzing network proxy logs that capture malware communication between client and server computers. The conceptual problem in using standard supervised learning methods is the lack of a sufficiently representative training set containing examples of malicious and legitimate communication. Annotating individual proxy logs is an expensive process that involves security experts and does not scale with constantly evolving malware. However, weak supervision can be obtained at the level of properly defined bags of proxy logs by leveraging internet domain blacklists, security reports, and sandboxing analysis. We demonstrate that an accurate detector can be obtained from the collected security intelligence data by using a Multiple Instance Learning algorithm tailored to the Neyman-Pearson problem. We provide a thorough experimental evaluation on a large corpus of network communications collected from various company network environments.
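
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the two ideas it names, not the authors' method. It assumes a linear instance scorer, the common Multiple Instance Learning convention that a bag is scored by its highest-scoring instance, and a Neyman-Pearson style operating point chosen as the lowest threshold whose empirical false positive rate on legitimate bags stays within a given bound. The feature vectors, the weights `w` and `b`, and all function names are hypothetical.

```python
from typing import List, Sequence

Instance = Sequence[float]  # hypothetical feature vector of one proxy log
Bag = List[Instance]        # proxy logs grouped into one weakly labeled bag


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear score of a single proxy log (assumed model form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w: Sequence[float], b: float, bag: Bag) -> float:
    """Score a bag by its most suspicious instance (max aggregation)."""
    return max(instance_score(w, b, x) for x in bag)


def neyman_pearson_threshold(negative_bag_scores: Sequence[float],
                             max_fpr: float) -> float:
    """Lowest threshold whose empirical false positive rate is <= max_fpr.

    A bag is flagged when its score is strictly greater than the threshold.
    """
    if not 0.0 <= max_fpr <= 1.0:
        raise ValueError("max_fpr must lie in [0, 1]")
    scores = sorted(negative_bag_scores, reverse=True)
    # Number of legitimate bags that may be flagged without exceeding the bound.
    allowed = int(max_fpr * len(scores))
    if allowed >= len(scores):
        return float("-inf")  # the bound permits flagging every bag
    # At most `allowed` scores are strictly greater than scores[allowed].
    return scores[allowed]


def is_flagged(w: Sequence[float], b: float, bag: Bag, threshold: float) -> bool:
    """Flag a bag as malicious when its score exceeds the threshold."""
    return bag_score(w, b, bag) > threshold
```

In this sketch the threshold is fitted on scores of bags believed to be legitimate, and the detection rate would then be measured on bags weakly labeled as malicious. Learning `w` and `b` themselves is outside the scope of the example.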

Author information

Authors and Affiliations

Cisco Systems, Prague, Czech Republic
Vojtech Franc, Michal Sofka & Karel Bartos
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Prague, Czech Republic
Vojtech Franc

Corresponding author

Correspondence to Vojtech Franc.

Editor information

Editors and Affiliations

Huawei Noah’s Ark Lab, Shatin, Hong Kong
Albert Bifet
Siemens AG Corporate Technology, München, Germany
Michael May
IBM Research Brazil, Rio de Janeiro, Brazil
Bianca Zadrozny
Universitat Politècnica de Catalunya, Barcelona, Spain
Ricard Gavalda
Università di Pisa, Pisa, Italy
Dino Pedreschi
Eurecat / Yahoo Labs, Barcelona, Spain
Francesco Bonchi
University of Porto - INESC TEC, Porto, Portugal
Jaime Cardoso
Otto-von-Guericke University, Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Franc, V., Sofka, M., Bartos, K. (2015). Learning Detector of Malicious Network Traffic from Weak Labels. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_6

DOI: https://doi.org/10.1007/978-3-319-23461-8_6
Published: 29 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer Science, Computer Science (R0)
