ABSTRACT
Attacks such as call fraud and identity theft often involve sophisticated, stateful attack patterns that, layered on top of normal communication, try to harm systems on a higher semantic level than usual attack scenarios. To detect these kinds of threats via specially deployed honeypots, at least a minimal understanding of the inherent state machine of a specific service is needed to lure potential attackers and to sustain a communication for a sufficiently large number of steps. To this end, we propose PRISMA, a method for protocol inspection and state machine analysis, which infers a functional state machine and message format of a protocol from network traffic alone. We apply our method to three real-life network traces, ranging from 10,000 to 2 million messages, of both binary and textual protocols. We show that PRISMA is capable of simulating complete and correct sessions based on the learned models. A case study on malware traffic reveals the different states of the execution, rendering PRISMA a valuable tool for malware analysis.
Index Terms
- Learning stateful models for network honeypots