Abstract
A masquerader is someone who impersonates another user and operates a computer system with privileged access. The computer security problems caused by masqueraders are serious. Anomaly detection is considered the best way to detect masqueraders, but because of its low detection probability and high error rate, it remains in the research phase. To improve detection accuracy, a number of methods have been investigated so far, including the Support Vector Machine (SVM), the Hidden Markov Model (HMM), and the Naïve Bayes (N. Bayes) classifier. In the present paper, we propose a method that integrates data mining and natural language processing, namely N-Gram_Square root Term Frequency-Inverse Document Frequency (N-Gram_STF-IDF). In the proposed method, the sequences to be examined are segmented using N-Gram characteristics, and non-normal users are then detected using an STF-IDF classifier. We evaluate the proposed method on the Schonlau and Greenberg data sets and compare the results with those obtained using various other methods.
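The two stages named in the abstract, N-gram segmentation followed by STF-IDF weighting, can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the weighting used here (square root of the term frequency multiplied by a smoothed inverse document frequency) is an assumption inferred from the method's name. The function names, the toy command blocks, and the choice to sum the weights into a single anomaly score are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(commands, n):
    """Return the contiguous n-grams (as tuples) of a command sequence."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def document_frequency(blocks, n):
    """Count, for each n-gram, the number of training blocks that contain it."""
    df = Counter()
    for block in blocks:
        df.update(set(ngrams(block, n)))  # set(): count each block at most once
    return df


def stf_idf_score(block, df, num_blocks, n):
    """Sum sqrt(tf) * idf over the distinct n-grams of one block.

    Assumption: idf = log((N + 1) / (df + 1)), a smoothed form chosen so that
    n-grams never seen in training still get a finite (and the largest) weight.
    """
    tf = Counter(ngrams(block, n))
    return sum(
        math.sqrt(count) * math.log((num_blocks + 1) / (df.get(gram, 0) + 1))
        for gram, count in tf.items()
    )


# Toy training data: command blocks assumed to come from the legitimate user.
training_blocks = [
    ["ls", "cd", "ls", "cat"],
    ["cd", "ls", "cat", "ls"],
    ["ls", "cat", "cd", "ls"],
]
n = 2
df = document_frequency(training_blocks, n)

familiar = ["cd", "ls", "cat", "ls"]        # resembles the training blocks
unfamiliar = ["nmap", "ssh", "nmap", "ssh"]  # n-grams unseen in training

familiar_score = stf_idf_score(familiar, df, len(training_blocks), n)
unfamiliar_score = stf_idf_score(unfamiliar, df, len(training_blocks), n)
print(familiar_score < unfamiliar_score)
```

In this sketch a block made of n-grams that are rare or absent in the user's history receives a higher score. Turning that score into a decision would require a threshold or a trained classifier, which the abstract does not specify.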
Cite this article
Geng, D., Odaka, T., Kuroiwa, J. et al. An N-Gram and STF-IDF model for masquerade detection in a UNIX environment. J Comput Virol 7, 133–142 (2011). https://doi.org/10.1007/s11416-010-0143-3
DOI: https://doi.org/10.1007/s11416-010-0143-3