An N-Gram and STF-IDF model for masquerade detection in a UNIX environment

  • Original Paper
Journal in Computer Virology

Abstract

A masquerader is someone who impersonates another user and operates a computer system with privileged access. The computer security problems caused by masqueraders are serious. Anomaly detection is considered the best way to detect masqueraders, but because of its low detection rate and high error rate, the approach is still in the research phase. A number of methods, including the Support Vector Machine (SVM), the Hidden Markov Model (HMM), and the Naïve Bayes (N. Bayes) classifier, have been investigated to improve detection accuracy. This paper proposes a method that integrates data mining and natural language processing: the N-Gram and Square root Term Frequency-Inverse Document Frequency model (N-Gram_STF-IDF). In the proposed method, the sequences to be examined are segmented via N-Gram characteristics, and non-normal users are then detected using an STF-IDF classifier. We evaluate the proposed method on the Schonlau and Greenberg data sets and compare the results with those obtained using various other methods.
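
The abstract does not give the formulas behind the method, so the following is only a minimal sketch of the two ideas it names: splitting a command sequence into N-Grams and weighting each N-Gram by the square root of its term frequency times an inverse document frequency. The function names `ngrams` and `stf_idf`, the `log(N / df)` form of the IDF term, and the toy command blocks are assumptions made for illustration. They are not taken from the paper, and the detection step itself is not shown.

```python
import math
from collections import Counter


def ngrams(commands, n):
    """Return the n-grams (as tuples) of a command sequence, in order."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def stf_idf(blocks, n=2):
    """Weight every n-gram in every block by sqrt(tf) * log(N / df).

    Assumed weighting (not confirmed by the abstract):
      tf = count of the n-gram inside the block
      df = number of blocks that contain the n-gram
      N  = total number of blocks
    """
    grams_per_block = [ngrams(block, n) for block in blocks]

    # Document frequency: count each n-gram once per block.
    df = Counter()
    for grams in grams_per_block:
        df.update(set(grams))

    num_blocks = len(blocks)
    weighted = []
    for grams in grams_per_block:
        tf = Counter(grams)
        weighted.append({
            gram: math.sqrt(count) * math.log(num_blocks / df[gram])
            for gram, count in tf.items()
        })
    return weighted


# Toy command blocks (hypothetical data, not from the Schonlau or Greenberg sets).
blocks = [
    ["ls", "cd", "ls", "cd", "ls"],
    ["ls", "cd", "vi", "make"],
]
weights = stf_idf(blocks, n=2)
for block_weights in weights:
    print(block_weights)
```

Under this assumed weighting, an N-Gram that occurs in every block gets weight zero, while one that is frequent in a single block gets a weight that grows only with the square root of its count, which damps the influence of heavily repeated commands.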

Author information

Corresponding author

Correspondence to Dai Geng.

About this article

Cite this article

Geng, D., Odaka, T., Kuroiwa, J. et al. An N-Gram and STF-IDF model for masquerade detection in a UNIX environment. J Comput Virol 7, 133–142 (2011). https://doi.org/10.1007/s11416-010-0143-3

  • DOI: https://doi.org/10.1007/s11416-010-0143-3
