Analysis of Attack Detection on Log Access Servers Using Machine Learning Classification: Integrating Expert Labeling and Optimal Model Selection

Mohammad Ridwan(1), Irwan Sembiring(2), Adi Setiawan(3), Iwan Setyawan(4)

(1) Faculty of Engineering, Universitas Islam Syekh Yusuf Tangerang, Indonesia
(2) Faculty of Science and Mathematics, Universitas Kristen Satya Wacana Salatiga, Indonesia
(3) Faculty of Science and Mathematics, Universitas Kristen Satya Wacana Salatiga, Indonesia
(4) Faculty of Science and Mathematics, Universitas Kristen Satya Wacana Salatiga, Indonesia

Abstract

Purpose: As cyberattacks grow in complexity and diversity, traditional security measures are no longer sufficient to counter threats in web-based environments. Techniques tailored to detecting and addressing these evolving security risks in web applications are therefore urgently needed.

Methods: This research analyzes attack detection in server access logs using machine learning classification, combining two approaches: integration of expert labeling and selection of the best model. In expert labeling, each log entry is assessed as either safe or indicative of an attack.
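
As an illustration of the labeling step, the sketch below applies a few hand-written rules to access log lines and marks each entry as safe or attack. It is only a minimal sketch: the rule set (`EXPERT_RULES`), the function name `label_entry`, and the assumption that logs follow the Common Log Format are all hypothetical and are not taken from the study, whose actual expert procedure is not described here.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical expert rules (assumption): attack name -> pattern over the decoded request path.
EXPERT_RULES = {
    "sql_injection": re.compile(r"(union\s+select|or\s+1\s*=\s*1|sleep\s*\()", re.I),
    "xss": re.compile(r"(<script|onerror\s*=|javascript:)", re.I),
    "path_traversal": re.compile(r"(\.\./|/etc/passwd)", re.I),
}

# Request field of a Common Log Format line: "METHOD path PROTOCOL" (assumed log format).
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')


def label_entry(line):
    """Return 'attack', 'safe', or 'unparsed' for one access log line."""
    match = REQUEST_RE.search(line)
    if match is None:
        return "unparsed"
    # Decode percent-encoding and '+' so that encoded payloads are visible to the rules.
    path = unquote_plus(match.group("path"))
    for pattern in EXPERT_RULES.values():
        if pattern.search(path):
            return "attack"
    return "safe"


if __name__ == "__main__":
    sample = (
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0700] '
        '"GET /index.php?id=1%20UNION%20SELECT%20password HTTP/1.1" 200 512'
    )
    print(label_entry(sample))
```

Rules of this kind only approximate an expert's judgment; labels produced this way would still need the validation against other datasets that the study describes.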

Result: The labels were validated against different datasets to minimize errors and increase confidence in the resulting dataset. Experimental results show that the Decision Tree and Random Forest models achieve nearly identical accuracy, about 89.3% to 89.4%, while the artificial neural network (ANN) model reaches 81%.
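
Selecting the best model by accuracy can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`accuracy`, `select_best_model`), the candidate names, and the toy labels and predictions are all made up for the example and do not reproduce the study's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best_model(y_true, predictions):
    """Score every candidate on the same held-out labels and return the top one.

    predictions maps a candidate name to its predicted labels.
    Ties go to the candidate listed first.
    """
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy held-out labels (1 = attack, 0 = safe); illustrative values only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    predictions = {
        "candidate_a": [1, 0, 1, 1, 0, 0, 1, 0, 0, 1],
        "candidate_b": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
        "candidate_c": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0],
    }
    best, scores = select_best_model(y_true, predictions)
    print(best, scores)
```

Accuracy alone can mislead when attack entries are rare, so a fuller comparison would typically also look at precision and recall for the attack class.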

Novelty: This study combines expert knowledge in labeling log entries with a rigorous process for selecting the best classification model, an integration that has not been extensively explored in previous research. Its contribution lies in integrating expert security assessment with best-model selection for detecting attacks in server access logs, and in validating the labels with datasets from different log devices to increase confidence in the analysis results.

Keywords

Intrusion detection; Log server; Machine learning; Expert labeling; Best model selection

Scientific Journal of Informatics (SJI)
p-ISSN 2407-7658 | e-ISSN 2460-0040
Published By Department of Computer Science Universitas Negeri Semarang
Website: https://journal.unnes.ac.id/nju/index.php/sji

This work is licensed under a Creative Commons Attribution 4.0 International License.