research-article
Just Accepted

Transfer Learning-based Forensic Analysis and Classification of E-Mail Content

Online AM: 28 June 2023



  • Published in

    ACM Transactions on Asian and Low-Resource Language Information Processing, Just Accepted
    ISSN:2375-4699
    EISSN:2375-4702

    Copyright © 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Online AM: 28 June 2023
    • Accepted: 5 June 2023
    • Revised: 27 April 2023
    • Received: 10 September 2022
    Published in tallip Just Accepted


    Qualifiers

    • research-article