Abstract
Identifying the family to which a malware specimen belongs is essential for understanding its behavior and developing mitigation strategies. Solutions proposed in prior work, however, are often impractical because their evaluations omit realistic factors: learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families, while obtaining a large quantity of up-to-date labeled malware for training can be expensive. In this article, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the HNMFk Classifier, that can be used in the early stages of the malware family labeling process. Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. The HNMFk Classifier exploits the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can abstain from a prediction (a reject option), which yields promising results in identifying novel malware families and helps maintain the performance of the model when only a small quantity of labeled data is available. We perform bulk classification of nearly 2,900 malware families, both rare and prominent, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.
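To make the reject option concrete, the following is a minimal sketch of one way a semi-supervised classifier can abstain. It is an illustration under stated assumptions, not the HNMFk Classifier itself: cluster assignments are taken as given (in the article they would come from the factorization), and the function name `propagate_labels`, the `min_purity` threshold, and the toy family names are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # hypothetical marker meaning "no prediction made"


def propagate_labels(cluster_ids, labels, min_purity=0.9):
    """Give each unlabeled sample its cluster's majority label, or abstain.

    cluster_ids: cluster index per sample (assumed output of some clustering step)
    labels: known family per sample, or None when the sample is unlabeled
    min_purity: hypothetical threshold on the majority label's share
                among the labeled members of a cluster
    """
    # Count the known labels inside every cluster.
    counts_by_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            counts_by_cluster.setdefault(cid, Counter())[lab] += 1

    predictions = []
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            predictions.append(lab)  # keep the known label as-is
            continue
        counts = counts_by_cluster.get(cid)
        if not counts:
            # No labeled sample in this cluster: possibly a novel family.
            predictions.append(ABSTAIN)
            continue
        family, votes = counts.most_common(1)[0]
        purity = votes / sum(counts.values())
        # Abstain when the labeled members of the cluster disagree too much.
        predictions.append(family if purity >= min_purity else ABSTAIN)
    return predictions


# Toy data (made up): cluster 0 is pure, cluster 1 is mixed, cluster 2 is unlabeled.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
known = ["fam_a", "fam_a", None, "fam_b", "fam_c", None, None, None]
print(propagate_labels(clusters, known))
```

In this toy run, only the unlabeled sample in the pure cluster receives a label. The samples in the mixed cluster and in the cluster without any labeled member are left unpredicted, which is the behavior the abstract associates with flagging candidate novel families and with coping with scarce labels.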
Index Terms
- Semi-Supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection