Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis

  • Chapter
  • First Online:
Data Science in Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 1084)

Abstract

Essential genes (EGs) are fundamental for the growth and survival of a cell or an organism. Identifying EGs is an important issue in many areas of biomedical research, such as synthetic and systems biology, drug development, and mechanistic and therapeutic investigations. Essentiality is a context-dependent, dynamic attribute of a gene that can vary across cells, tissues, or pathological conditions, and wet-lab experimental procedures to identify EGs are costly and time-consuming. Commonly explored computational approaches apply machine learning techniques to protein-protein interaction networks, but they are often unsuccessful, especially for human genes. From a biological point of view, identifying node essentiality is a challenging task; from a data science perspective, designing suitable graph learning approaches is still an open problem. Node classification is a graph machine learning task that predicts an unknown node property from the defined node attributes; the model is trained on both the relationship information (the network edges) and the node attributes. Here, we propose the use of a context-specific integrated network enriched with biological and topological node attributes. To tackle the node classification task, we exploit different machine learning and deep learning models. An extensive experimental phase demonstrates the effectiveness of both the network structure and the node attributes for EG identification.
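The node classification setting described in the abstract can be illustrated with a small, self-contained sketch. It is not the chapter's pipeline: the toy network, the gene names g1 to g6, the expression values and the labels are all hypothetical; degree is the only topological attribute; the mean of the neighbours' attribute stands in for the use of relationship information (loosely mirroring one message-passing step of a graph neural network); and a simple nearest-centroid rule replaces the machine and deep learning models evaluated in the chapter.

```python
"""Toy node classification on an attributed gene network (all data hypothetical)."""
from collections import defaultdict
from statistics import mean

# Hypothetical undirected interactions between genes.
EDGES = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g5", "g6")]

# Hypothetical biological node attribute (e.g. a normalized expression level).
EXPRESSION = {"g1": 0.9, "g2": 0.8, "g3": 0.85, "g4": 0.4, "g5": 0.2, "g6": 0.1}

# Known labels for the training nodes (1 = essential, 0 = non-essential).
# g3 and g6 are left unlabeled and will be predicted.
LABELS = {"g1": 1, "g2": 1, "g4": 0, "g5": 0}


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node_features(adj, attrs):
    """Per-node vector: [own attribute, normalized degree, mean neighbour attribute]."""
    max_deg = max(len(adj[n]) for n in attrs) or 1
    feats = {}
    for node, value in attrs.items():
        neigh = adj[node]
        # Relational information: average the attribute over the one-hop neighbourhood.
        neigh_mean = mean(attrs[n] for n in neigh) if neigh else 0.0
        feats[node] = [value, len(neigh) / max_deg, neigh_mean]
    return feats


def train_centroids(feats, labels):
    """Nearest-centroid 'training': average the feature vectors of each class."""
    by_class = defaultdict(list)
    for node, y in labels.items():
        by_class[y].append(feats[node])
    return {y: [mean(col) for col in zip(*rows)] for y, rows in by_class.items()}


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    feats = node_features(adj, EXPRESSION)
    centroids = train_centroids(feats, LABELS)
    # Prints "g3 1" and then "g6 0" for this toy data.
    for node in sorted(set(EXPRESSION) - set(LABELS)):
        print(node, predict(centroids, feats[node]))
```

Each feature vector mixes a node's own attribute with information drawn from the graph (its degree and its neighbours' attributes), which is the combination the abstract argues is useful. In a realistic setting one would expect to swap the hand-written features and classifier for established tools, such as scikit-learn classifiers or graph neural network layers from PyTorch Geometric, and to handle the strong class imbalance typical of essentiality labels.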

Notes

  1. https://bitbucket.org/tuliocampos/essential.

  2. https://sites.google.com/site/bctnet/.

  3. https://depmap.org/portal/.

  4. Scikit-Learn: https://scikit-learn.org/stable/, PyTorch Geometric: https://pytorch-geometric.readthedocs.io/en/latest/, Imbalanced-learn: https://imbalanced-learn.org/stable/.

  5. Google Colab notebooks for result reproducibility are available at: https://github.com/giordamaug/EG-identification---Data-Science-in-App-Springer/tree/main/notebook.

Acknowledgements

This work has been partially funded by the BiBiNet project (H35F21000430002) within POR-Lazio FESR 2014-2020. It was also carried out within the activities of the authors as members of the ICAR-CNR INdAM Research Unit and partially supported by the INdAM research project “Computational Intelligence methods for Digital Health”. The work of Mario R. Guarracino was conducted within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE). Mario Manzo thanks Prof. Alfredo Petrosino for his guidance and supervision during their years of working together.

Author information

Corresponding author

Correspondence to Mario Rosario Guarracino.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Manzo, M., Giordano, M., Maddalena, L., Guarracino, M.R., Granata, I. (2023). Novel Data Science Methodologies for Essential Genes Identification Based on Network Analysis. In: Dzemyda, G., Bernatavičienė, J., Kacprzyk, J. (eds) Data Science in Applications. Studies in Computational Intelligence, vol 1084. Springer, Cham. https://doi.org/10.1007/978-3-031-24453-7_7