Analyzing and Comparing Omicron Lineage Variants Protein–Protein Interaction Network Using Centrality Measure

Das, Mamata; Selvakumar, K.; Alphonse, P. J. A.

doi:10.1007/s42979-023-01685-5

Original Research
Published: 30 March 2023

Volume 4, article number 299, (2023)
SN Computer Science

Abstract

The worldwide spread of the Omicron lineage variants has now been confirmed. Identifying the important proteins in a protein–protein interaction network (PPIN) is crucial both for understanding cellular processes and for discovering new drugs. In bioinformatics, PPINs are commonly represented as graphs that describe cell processes. Some proteins have a significant influence on these processes and play a crucial role in regulating them, and the study of such significant proteins aids drug discovery. They can be found by reducing the graph and applying graph analysis. Studies examining protein interactions in the Omicron lineage (B.1.1.529) and its variants (BA.5, BA.4, BA.3, BA.2, BA.1.1, BA.1) are not yet available. This study of Omicron aims to find its significant proteins. The network comprises 68 nodes, representing 68 proteins, and 52 edges, representing the relationships among them. We compute four centrality measures, namely PageRank centrality (PRC), degree centrality (DC), closeness centrality (CC), and betweenness centrality (BC), together with node degree and local clustering coefficient (LCC). We also discover 18 network clusters using Markov clustering. Eight significant proteins (candidate genes of the Omicron lineage variants) were detected among the 68 proteins: AHSG, KCNK1, KCNQ1, MAPT, NR1H4, PSMC2, PTPN11, and UBE21, which scored the highest among the Omicron proteins. Among these, the MAPT protein has the most significant impact in the protein–protein interaction networks of the Omicron variants.

Introduction

The Technical Advisory Group on SARS-CoV-2 Virus Evolution (TAG-VE), an independent panel of scientists, regularly monitors and evaluates the evolution of the SARS-CoV-2 virus to determine whether specific mutations or combinations of mutations affect the behavior of the virus. The TAG-VE evaluated the B.1.1.529 variant of SARS-CoV-2 on November 26, 2021. South Africa first reported B.1.1.529 to the World Health Organization (WHO) [26] on November 24, 2021. Based on evidence of a detrimental change in COVID-19 epidemiology, the WHO classified B.1.1.529 as a variant of concern (VOC) and named it Omicron. Like other SARS-CoV-2 variants, Omicron comprises numerous lineages and sublineages. Omicron presently has 3 main lineages: BA.5, BA.4, and BA.2. The Omicron Pango lineage currently has six sublineages (BA.5, BA.4, BA.3, BA.2, BA.1.1, BA.1). Although these lineages are often very similar to one another, differences between them may influence how the virus behaves. In this research, we created seven PPI networks for the Omicron Pango lineage, covering the base lineage and all of its variants. The networks were built with STRING and then analyzed to find their most influential proteins. Analyzing the network that describes the interactions among the parts of such a complex system is easier than investigating each component separately. It is well known from the analysis of biological and social networks that most networks contain some significant or influential elements, such as crucial proteins in PPI networks. These elements, or vertices, have unique structural characteristics, which are quantified by various centrality metrics. Centrality measures allow the vertices and edges of a graph to be ranked from several perspectives.
Numerous centrality measures (CM) have been developed to pinpoint "central" nodes in large networks. Because several options are available for ranking influential nodes, the user can choose whichever metric best fits the network under study. The choice of an appropriate measure is further complicated by the fact that the network architecture affects how centrality metrics rank influential nodes. To find the centrality metric that is most successful at predicting influential proteins, we examined the centrality profiles of the nodes of the Omicron PPINs and looked at how a broad range of widely used centrality measures reflects various topological network properties. This study reflects the state of the art in centrality analysis of biological networks. To identify the most significant protein in the network, this research applies 4 centrality metrics [PageRank centrality (PRC), degree centrality (DC), closeness centrality (CC), and betweenness centrality (BC)], supplemented with further scores [node degree, local clustering coefficient (CCo), and p value], to the PPI networks of the Omicron variants.

Related Work

Biological networks and social networks are graph structures that can be used to describe a variety of complex systems, including biological and social systems [11]. Selecting an appropriate set of centrality measures is essential for determining the significant functional characteristics of a network [7, 8]. The paper [18] presents a critical analysis of centrality measures in social networks, in which several measures (BC, CC, DC, and eigenvector centrality) were analyzed against three simple requirements for the behavior of centrality measures. The authors of [9] analyzed PPIs in Parkinson's disease, one of the fastest-growing disorders worldwide, using a skyline query and found 12 important proteins. The features of the PPI network were represented by attributes based on centrality measures. The authors of [1] discovered target genes for cancer diseases using protein–protein interaction networks. Hubs and centrality measures were used to examine the candidate genes, and the genes with the highest scores in both mutation rate and graph centrality were extracted as target genes. The authors of [2] compared 27 popular centrality measures on yeast PPINs. The measures classify and rank the influential nodes of the networks. Using hierarchical clustering and principal component analysis (PCA), they found that the topology of the network affects which metrics are the most useful. The survey paper [6] covers both historical and contemporary research on centrality measures in social networks. It discusses the centrality measures that have been developed and their mathematical definitions, and it presents applications of centrality measures in education research [12], biology [11], traffic [14], transportation [25], and security [5, 21].
Centrality measures have also been applied to many other kinds of networks [4], such as psychological networks [3, 16], brain networks [15], and differential privacy models [17].

Methods

This study uses data on the Omicron lineage variants. The research was carried out in several steps: data collection, data cleaning, data validation, creation of the PPIN data, centrality measurement, and finally clustering of the whole network. The clustering is done with the Markov clustering algorithm (MCL) [20]. The objective of this work is to identify the significant proteins, that is, to prioritize the proteins. For this purpose we focus on the centrality measures of the network. Figure 1 illustrates the research workflow.

Data Collection

We took the Omicron dataset from the Universal Protein Resource/Swiss-Prot (UniProt/Swiss-Prot) [23] database, restricted to reviewed entries for proteins found in the human body. Swiss-Prot is a highly annotated, non-redundant protein sequence database that stores experimental results, computational features, and scientific conclusions. UniProtKB/Swiss-Prot is the reviewed section of the UniProt Knowledgebase. It provides accurate, consistent, and rich annotations of functional information about proteins. Initially, we took a total of 228 proteins: B.1.1.529 (27), BA.5 (30), BA.4 (31), BA.3 (34), BA.2 (38), BA.1.1 (34), BA.1 (34), and analyzed the PPIN of each individual Omicron lineage. The PPINs of the Omicron lineage variants are shown in Figs. 2, 3, 4, 5, 6 and 7. We then merged the data and cleaned it by removing duplicate entries to create the Omicron PPIN. In all cases, data validation and PPIN data creation were done with STRING [22]. The STRING database draws on several sources of information, including computational prediction methods, experimental data, and public text collections. It is updated regularly and is free to access. In addition, it generates network images using a spring model, in which nodes are treated as masses and edges as springs. After cleaning the data, we obtained 68 unique proteins, which form the Omicron PPIN.
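The merge-and-deduplicate step described above can be sketched as follows. This is only an illustration: the lineage sizes are the counts quoted in the text, but the protein labels (`P1`, `P2`, ...) and the helper `merge_unique` are hypothetical placeholders, not the study's data or code.

```python
# Lineage sizes as quoted in the text; they add up to the 228 initial proteins.
lineage_sizes = {
    "B.1.1.529": 27, "BA.5": 30, "BA.4": 31, "BA.3": 34,
    "BA.2": 38, "BA.1.1": 34, "BA.1": 34,
}
total_initial = sum(lineage_sizes.values())


def merge_unique(protein_lists):
    """Merge per-lineage protein lists, keeping the first occurrence of each protein."""
    seen = set()
    merged = []
    for proteins in protein_lists.values():
        for protein in proteins:
            if protein not in seen:
                seen.add(protein)
                merged.append(protein)
    return merged


# Hypothetical placeholder lists (NOT the real per-lineage proteins).
toy_lists = {
    "lineage_a": ["P1", "P2", "P3"],
    "lineage_b": ["P2", "P3", "P4"],
    "lineage_c": ["P1", "P5"],
}
unique_proteins = merge_unique(toy_lists)
```

In the study the same kind of cleaning reduces the 228 initial entries to 68 unique proteins; here the eight toy entries reduce to five.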

Centrality Measure

This section discusses an important aspect of network analysis called centrality. Centrality is a widely used measure of how central a particular node is with respect to the network. The network that results from the PPI data is treated as an undirected graph, and a centrality method assigns a weight to each node in the graph. BC, CC, DC, and PRC are some of the centrality methods that can be applied to undirected graphs. Figures 2, 3, 4, 5, 6, 7 and 14 depict protein networks as examples of undirected graphs. The variants BA.1 and BA.1.1 have the same PPIN and differ only in their mutations. The nodes of the graph represent the proteins that affect Omicron's activity, whereas the edges represent the functional interactions or relationships between proteins.

Degree Centrality

The most basic centrality measure is degree centrality (DC) [10]. The degree of a node is the number of edges incident on it, and the DC is essentially the degree of a node in normalized form.

The DC of a node $v$ is the degree of $v$ divided by the maximum degree of any node in the graph. A node's degree centrality $C_\textrm{d}(v)$ in a network G(V, E) is defined mathematically as follows:

$$\begin{aligned} C_\textrm{d}(v) = \frac{\deg(v)}{\max_{u \in V} \deg(u)}. \end{aligned}$$

(1)

It ranges between 0 and 1; the closer the value is to 1, the closer the node's degree is to the maximum degree in the graph. $C_\textrm{d}(v)$ can be used to identify the more prominent or influential nodes of a network.
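As an illustration of Eq. (1), the following minimal sketch computes degree centrality normalized by the maximum degree. The adjacency dictionary, the node labels, and the function name `degree_centrality` are hypothetical and chosen for this example only. Note that some graph libraries normalize by $|V| - 1$ instead, which is a different convention from Eq. (1).

```python
def degree_centrality(adj):
    """Degree of each node divided by the maximum degree in the graph (Eq. 1)."""
    degrees = {v: len(neighbors) for v, neighbors in adj.items()}
    max_degree = max(degrees.values())
    if max_degree == 0:
        # Graph without edges: every node gets 0 by convention (assumption).
        return {v: 0.0 for v in adj}
    return {v: deg / max_degree for v, deg in degrees.items()}


# Hypothetical toy network; labels are placeholders, not Omicron proteins.
toy_network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1", "P4"},
    "P4": {"P1", "P3"},
    "P5": set(),  # isolated node
}
dc = degree_centrality(toy_network)
```

Here `P1` has the maximum degree and therefore a score of 1, while the isolated node `P5` scores 0.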

Closeness Centrality

The closeness centrality (CC) [19] indicates how close a node is to the rest of the network. It offers a way to identify nodes that can spread information efficiently through a graph. CC is based on the average shortest-path distance between a node and all other nodes, so nodes with a high closeness score have the shortest distances to the other nodes. A node's closeness centrality $C_\textrm{c}(v)$ in a graph G(V, E) is defined mathematically as follows:

$$\begin{aligned} C_\textrm{c}(v) = \frac{|V| - 1}{\sum_{u \in V \setminus \{v\}} d(u, v)}, \end{aligned}$$

(2)

where $|V|$ is the number of nodes and $d(u, v)$ is the distance between two nodes $u$ and $v$. The higher the CC value, the more central the node. The measure is useful for examining or restricting the spread of disease in epidemic modeling.
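A minimal sketch of Eq. (2) using breadth-first search for the shortest-path distances is given below. Eq. (2) needs a finite distance to every other node, so this sketch assigns 0 to any node that cannot reach the whole graph; that convention, the toy graph, and the function names are assumptions made for illustration and are not taken from the study.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """Closeness per Eq. (2); nodes that cannot reach all others get 0 (assumption)."""
    n = len(adj)
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        if len(dist) < n or total == 0:
            scores[v] = 0.0
        else:
            scores[v] = (n - 1) / total
    return scores


# Hypothetical path graph A - B - C - D.
path_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
cc = closeness_centrality(path_graph)
```

In the path graph, the inner nodes `B` and `C` score 0.75 and the end nodes score 0.5, reflecting their shorter average distance to the rest of the graph.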

Betweenness Centrality

Betweenness centrality (BC) [10] measures how central a node is with respect to the paths of the network, that is, how many shortest paths of the network pass through the node. A node's betweenness centrality $C_\textrm{b}(v)$ in a network G(V, E) is defined mathematically as follows:

$$\begin{aligned} C_\textrm{b}(v) = \sum_{x, y \in V \setminus \{v\}} \frac{\sigma_{xy}(v)}{\sigma_{xy}}, \end{aligned}$$

(3)

where $\sigma_{xy}$ is the number of shortest paths in the network between nodes $x$ and $y$, and $\sigma_{xy}(v)$ is the number of those paths that pass through $v$. If $x = y$, then $\sigma_{xy} = 1$. BC is useful for identifying super spreaders when analyzing the spread of disease in epidemiology.
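The sum in Eq. (3) can be sketched directly by counting shortest paths with breadth-first search. This sketch is a simple pairwise version and makes two assumptions not stated in the text: each unordered pair of distinct nodes is counted once, which is a common choice for undirected graphs, and pairs with no connecting path contribute nothing. The function names and the toy graphs are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest-path lengths and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to w.
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_centrality(adj):
    """Unnormalized betweenness per Eq. (3), summed over unordered node pairs."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: 0.0 for v in adj}
    for x, y in combinations(adj, 2):
        dist_x, sigma_x = info[x]
        if y not in dist_x:
            continue  # x and y are not connected
        dist_y, sigma_y = info[y]
        for v in adj:
            if v == x or v == y or v not in dist_x:
                continue
            # v lies on a shortest x-y path iff the distances add up exactly.
            if dist_x[v] + dist_y[v] == dist_x[y]:
                bc[v] += sigma_x[v] * sigma_y[v] / sigma_x[y]
    return bc


# Hypothetical toy graphs: a path A - B - C and a 4-cycle.
path_bc = betweenness_centrality({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
cycle_bc = betweenness_centrality(
    {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
)
```

For large networks, Brandes' algorithm computes the same quantity more efficiently; the pairwise form is used here only because it mirrors Eq. (3) term by term.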

Page Rank

PageRank centrality [13] is an adaptation of eigenvector centrality that was devised to rank web content using the value of the links between sites. It can be applied to any type of network, including protein interaction networks. Mathematically, the PageRank centrality $C_\textrm{PR}(v_i)$ of a node $v_i$ in a network G(V, E) is defined as

$$\begin{aligned} C_\textrm{PR}(v_i) = \frac{1-d}{|V|} + d \sum_{v_t \in \textrm{Inneighbor}(v_i)} \frac{C_\textrm{PR}(v_t)}{\textrm{outdeg}(v_t)}, \end{aligned}$$

(4)

where $d$ is a constant called the damping factor, usually set to 0.85. In an undirected graph, the in-neighbors of a node are simply its neighbors, and the out-degree of a node equals its degree.
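Eq. (4) is usually solved by repeated substitution (power iteration). The sketch below does this for an undirected graph, where in-neighbors are neighbors and the out-degree is the degree. Two choices are assumptions of this example rather than anything stated in the text: the rank of isolated nodes is spread uniformly over all nodes so that the scores keep summing to 1, and a fixed number of iterations is used instead of a convergence test. The function name and toy graphs are hypothetical.

```python
def pagerank(adj, d=0.85, iterations=100):
    """Power iteration for Eq. (4) on an undirected graph given as an adjacency dict."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Rank held by isolated nodes is redistributed uniformly (assumption).
        dangling = sum(rank[v] for v in adj if not adj[v])
        new_rank = {}
        for v in adj:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - d) / n + d * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical toy graphs: a triangle and a star with an extra isolated node.
triangle_pr = pagerank({"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]})
star_pr = pagerank({
    "hub": ["L1", "L2", "L3"],
    "L1": ["hub"], "L2": ["hub"], "L3": ["hub"],
    "alone": [],
})
```

In the triangle every node receives the same score, while in the star the hub ranks highest and the isolated node lowest.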

Markov Clustering

The Markov Cluster (MCL) algorithm [24] was created by Stijn van Dongen at the Centre for Mathematics and Computer Science in the Netherlands. It is an unsupervised clustering method for networks that is fast and scalable and is based on the simulation of flow in graphs. It is employed in bioinformatics and other fields. In our study, the distance matrix derived from the STRING global scores serves as the input to MCL. The higher the global score of two interacting proteins, the more likely they are to be clustered together. MCL [7, 8] alternates between two operations. The first, expansion, corresponds to ordinary matrix multiplication and simulates how flow spreads and becomes more homogeneous. The second, inflation, is formally a Hadamard power followed by a diagonal scaling. Inflation contracts the flow, making it thicker where the current is already strong and thinner where it is weak. The number of clusters cannot be specified in advance; it is controlled implicitly by the inflation parameter. Higher inflation yields more clusters, which indirectly determines the granularity of the clustering. Here, the inflation value is set to 2.
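The alternation of expansion and inflation can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: it starts from an unweighted adjacency matrix with self-loops rather than from STRING scores, runs a fixed number of iterations, and reads clusters from the non-zero entries of each row of the final matrix. All function names and the toy graph are hypothetical.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (the diagonal scaling step)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adj, inflation=2.0, iterations=50):
    """Simplified MCL on an undirected, unweighted graph (adjacency dict)."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for v, neighbors in adj.items():
        m[index[v]][index[v]] = 1.0  # self-loop, a common MCL convention
        for u in neighbors:
            m[index[u]][index[v]] = 1.0
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]  # inflation: Hadamard power
        m = normalize_columns(m)                          # ... then diagonal scaling
    clusters = set()
    for row in m:
        members = frozenset(nodes[j] for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical toy graph: two separate triangles.
toy_graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"],
    "D": ["E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
toy_clusters = markov_clustering(toy_graph, inflation=2.0)
```

On this toy graph the two triangles come out as two clusters. The default `inflation=2.0` mirrors the value used in the study.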

Results and Discussion

Table 1 shows the global properties of the networks of the Omicron base lineage and its variants. All seven networks except B.1.1.529 have an average node degree greater than 1. Three of the networks (B.1.1.529, BA.1, and BA.1.1) have the same density, 0.0284. The highest density is 0.06719 (BA.4) and the lowest is 0.00416 (BA.3). The average LCC is reasonably high, reaching 0.771. BA.2 is the best network, with the smallest p value (0.00038). Table 2 shows the global features of the merged Omicron PPIN: the average node degree is 1.53 and the density is 0.0228. Table 3 lists the centrality scores of the 68 proteins, which allow us to assess the relevance of each protein. The network has a maximum degree of 7 and an average local clustering coefficient (LCC) of 0.385. LCC values range from 0 to 1 and represent the density of connections among a node's neighbors. Nodes with higher values belong to densely connected clusters, and a node with a value of 1 is part of a clique. The proteins GRB7, KCNK17, NDUFB5, NDUFV1, RPSA, SNRPB, SNRPD1, and SNRPE in Table 3 have a CCo value of 1 because they are part of a clique. Figure 14 shows the PPI network of Omicron, and the centrality scores are visualized in Figs. 8, 9, 10, 11, 12 and 13. To obtain a threshold for each centrality measure, we took its maximum value and divided it by two. The threshold is used to flag the important proteins for that measure. We then identified the significant proteins as the intersection of the important proteins of each category (CC, DC, and PRC). In total, 8 significant proteins were detected among the 68 unique proteins. Using the Markov clustering algorithm, we extracted 18 network clusters from the main Omicron network, shown in Figs. 14 and 15.
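The half-maximum threshold and the intersection step described above can be sketched as follows. The scores below are made-up placeholders rather than values from Table 3, the function names are hypothetical, and treating a score equal to the threshold as "important" is an assumption, since the text does not say whether the comparison is strict.

```python
def above_half_max(scores):
    """Nodes whose score is at least half of the maximum score of this measure."""
    threshold = max(scores.values()) / 2
    return {node for node, value in scores.items() if value >= threshold}


def significant_nodes(score_tables):
    """Intersection of the important nodes over all centrality measures."""
    selected = [above_half_max(scores) for scores in score_tables.values()]
    return set.intersection(*selected)


# Hypothetical scores for three placeholder nodes (NOT values from Table 3).
toy_scores = {
    "DC": {"P1": 1.0, "P2": 0.6, "P3": 0.2},
    "CC": {"P1": 0.8, "P2": 0.3, "P3": 0.5},
    "PRC": {"P1": 0.4, "P2": 0.25, "P3": 0.1},
}
important_by_dc = above_half_max(toy_scores["DC"])
significant = significant_nodes(toy_scores)
```

Only `P1` exceeds the threshold in all three measures, so it is the single significant node in this toy example.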
As Table 4 shows, clusters ${\mathcal {C}}_1$, ${\mathcal {C}}_2$, and ${\mathcal {C}}_3$ contain 4 proteins each, ${\mathcal {C}}_4$ to ${\mathcal {C}}_8$ contain 4 proteins each, and the remaining clusters contain 2 proteins each.
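The global figures quoted for the merged Omicron network (average node degree 1.53 and density 0.0228) follow directly from its 68 nodes and 52 edges under the standard formulas for an undirected simple graph. The short check below is illustrative; the function names are chosen for this example.

```python
def average_degree(n_nodes, n_edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


avg_deg = average_degree(68, 52)  # 104 / 68
net_density = density(68, 52)     # 104 / 4556
```

Rounded to the precision used in the text, these evaluate to 1.53 and 0.0228.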

Table 1 Global properties of Omicron lineage variant’s network

Table 2 Global properties of Omicron network

Table 3 Centrality measure and some important score of 68 Omicron protein

Table 4 Generated 18 clusters from MCL algorithm

Conclusion

Centrality analysis is very useful for analyzing large biological networks. Using a candidate gene network of Omicron as a case study, we investigated and compared different centrality measures. The findings suggest that it is beneficial to explore candidate gene networks using methods from other fields of science, such as social network analysis. Graph analysis was performed on the 7 base lineages of the Omicron variants, covering the 68 unique proteins encoded by the Omicron candidate genes. From the main Omicron network, we extracted 18 network clusters using the Markov clustering algorithm. The main Omicron network has 68 nodes, each of which represents a protein. Of the 68 proteins, 8 were found to be significant: AHSG, KCNK1, KCNQ1, MAPT, NR1H4, PSMC2, PTPN11, and UBE21, with the MAPT protein receiving the highest score. According to the centrality scores, the MAPT protein has the most dominant influence on the protein–protein interaction network of the Omicron candidate genes. This work will benefit medical researchers as well as the general public, as it allows biological knowledge to be taken into account in network analysis of the Omicron virus.

Network analysis can benefit greatly from centrality measures, but they need to be understood, selected, and applied properly. As the main part of our research, we present the four major centrality measures that were found to be relevant for identifying the most significant proteins in the PPIN of the Omicron lineage variants. A wide range of new and large networks is being created for different applications, along with different centrality measures. Most studies have tried to demonstrate the uniqueness and superiority of their own centrality measures. Much remains to be learned about how these measures differ and how to apply them properly, and the present work was carried out with this in mind.

References

1. Arumugam A, Arnold EI. Identification of target genes in cancer diseases using protein–protein interaction networks. Netw Model Anal Health Inform Bioinform. 2019;8(1):1–13.
2. Ashtiani M, Salehzadeh-Yazdi A. A systematic survey of centrality measures for protein–protein interaction networks. BMC Syst Biol. 2018;12(1):1–17.
3. Bringmann LF, Elmer T. What do centrality measures measure in psychological networks? J Abnorm Psychol. 2019;128(8):892.
4. Brohée S, van Helden J. Evaluation of clustering algorithms for protein–protein interaction networks. BMC Bioinform. 2006;7(1):1–19.
5. Carrington PJ. Crime and social network analysis. SAGE Handb Soc Netw Anal. 2011:236–255.
6. Das K, Samanta S, Pal M. Study on centrality measures in social networks: a survey. Soc Netw Anal Min. 2018;8(1):1–11.
7. Das M, Alphonse P, Kamalanathan S. Markov clustering algorithms and their application in analysis of PPI network of malaria genes. In: IDAACS, vol. 2. IEEE; 2021. p. 855–60.
8. Das M, Alphonse P, Kamalanathan S. An analytical study of COVID-19 dataset using graph-based clustering algorithms. In: Smart intelligent computing and applications, vol. 1. Springer; 2022. p. 1–15.
9. Diansyah MR, Kusuma WA. Analysis of protein–protein interaction using skyline query on Parkinson disease. In: ICACSIS. IEEE; 2019. p. 175–80.
10. Freeman LC. A set of measures of centrality based on betweenness. Sociometry. 1977;40(1):35–41. https://doi.org/10.2307/3033543.
11. Ghasemi M, Seidkhani H, Tamimi F, Rahgozar. Centrality measures in biological networks. Curr Bioinform. 2014;9(4):426–41.
12. Grunspan DZ, Wiggins BL, Goodreau SM. Understanding classrooms through social network analysis: a primer for social network analysis in education research. CBE-Life Sci Educ. 2014;13(2):167–78.
13. Iván G, Grolmusz V. When the Web meets the cell: using personalized PageRank for analyzing protein interaction networks. Bioinformatics. 2011;27(3):405–7.
14. Jayaweera IMLN, Perera KKKR, Munasinghe J. Centrality measures to identify traffic congestion on road networks: a case study of Sri Lanka. IOSR J Math (IOSR-JM). 2017;13(2):13–19.
15. Joyce KE, Laurienti PJ, Burdette JH. A new measure of centrality for brain networks. PLoS ONE. 2010;5(8):e12200.
16. Khojasteh H, Khanteymoori A, Olyaee MH. Comparing protein–protein interaction networks of SARS-CoV-2 and (H1N1) influenza using topological features. Sci Rep. 2022;12(1):1–11.
17. Laeuchli J, Ramírez-Cruz Y. Analysis of centrality measures under differential privacy models. Appl Math Comput. 2022;412:126546.
18. Landherr A, Friedl B, Heidemann J. A critical review of centrality measures in social networks. Bus Inform Syst Eng. 2010;2(6):371–85.
19. Sabidussi G. The centrality index of a graph. Psychometrika. 1966;31(4):581–603.
20. Satuluri V, Parthasarathy S. Scalable graph clustering using stochastic flows: applications to community discovery. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2009. p. 737–46.
21. Sparrow MK. The application of network analysis to criminal intelligence: an assessment of the prospects. Soc Netw. 1991;13(3):251–74.
22. Szklarczyk D, Franceschini A, Wyder S, et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(D1):D447–52.
23. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49(D1):D480–9.
24. van Dongen S. A cluster algorithm for graphs. Inf Syst. 2000.
25. Wang J, Hou X, Li K, Ding Y. A novel weight neighborhood centrality algorithm for identifying influential spreaders in complex networks. Physica A. 2017;475:88–105.
26. World Health Organization. Office of Library and Health Literature Services. Styles for bibliographic citations: guidelines for WHO-produced bibliographies. ONLINE. 1988.

Funding

This work forms part of the first author's doctoral research. This research received no external funding.

Author information

K. Selvakumar and P. J. A. Alphonse contributed equally to this work.

Authors and Affiliations

Department of Computer Applications, NIT Trichy, NH 83, Trichy, Tamil Nadu, 620015, India
Mamata Das, K. Selvakumar & P. J. A. Alphonse

Corresponding authors

Correspondence to Mamata Das or K. Selvakumar.

Ethics declarations

Conflict of interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

Research Involving Human Participants and/or Animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano and Nishtha Kesswani.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Das, M., Selvakumar, K. & Alphonse, P.J.A. Analyzing and Comparing Omicron Lineage Variants Protein–Protein Interaction Network Using Centrality Measure. SN COMPUT. SCI. 4, 299 (2023). https://doi.org/10.1007/s42979-023-01685-5

Received: 30 August 2022
Accepted: 10 January 2023
Published: 30 March 2023
DOI: https://doi.org/10.1007/s42979-023-01685-5
