Abstract

Tuberculosis (TB) is the world's most prevalently infectious disease. Molecular mechanisms behind tuberculosis remain unknown. microRNA (miRNA) is involved in a wide variety of diseases. To validate the significant genes and miRNAs in the current sample, two messenger RNA (mRNA) expression profile datasets and three miRNA expression profile datasets were downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed (DE) genes (DEGs) and miRNAs (DE miRNAs) between healthy and TB patients were filtered out. Enrichment analysis was executed, and a protein-protein interaction (PPI) network was developed to understand the enrich pathways and hub genes of TB. Additionally, the target genes of miRNA were predicted and overlapping target genes were identified. We studied a total of 181 DEGs (135 downregulated and 46 upregulated genes) and two DE miRNAs (2 downregulated miRNAs) from two gene profile datasets and three miRNA profile datasets, respectively. 10 hub genes were defined based on high degree of connectivity. A PPI network's top module was constructed. The 23 DEGs identified have a significant relationship with miRNAs. 25 critically significant Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were discovered. The detailed study revealed that, in tuberculosis, the DE miRNA and DEGs form an interaction network. The identification of novel target genes and main pathways would aid with our understanding of miRNA's function in tuberculosis progression.

1. Introduction

Tuberculosis (TB) is one of the most common infectious diseases, caused by the pathogen Mycobacterium tuberculosis (MTB). It was linked to a high rate of infection and a long-term disease course [1]. TB is one of the top 10 causes of death according to the World Health Organization (WHO)'s global TB report for 2020. Each year, about ten million people were infected with tuberculosis. TB was identified as a repetitive immune reaction, and a subset of patients with tuberculosis will grow into active tuberculosis. It has a close connection with poverty. Around 90% of tuberculosis patients stayed in underdeveloped countries [2]. There are many diagnostic tools available today, including the tuberculin skin examination (TST), the interferon gamma release assay (IGRA), and imaging procedures. They all lack precision and are more expensive and highly technical [3, 4].

microRNAs (miRNAs) are noncoding RNA molecules that have 22 to 23 nucleotides [5]. They regulate gene expression at the posttranscriptional stage by facilitating mRNA degradation and preventing mRNA translation [6]. When a patient is infected with tuberculosis, major miRNAs are released into the bloodstream. MiRNAs have shown to play an important role in a variety of pathological and physiological mechanisms in tuberculosis [7]. Microarray analysis, miRNA, and gene have been extensively utilized in the quest for biomarkers and therapeutic targets in recent years. The discovery of novel DE miRNAs and DEGs provides useful and accurate perspectives for future study [8].

To validate the significant genes and miRNAs, two mRNA expression profile datasets, GSE62147 [9] and GSE34608-GPL6480 [10], and three miRNA expression profile datasets, GSE34608-GPL7731 [10], GSE29190 [11], and GSE49951 [12], were used. The 250 DEGs in speech profile datasets were, respectively, identified. The enrichment analysis of DEGs showed 5 KEGG pathways were related to the development and progress of TB, including ‘hepatocellular carcinoma,’ ‘Kaposi sarcoma-associated herpesvirus infection,’ ‘phosphatidylinositol signaling system,’ ‘circadian entrainment,’ and ‘apelin signaling pathway.’ A protein-protein interaction (PPI) network of DEGs and DEGs-DE miRNAs was created.10 hub genes were then chosen.

The present study established two DE miRNAs, hsa-mir-7 and hsa-mir-451. The gene-encoding hsa-mir-7 and hsa-mir-451 were separately located in human chromosomal region 9q21.32 and 17qll.2. The recent documents presented that Hsa-mir-7 and hsa-mir-451were involved in cancer-related biological processes and innate immune response [13, 14]. But, their future roles in TB remain unknown.

2. Materials and Methods

2.1. Microarray Data Screening

The bioinformatics analysis was carried out in accordance with the protocol depicted in Figure 1. The term 'Tuberculosis” or “microRNA and Tuberculosis” was searched in the Gene Expression Omnibus database (GEO). There are some database criteria that are helpful with more analysis: I) clinical study pairs must have both healthy and tuberculosis patients, and II) mRNA and miRNA data from plasma (Affymetrix miRNA 4.0) were retrieved. As a result, two gene expression profile datasets, GSE62147 and GSE34608-GPL6480, were used in the primal mRNA datasets. GSE62147 had 14 TB-positive donors and 14 healthy donors. GSE34608-GPL6480 had eight TB donors and eighteen healthy donors. Six donors with tuberculosis and three healthy donors contained GSE29190. GSE49951 had 71 TB-positive donors and 71 healthy donors.

2.2. Selecting DEGs and DE miRNAs and Constructing a Volcanic Map

To diagnose DEGs and DE miRNAs in TB patients, the GEO2R online research method (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used. Exact cutoff values ( < 0.05 and |log fold change (FC)|≥1) were established. The two gene and three miRNA datasets were submitted to VENN's online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/). The DEGs and DE miRNAs that overlap were filtered out. The volcanic chart of overlapping DEGs was constructed using the online method of bioinformatics (http://www.bioinformatics.com.cn/).

2.3. GO and KEGG Pathway analysis

Gene Ontology (Go) was applied to perform the category biological process (BP), molecular function (MF), and cellular component (CC) enrichment analysis [15]. The KEGG was a set of databases that provides information regarding biological mechanisms, cellular processes, chemical substances, and diseases [16]. The R software package clusterProfiler was used in functional gene annotations and KEGG enrichment analysis [17]. A cutoff criterion ( < 0.05 and FDR<0.05) was set. Also, the top 10 pathways with the maximum number of genes for the corresponding term were chosen. Significant items of GO and KEGG were submitted to R software package ggplot2 to visualize and merge enriching analysis. The enrichment dot bubble method was used to build the bubble plot.

2.4. Construction of Network and Identification of Top Modules and Hub Genes

Permanent DEGs were uploaded to the STRING online tool (https://string-db.org/) [18]. A cutoff point (interaction score >0.4) was established. The PPI network was constructed using the CytoScape program [19]. The MCODE plug-in was used to determine the PPI network's top module. Strict cutoff conditions (degree cutoff = 2, node score cutoff = 0.2, k-core = 2, and maxdepth = 100) were established. Meanwhile, the top 10 hub genes were identified using the CytoHubba plug-in.

2.5. Identification of the Target Gene and Construction of the DE miRNA-DEGs Regulatory Network

To predict miRNA target genes, the online tools TargetScan (http://www.targetscan.org/vert71) [20], miRDB (http://www.mirdb.org/) [21], miRWalk (http://mirwalk.uni-hd.de/) [22], and miRTarBase (http://mirtarbase.cuhk.edu.cn/) [21] were used. The criterion for goal gene selection is cumulative weighted context++ score online > -0.5. The target gene that met the criteria was chosen from three databases. The overlapping target gene of miRNA was discovered using the VENN online method. CytoScape was used to upload miRNA and DEGs. To visualize and merge analysis, a DE miRNA-DEGs regulatory network was developed.

3. Results

3.1. Screening of DEGs and DE miRNAs

Under the precise cutoff criteria employed ( < 0.05 and |log fold change (FC)|≥1), there were 614 and 4211 DEGs extracted from GSE6214 and GSE34608-GPL6480, respectively. Following that, DE miRNAs were extracted from GSE34608-GPL7731, GSE29190, and GSE49951, respectively, yielding 174, 36, and 23 DE miRNAs. Two gene expression profile datasets yielded a total of 181 overlapping DGEs (135 downregulated genes and 46 upregulated genes) (Figure 2(b) and Table 1). Two overlapping DE miRNAs (two downregulated and zero upregulated) were derived from three miRNA expression profile datasets (Figure 2(c) and Table 2). The volcano plot of three gene expression profile datasets revealed a substantial difference between regular and tuberculosis patients (Figure 2(a)).

3.2. GO Functional and KEGG Pathway Enrichment analysis

DEGs that were consistent were submitted to R software for GO functional and KEGG pathway enrichment analysis. GO-CC showed that upregulated DEGs were mainly associated with ‘sarcoplasm’ and downregulated DEGs were mainly associated with ‘specific granule,’ ‘tertiary granule,’ ‘secretory granule lumen,’ ‘cytoplasmic vesicle lumen,’ ‘vesicle lumen,’ ‘secretory granule membrane,’ ‘specific granule membrane,’ ‘specific granule lumen,’ ‘tertiary granule membrane,’ ‘primary lysosome,’ and ‘azurophil granule.’ GO-BP showed that downregulated DEGs were primarily involved in ‘neutrophil-mediated immunity,’ ‘neutrophil activation,’ ‘neutrophil degranulation,’ ‘neutrophil activation involved in immune response,’ ‘response to lipopolysaccharide,’ ‘response to a molecule of bacterial origin,’ ‘defense response to a bacterium,’ ‘regulation of response to biotic stimulus,’ ‘humoral immune response,’ and ‘positive regulation of anion transport.’ GO-MF showed that downregulated DEGs were mainly related with ‘cysteine-type endopeptidase inhibitor activity,’ ‘virus receptor activity,’ and ‘exogenous protein binding’ (Figure 3(a)3(d) and Table 3).

The result represented that upregulated DEGs were significantly associated with 5 KEGG pathways, including ‘hepatocellular carcinoma,’ ‘Kaposi sarcoma-associated herpesvirus infection,’ ‘phosphatidylinositol signaling system,’ ‘circadian entrainment,’ and ‘apelin signaling pathway’ (Figure 3(e), Table 4).

3.3. Hub Gene Identification Based on the DEG PPI Network and Module Analysis

The STRING online tool was updated with 181 common DEGs (135 downregulated and 46 upregulated). Significant DEGs were identified in 135 of 181 DEGs. Significant DEGs were visualized in detail using the CytoScape software. The PPI network was constructed with 122 nodes and 241 edges (Figure 4(a)). Additionally, 59 of 181 DEGs were not part of the PPI network. The hub genes contained Cathelicidin Antimicrobial Peptide (CAMP), CEA Cell Adhesion Molecule 8 (CEACAM8), CD59 molecule (CD59), C-Type Lectin Domain Containing 5A (CLEC5A), Stomatin (STOM), Membrane-Spanning 4-Domains A3 (MS4A3), G Protein-Coupled Receptor 84 (GPR84), Late Endosomal/Lysosomal Adaptor, MAPK and MTOR Activator 3 (LAMTOR3), Defensin Alpha 4 (DEFA4), and CEA Cell Adhesion Molecule 1 (CEACAM1) (Figure 4(b)). The top significant module was defined from the PPI network using the MCODE plug-in based on its degree value. The top module featured fifteen nodes and forty-five corners (Figure 4(c)).

3.4. Construction of the DEGs-DE miRNA Network

Because of the shortcomings of each dataset, hsa-mir-7 and hsa-mir-451 were each submitted to four accurate datasets for target gene prediction. Using the VENN online research website, miRNA target genes were integrated with DEGs. The identified data revealed that hsa-target mir-7 genes shared 19 common genes with DEGs (Figure 5(a)) and the hsa-goal mir-451 gene shared four genes with DEGs (Figure 5(b)). Meanwhile, CytoScape developed the DEGs-DE miRNA PPI network (Figure 6).

4. Discussion

In recent years, a number of studies have been carried out to reveal the potential mechanisms of tuberculosis. The prevalence of tuberculosis has been steadily rising. Traditional studies have two flaws: a single genetic case and a limited cohort [23]. Bioinformatics analysis of gene expression profile datasets of TB patients is now being used to screen more reliable data. A total of 181 common DEGs (135 downregulated genes and 46 upregulated genes) were extracted from two gene expression profile datasets in this study. The hsa-mir-7 and hsa-mir-451, two DE miRNAs, were extracted from four gene expression profile datasets. Both hsa-mir-7 and hsa-mir-451 were reduced in expression. The 181 DEGs underwent GO and KEGG enrichment analysis, allowing them to be classified into BP, CC, MF, and KEGG groups. Finally, a DEGs-DE miRNA PPI network was built. The PPI network was used to screen the top module and ten hub genes.

To determinate the underlying molecular mechanisms in the TB process, the most enriched BP, CC, and MF pathways were combined with downregulated and upregulated DEGs for comprehensive analysis, separately. GO analysis presented that upregulated DEGs were mainly related with sarcoplasm and downregulated DEGs were mainly related with neutrophil, antibacterial activities, primary lysosome, and exogenous protein binding. The comprehensive analysis of GO and hub genes demonstrated that hub genes mainly associated with neutrophil and primary lysosome signal pathways. KEGG analysis presented that upregulated DEGs were mostly involved in hepatocellular carcinoma, Kaposi sarcoma-associated herpesvirus infection, the phosphatidylinositol signaling system, circadian entrainment, and the apelin signaling pathway. Most of the abovementioned KEGG pathways play an important role in the immune response and apoptosis of Mycobacterium tuberculosis. Kaposi sarcoma-associated herpesvirus is well known to be involved in antiapoptosis, enhancement of cytokine production, and cell proliferation [24]. The previous study reported that MTB enhances bacterial virulence by avoiding host cell death [25]. Phosphatidylinositol is a lipid anchor, which as virulence factor and modulate host immune response in MTB [26]. Also, the apelin decreases mitochondrial apoptosis, mitochondrial ROS-triggered oxidative damage, and NF-κB activation to inhibit acute lung injury (ALI) and acute respiratory distress syndrome (ARDS) [27]. Circadian entrainment was also involved in innate immune response and photoperiod significant impairment and enhancing immune function [28].

According to a recent study, hsa-mir-7 impaired NF-κB and AKT transcriptional activity. The NF-κB and AKT pathways are important inflammatory-associated pathways and inhibited host cell autophagy in tuberculosis [4, 29, 30]. The hsa-mir-7 was thought to be involved in innate immune responses in a previous study [31]. Meanwhile, hsa-mir-451 protects against cell death caused by ischemia/reperfusion injury, cancer, and myocardial I/R injury [3234]. CAMP, CEACAM8, CD59, CLEC5A, STOM, MS4A3, GPR84, LAMTOR3, CEACAM1, and DEFA4 are among the ten hub genes discovered. They act as TB immune mediators and, additionally, as an active participant in the control of the inflammatory response [3540]. Meanwhile, macrophages are innate immune cells that serve as a guardian in the immune system's protection and response to microbial infection [41]. GPR84 is a member of the family of metabolic G protein-coupled receptors [42]. GPR84 activation enhances bacterial adhesion and phagocytosis in macrophages. According to a recent study, elevated glucose levels promote GPR84 expression [43]. STOM is a member of the family of integral membrane proteins [44]. As previously mentioned, STOM's primary role is to regulate glucose transporter type 1 operation. Furthermore, elevated glucose levels augment macrophage anti-inflammatory function [45]. GPR84 and STOM are closely associated with the adaptive immune response, as determined by experimentation. CAMP is an antimicrobial peptide that is synthesized at the C-terminus of proteins [46]. It eradicated MTB and slowed the progression of tuberculosis. CAMP is a vitamin D-related gene that would be triggered in response to vitamin D. It is essential for macrophages infected with MTB to benefit from the antimicrobial reaction induced by vitamin D. CAMP activation enhances the effectiveness of immune responses in tuberculosis [47]. CEACAM1 and CEACAM8 are members of the class of heavily glycosylated carcinoembryonic antigens [48, 49]. The primary role of CEACM1 is antiapoptotic. It is involved in granulocyte survival [50]. It is classified as an immunoregulatory checkpoint regulator. Previous research demonstrated that inadequate CEACAM1 resulted in inflammatory exacerbation [51]. CEACM8 is a responsive granulocyte biomarker. It performs two functions: it recognizes neutrophils and degrades extracellular matrix, thus stimulating the immune response [52]. MTB has been shown to control the immune system by controlling the macrophage cell cycle. It prevents macrophages from entering the interphase and gap phase 1 phases [53]. MS4A3 is a member of the family of membrane-spanning 4A genes [54]. MS4A3 is intimately linked to the cell cycle. As a consequence, the activation of the kinase-associated phosphatase (KAP) leads to cell cycle arrest. CLEC5A is a member of the superfamily C-type lectin/C-type lectin-like domain (CTL/CTLD) [54]. CLEC5A controls cell development by inducing apoptosis and arresting the cell cycle. Additionally, it plays a critical function in the production of inflammation and serves as a critical gene for the clinical management of pulmonary inflammation [55]. DEFA4 is a part of the lipocalin family, which is involved in the transportation of vitamins, lipids, and steroid hormones [56]. It inhibits bacterial growth by attaching to pathogenic bacterial siderophores. It plays a significant role in immune response and protection. In recent years, LAMTOR3 and CD59 have been extensively used to inhibit the development of a variety of cancers [57, 58]. CD59's primary role is to regulate immune cell activation throughout the tumor microenvironment [59]. Additionally, the top module was defined through the PPI network of DEGs. The top module included a large number of genes associated with macrophages and the innate immune response. Previously published studies established that macrophages act as a host cell for MTB and are active in the innate immune response. MTB destroyed the macrophage's main immune systems, antigen introduction and intracellular killing [41].

The important DEGs were established using integrated bioinformation analysis. GPR84, STOM, CAMP, CEACM8A, MS4A3, LAMTOR3, DEFA4, CLEC5A, and CD59 were the ten hub genes that were filtered out. Two DE miRNAs, hsa-mir-7 and hsa-mir-451, were discovered. Some important pathways, including neutrophil, antibacterial activities, primary lysosome, exogenous protein binding, the apelin signaling pathway, Kaposi sarcoma-associated herpesvirus infection, and the phosphatidylinositol signaling system, are involved in TB. DEGs and DE miRNAs may be used as sensitive biomarkers in the clinical diagnosis of tuberculosis. The network of DE miRNA-DEGs revealed the molecular mechanism causing tuberculosis. More accurate clinical samples and experiments are needed to validate the cause of tuberculosis. In the future, bioinformatics research could be able to identify novel genes and pathways for use in genomic therapy for tuberculosis.

Data Availability

There are no files that need to be deposited in public repositories among the data in the manuscript. The manuscript contains all necessary information about the data collected.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

Shejie Shen contributed equally to this work.