Data landscape and data access
ExomiRHub provides 191 human extracellular miRNA expression datasets associated with 112 disease phenotypes, 62 treatments, and 24 genotypes, covering 2,656 miRNAs, 29,198 samples, and 23 sample types (Fig. 1A). Of these datasets, 80.63% (154/191) provide miRNA expression profiles from extracellular vesicles (including 145 exosome datasets), while the remaining 19.37% (37/191) provide circulating miRNA profiles from serum (13.09%, 25/191), plasma (4.71%, 9/191), and whole blood (1.57%, 3/191). The sample types include extracellular vesicles, exosomes, and circulating miRNA derived from whole blood, serum, plasma, urine, ascites, cerebrospinal fluid, endometrial fluid, follicular fluid, pericardial fluid, liquid milk, saliva, tissue fluid, umbilical cord blood, tracheal aspirate, faecal fluid, and the culture supernatant of cells with specific genotypes and treatments. The treatments include adjuvant chemotherapy, antibiotics, cell therapy, chemotherapy, immunotherapy, radiotherapy, and targeted therapy, among others. The disease phenotypes in ExomiRHub are associated with more than 23 body sites; the six body sites with the most extracellular miRNA expression datasets are the blood system, breast, brain, lung, intestine, and pancreas. China, Japan, and the USA are the top three contributing countries, accounting for 25.65% (49/191), 24.08% (46/191), and 20.42% (39/191) of the datasets, respectively. Furthermore, we summarized the data features of body site, sample type, isolation method, sample resource, extraction method, disease phenotype, genotype, and treatment on the “Home” and “Browse” webpages of the database platform.
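The proportions quoted above follow directly from the dataset counts. As a quick arithmetic check, the short Python sketch below recomputes them from the reported counts; the variable and function names are illustrative and are not part of ExomiRHub.

```python
# Dataset counts by sample source, as reported in the text above.
TOTAL_DATASETS = 191
source_counts = {
    "extracellular vesicle": 154,
    "serum": 25,
    "plasma": 9,
    "whole blood": 3,
}

def share(count, total=TOTAL_DATASETS):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)

# The four categories should partition all 191 datasets.
assert sum(source_counts.values()) == TOTAL_DATASETS

shares = {name: share(n) for name, n in source_counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% ({source_counts[name]}/{TOTAL_DATASETS})")
```

The same helper reproduces the country shares, for example `share(49)` for China.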
Currently, 63.87% (122/191) of the extracellular miRNA datasets are associated with 52 cancer sub-types. To enhance the usability of the platform for cancer research, ExomiRHub also integrates human miRNA expression quantification data for 16,012 samples and 156 cancer sub-types from 42 TCGA projects (Fig. 1A). Each TCGA sample is annotated with rich biospecimen and clinical data, including demographics, diagnosis, progression, tumor microenvironment, and lifestyle, which helps users browse and select specific samples when designing and defining analytical and visual comparisons.
In addition, ExomiRHub enables users to quickly search and browse extracellular miRNA expression datasets of interest and then navigate to the ExomiRlyzer application to discover significant miRNAs through user-defined analyses of the selected dataset. All data in ExomiRHub can be freely downloaded for research in a variety of formats, including CSV, TABLE, and JSON. Moreover, ExomiRHub encourages users to submit new human extracellular miRNA expression data through the submission webpage (http://hpcc.siat.ac.cn/exomirhub/submit). Once submitted data have been approved by our submission review committee, they will be included in the next version.
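As an illustration of how a downloaded expression table might be handled downstream, the sketch below parses a small CSV into a nested dictionary and round-trips it through JSON. The column layout (a `miRNA` column followed by one column per sample), the sample names, and the expression values are assumptions made for this example; the actual export layout is not specified in this section.

```python
import csv
import io
import json

# Hypothetical CSV export: first column is the miRNA ID, remaining columns are samples.
# All values are made up for illustration.
csv_text = """miRNA,sample_1,sample_2,sample_3
hsa-miR-132-5p,5.1,6.3,4.8
hsa-miR-200a-5p,2.0,2.4,1.9
"""

def read_expression_csv(handle):
    """Parse a miRNA-by-sample table into {miRNA: {sample: value}}."""
    reader = csv.DictReader(handle)
    table = {}
    for row in reader:
        mirna = row.pop("miRNA")
        table[mirna] = {sample: float(value) for sample, value in row.items()}
    return table

expression = read_expression_csv(io.StringIO(csv_text))

# The same table serialized as JSON round-trips to an identical structure.
json_text = json.dumps(expression)
assert json.loads(json_text) == expression
print(expression["hsa-miR-132-5p"]["sample_2"])
```

For a real download, `io.StringIO(csv_text)` would be replaced by an open file handle, and the column names adjusted to match the file.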
Comprehensive web analysis and visualization functions, servers, and tools
To help users understand the role of extracellular miRNA and discover non-invasive biomarkers in human disease, two web applications, ExomiRlyzer and TCGA-miRNAlyzer, were designed and developed on ExomiRHub. They provide four bioinformatics tool-kits: the Differential Expression Tool-kit, Co-expression Tool-kit, WGCNA Tool-kit, and Feature Selection Tool-kit. Together, these tool-kits offer 25 analytical and visualization functions for the integrated extracellular miRNA expression data and cancer-related miRNA expression data, covering differential expression, co-expression, WGCNA, GO function enrichment, COX regression, Least Absolute Shrinkage and Selection Operator (LASSO), Receiver Operating Characteristic (ROC), and survival analysis and visualization. Because the samples in each dataset are annotated with rich biomedical information, all 25 functions support customized grouping and settings, allowing users to select specific samples and define their own groups and parameters for comparison analysis (Fig. S1 and S2). The two web applications, the four tool-kits, and their analytical and visualization functions are described in more detail in Supplementary Table S1 and Fig. 2.
To serve a wider research community, ExomiRHub provides the Web Service application, a systematic pipeline that integrates and standardizes the miRNA expression data and sample metadata uploaded by users. It then assigns a temporary identifier to the uploaded data, so that users can run six, five, and ten analyses and visualizations on their data in the Differential Expression, Co-expression, and WGCNA tool-kits, respectively (Fig. 3 and Fig. 2A-C). To protect uploaded data, the application lets users delete their data at any time using the assigned temporary identifier, and we also clean up the uploaded data monthly (Fig. 3).
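The upload lifecycle described above (assign a temporary identifier, allow deletion by that identifier, and clean up periodically) can be pictured with a toy in-memory store. This is only a conceptual sketch and not the ExomiRHub implementation: the class name `UploadStore`, its methods, and the 30-day retention window standing in for the monthly clean-up are all assumptions.

```python
import uuid
from datetime import datetime, timedelta

class UploadStore:
    """Toy in-memory store; names and the 30-day window are illustrative assumptions."""

    def __init__(self, retention=timedelta(days=30)):
        self.retention = retention
        self._uploads = {}  # temp_id -> (upload time, data)

    def upload(self, data, now=None):
        """Store data and return a newly generated temporary identifier."""
        temp_id = uuid.uuid4().hex
        self._uploads[temp_id] = (now or datetime.now(), data)
        return temp_id

    def get(self, temp_id):
        """Return the stored data, or None if the identifier is unknown."""
        entry = self._uploads.get(temp_id)
        return entry[1] if entry else None

    def delete(self, temp_id):
        """User-initiated deletion; returns True if something was removed."""
        return self._uploads.pop(temp_id, None) is not None

    def purge_expired(self, now=None):
        """Scheduled clean-up: drop uploads older than the retention window."""
        now = now or datetime.now()
        expired = [k for k, (t, _) in self._uploads.items() if now - t > self.retention]
        for k in expired:
            del self._uploads[k]
        return len(expired)

store = UploadStore()
temp_id = store.upload({"miRNA": "hsa-miR-132-5p"}, now=datetime(2024, 1, 1))
print(store.get(temp_id))
```

Keying every operation on the identifier means a user needs no account to retrieve or delete an upload, which matches the behaviour described in the text.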
To make ExomiRHub more useful to the miRNA research community, it provides four additional tools to predict and validate the potential functions and targets (such as mRNA, lncRNA, and circRNA) of miRNAs and their variants: miRNA Function Prediction, miRNA Mutant Evaluation, miRNA Target Prediction, and miRNA Target Validation. A usage guide for the above analytical and visualization functions, servers, and tools is provided on the “Help” webpage (http://hpcc.siat.ac.cn/exomirhub/help). Finally, the results from these applications can be exported as publication-quality vector images in PDF format and as tables for further analysis and download (Fig. 2A-D).
Case study: discover non-invasive miRNA biomarkers associated with angiogenesis for diagnosis and monitoring progression of glioma
To discover and identify non-invasive biomarkers for the diagnosis and progression monitoring of glioma, we performed a comprehensive analysis of an exosomal miRNA dataset in ExomiRHub (ID: EMIR00000186), which provides miRNA expression profiles derived from plasma exosomes of healthy controls and glioblastoma patients. First, the WGCNA results showed that plasma exosomal miRNAs could distinguish glioblastoma patients from healthy controls and clearly divided the patients into two subgroups, named G1 and G2 (Fig. 4A). Hierarchical cluster analysis then confirmed that the two subgroups have distinct exosomal miRNA expression profiles (Fig. 4B and Fig. S3). We further performed WGCNA on the two subgroups (Fig. S4) and identified four module eigengenes, MEbrown, MEred, MEyellow, and MEturquoise, that were associated with the two subgroups (Fig. 4C) and significantly correlated with each other (Fig. 4D). Further analysis suggested that these four modules were consistently enriched for the GO terms regulation of angiogenesis and vasculature development (Fig. 4E-H), processes that have been shown to be closely related to the development, treatment, and prognosis of glioma [32, 33].
To explore differences in angiogenesis-related pathways between G1 and G2, we conducted differential expression analysis and found 48 dysregulated exosomal miRNAs (|log2(fold change)| >= 0.5 and p-value <= 0.05) between the two subgroups, including 21 down-regulated and 27 up-regulated miRNAs (Fig. 5A and Supplementary Table S3). In line with the GO function enrichment analysis of the module eigengenes, these differentially expressed exosomal miRNAs were enriched in angiogenesis-related pathways (Fig. 4I). For example, the dysregulation of hsa-miR-132-5p and hsa-miR-200a-5p is associated with the positive regulation of blood vessel endothelial cell migration (Fig. 4I and Fig. 5A-C), and the two miRNAs were significantly correlated (Fig. 5D). Furthermore, the GO annotations of the exosomal miRNAs co-expressed with hsa-miR-132-5p and hsa-miR-200a-5p were also consistently enriched in angiogenesis-related pathways (Fig. S5).
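The selection rule used for the dysregulated miRNAs (|log2(fold change)| >= 0.5 and p-value <= 0.05) can be expressed as a small filter. The sketch below applies those two thresholds to made-up rows; the miRNA labels, fold changes, and p-values are invented for illustration, and the sign convention (positive log2 fold change means up-regulated) is an assumption about how the comparison is oriented.

```python
def classify(log2_fc, p_value, fc_cutoff=0.5, p_cutoff=0.05):
    """Label one miRNA using the thresholds quoted in the text."""
    if p_value > p_cutoff or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "up" if log2_fc > 0 else "down"

# Made-up (miRNA label, log2 fold change, p-value) rows for illustration only.
results = [
    ("miRNA-A", 1.20, 0.001),
    ("miRNA-B", -0.75, 0.030),
    ("miRNA-C", 0.30, 0.002),   # fold change too small
    ("miRNA-D", -1.10, 0.200),  # p-value too large
]

labels = {name: classify(fc, p) for name, fc, p in results}
counts = {k: list(labels.values()).count(k) for k in ("up", "down", "not significant")}
print(counts)
```

Both cut-offs are inclusive, so a miRNA with exactly |log2(fold change)| = 0.5 and p-value = 0.05 passes the filter.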
To further address the clinical significance of the two exosomal miRNAs in glioma, we performed COX regression and survival analysis on the glioma dataset integrated from TCGA in ExomiRHub. The results suggested that down-regulation of the angiogenesis-related miRNA genes hsa-mir-132 and hsa-mir-200a was associated with a significantly lower risk of death (Fig. 6A) and longer overall survival of glioma patients (Fig. 6B). Consistent with these results, the two miRNAs were up-regulated in high-grade glioma (HGG, WHO III & IV) compared with low-grade glioma (LGG, WHO I & II) (Fig. 6C and D) and were significantly correlated (Fig. 6E). Moreover, the ROC analysis suggested that hsa-mir-132 (AUC = 0.98) and hsa-mir-200a (AUC = 0.88) could each be used as independent factors to distinguish HGG from LGG (Fig. 6F). Thus, these results imply that the plasma exosomal miRNAs hsa-miR-132-5p and hsa-miR-200a-5p may regulate glioma progression through the regulation of angiogenesis, and that they have potential as non-invasive biomarkers for the diagnosis and progression monitoring of glioma.
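For readers less familiar with the AUC values reported above, the AUC of a single marker can be read as the probability that a randomly chosen HGG sample has a higher expression value than a randomly chosen LGG sample. The sketch below computes it by direct pairwise comparison; the expression values are made up and are not taken from TCGA or ExomiRHub, and the function name `roc_auc` is illustrative.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up expression values; not taken from TCGA or ExomiRHub.
hgg = [7.9, 8.4, 6.8, 9.1]
lgg = [6.1, 7.0, 5.5, 6.9]

auc = roc_auc(hgg, lgg)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to no discrimination between the two groups and 1.0 to perfect separation. The pairwise form is quadratic in the number of samples, so for large cohorts a rank-based implementation from a statistics library would be the more practical choice.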
In addition, differential expression and WGCNA analyses of the miRNA expression datasets from LGG and HGG patient tissues suggested that the two grades have significantly different miRNA expression patterns, and that the differential features were consistently enriched in the functional pathway of angiogenesis regulation (Fig. 7 and Fig. S6). Clinically, glioblastoma (WHO Grade IV) can be differentiated from anaplastic glioma (WHO Grade III) by the presence of neoplastic vasculature, so our non-invasive markers are in accordance with this known pathological feature. Altogether, these findings consistently suggest that miRNAs play a critical role in the progression of LGG to HGG by regulating angiogenesis-related functional pathways, although this needs to be validated by further in vitro and in vivo experiments and by large-scale clinical analyses.