A Null Model for Pearson Coexpression Networks

Andrea Gobbi; Giuseppe Jurman

doi:10.1371/journal.pone.0128115

Abstract

Gene coexpression networks inferred by correlation from high-throughput profiling such as microarray data represent simple but effective structures for discovering and interpreting linear gene relationships. In recent years, several approaches have been proposed to tackle the problem of deciding when the resulting correlation values are statistically significant. This is most crucial when the number of samples is small, yielding a non-negligible chance that even high correlation values are due to random effects. Here we introduce a novel hard thresholding solution based on the assumption that a coexpression network inferred by randomly generated data is expected to be empty. The threshold is theoretically derived by means of an analytic approach and, as a deterministic independent null model, it depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution compatible with the structure of gene expression levels data. We show, on synthetic and array datasets, that the proposed threshold is effective in eliminating all false positive links, with an offsetting cost in terms of false negative detected edges.

Citation: Gobbi A, Jurman G (2015) A Null Model for Pearson Coexpression Networks. PLoS ONE 10(6): e0128115. https://doi.org/10.1371/journal.pone.0128115

Academic Editor: Francisco J. Esteban, University of Jaén, SPAIN

Received: February 11, 2014; Accepted: April 23, 2015; Published: June 1, 2015

Copyright: © 2015 Gobbi, Jurman. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Funding: The work has been funded by Fondazione Bruno Kessler. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Universally acknowledged by the scientific community as the basic task of systems biology, network inference is the prototypical procedure for moving from the classical reductionist approach to the novel paradigm of data-driven complex systems in the interpretation of biological processes [1].

The goal of all network inference (or network reconstruction) procedures is the detection of the topology (i.e., the wiring diagram) of a graph whose nodes belong to a given set of biological entities, starting from measurements of the entities themselves. In the last fifteen years, the reconstruction of the regulation mechanism of a gene network and of the interactions among proteins from high-throughput data such as expression microarray or, more recently, from Next Generation Sequencing data has become a major line of research for laboratories worldwide. The proposed approaches rely on techniques ranging from deterministic to stochastic, and their number is constantly growing. Nonetheless, network inference is still considered an open, unsolved problem [2]. In fact, in many practical cases, the performance of the reconstruction algorithms are poor, due to several factors limiting inference accuracy [3, 4] to the point of making it no better than a coin toss in some situations [5]. The major problem is the under-determinacy of the task [6], due to the overwhelming number of interactions to be predicted starting from a (usually) small number of available measurements. In general, size and quality of available data are critical factors for all inference algorithms.

In what follows the impact of data size is discussed for one of the simplest inference techniques, i.e., the gene coexpression network, where interaction strength between two genes is a function of the correlation between the corresponding expression levels across the available tissue samples. Despite its simplicity, the number of studies in the literature based on coexpression networks is still a large fraction of all manuscripts in systems biology [7–13], including Next Generation Sequencing data [14]. The underlying biological hypothesis is that functionally related genes have similar expression patterns [15], and thus that coexpression is correlated with functional relationships, although this does not imply causality. This implies for instance that genes that are closer in a biological network are likely to have more similar expression [16, 17]: this hypothesis has been validated to some extent, for example by Jensen et al. in [18]. However, a caveat is mandatory here: since correlation is a univariate method, it is unable to capture the relations occurring among genes when the independence hypothesis does not hold, which is a common situation in the -omics datasets [19, 20]. To overcome this problem, characterized by small correlation values between functionally related genes, multivariate approaches are required [21], also in the network case [22].

Notwithstanding this limitation, as highlighted in [23], correlation can help unveil the underlying cellular processes, since coordinated coexpression of genes encode interacting proteins. The Pearson Correlation Coefficient (PCC for short) is used as the standard measure, although alternative correlation measures can be also employed: see for instance [24] for a comparative review of eight well known association metrics. Furthermore, coexpression analysis has been intensively used as an effective algorithm to explore the system-level functionality of genes, sometimes outperforming much more refined approaches [25, 26]. The observation that simpler approaches such as correlation can be superior even on synthetic data has been explained by some authors [27, 28] with the difficulties of a complex algorithm in detecting the subtleties of combinatorial regulation. Coexpression networks can also capture more important features than the conventional differential expression approach [29], and their use has been extended to other tasks, for instance the investigation of complex biological traits [30]. Finally, these networks can be crucial for understanding regulatory mechanisms [31], for the development of personalised medicine [32] or, more recently, in metagenomics [33].

However, as noted in [34], correlation between genes may sometimes be due to unobserved factors affecting expression levels. Moreover, deciding when a given correlation value between two nodes can be deemed statistically significant and thus worthwhile for assigning a link connecting them is a major issue affecting coexpression networks [35]. This translates mathematically into choosing (a function of) a suitable threshold, as in the case of mutual information and relevance networks [36]. As reported in [37], in the literature statistical methods for testing the correlations are underdeveloped, and thresholding is often overlooked even in important studies [38]. The two main approaches known in the literature to solve this problem can be classified as soft or hard thresholding. Soft thresholding is adopted in a well-known framework called Weighted Gene Coexpression Network Analysis (WGCNA) [39], recently used also for other network types [40, 41]. All genes are mutually connected, and the weight of a link is a positive power of the absolute value of PCC, where the exponent is chosen as the best fit of the resulting network according to a scale-free model [42, 43]. This approach, without discarding any correlations, promotes high correlation values and penalises low values. In the hard thresholding approach, instead, only correlation values larger than the threshold are taken into account, and an unweighted link is set for each of these values, so that a binary network is generated (see [44] for one of the earliest references). Clearly, an incorrectly chosen threshold value can jeopardise the obtained results with false negative links (for too strict a threshold) or false positive links (for too loose a threshold). Many heuristics have been proposed for setting the threshold values, such as using the False Discovery Rate [45–47], using a cross-validation strategy [48], the p-value of the correlation test [15, 32], employing partial correlation [49, 50], using rank-based techniques [51–53], more complex randomisation techniques [54], or keeping only values exceeding a minimum acceptable strength (MAS) level specified by the threshold [55]. Alternative approaches have been recently studied, evaluating the correlation distribution, both experimentally [56] and theoretically [57]. However, these studies focus on the level of single interaction rather than considering the whole network.

Moreover, in many studies in the literature, the threshold is not chosen according to an algorithmic procedure, but referring to standard choices [58–61], or to heuristics not directly related to the correlation values, but rather with the resulting network topology [62–71]. In [72] a comparison of some coexpression thresholds is shown on a few microarray datasets. Furthermore, each threshold choice yields a compromise between detecting artefacts (false positives) and neglecting existing connections (false negatives) [56], which can be driven by the cost attached to the task studied. For instance, the optimized threshold selection procedure in [73] attains a very low false positive rate by using a permutation-based strategy [74].

Here we propose a new a priori model for the computation of a hard threshold based on the assumption that a random coexpression graph should not have any edges. This threshold follows from the work of Fisher [75] and Bevington [76], and, as a deterministic independent null model, it depends only on the dimensions of the starting data matrix, with assumptions on the skewness of the data distribution compatible with the structure of gene expression levels data [77, 78]. Hence, the procedure to obtain the threshold is not stochastic and thus deeply different from approaches based on permutation tests [74]. Further, this model is non-parametric, because its only input is the data themselves and no additional quantity needs to be tuned. By definition, this threshold is aimed at minimising the possible false positive links, paying a price in terms of false negative detected edges. This characterising property makes this method especially useful when the intrinsic cost function associated with the studied task is biased towards penalising false positives. We conclude by demonstrating the procedure first on a synthetic dataset and then on an ovarian epithelial carcinoma dataset on a large cohort of 285 cases [79, 80].

A Motivating Example

We show hereafter an example of a common situation arising in the small sample size setting, where PCC can reach extremely high values possibly leading to the detection of false positives.

Consider the HepatoCellular Carcinoma (HCC) dataset, first introduced in [81] and later used in [82] and publicly available at the Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo with accession number GSE6857. The dataset collects 482 tissue samples from 241 patients affected by HCC. For each patient, a sample from cancerous hepatic tissue and a sample from surrounding non-cancerous hepatic tissue were extracted, hybridised on the Ohio State University CCC MicroRNA Microarray Version 2.0 platform consisting of 11,520 probes collecting expressions of 250 non-redundant human and 200 mouse miRNA. After a preprocessing phase including imputation of missing values [83] and discarding probes corresponding to non-human (mouse and controls) miRNA, the resulting dataset HCC includes 240 + 240 paired samples described by 210 human miRNA, with the cohort consisting of 210 male and 30 female patients. In particular, consider now the three microRNA identified as hsa.mir.010b.precNo1, hsa.mir.016a.chr13 and hsa.mir.016b.chr3. A search by sequence with the miRBase web-service [84] shows that the two probes hsa.mir.016a.chr13, and hsa.mir.016b.chr3 share a high alignment score (e-value: 5 · 10⁻⁴[85]), while no link due to coherent coexpression is known between each of these probes and hsa.mir.010b.precNo1. Consistent with this observation, the absolute PCC |ρ| on the dataset H consisting of the 210 tumoral samples from male patients is When we restrict the analysis to a suitable small subset of patients, different situations can occur. Consider in fact the following sub-cohort of 10 patients On S, PCC reads as follows: that is, the correlation between the two unrelated miRNA on S is very high and even higher than the correlation of the two aligned probes. The curves (averaged on 1000 runs) of PCC versus the sample size when the remaining samples are increasingly and randomly added (while S remains the same across the 1000 runs) are shown in Fig 1. When sample size increases, |ρ(hsa.mir.010b.precNo1,hsa.mir.016b.chr3)| quickly drops down to 0.536, while the correlation of the two aligned probes remains almost constant.

Download:

Fig 1. HCC dataset: PCC versus sample size for the coexpression of hsa.mir.016a.chr13 with hsa.mir.016b.chr3 (blue line) and with hsa.mir.010b.precNo1 (red line).

When the correlation is computed on the 10-sample set S the correlation between the two uncoexpressed probes is even higher than the correlation between the two probes sharing an almost identical alignment. When more samples are randomly added, the blue line slightly increases, while the red line quickly drops to the final value 0.536 on the whole dataset H. Both curves are averaged over 1000 randomisations of the added samples, keeping the sub-cohort S constant.

https://doi.org/10.1371/journal.pone.0128115.g001

Clearly, if the task were to build a PCC coexpression network N_S on S, whatever the chosen hard threshold t, the network N_S would include either a false positive or a false negative link (or even both, for 0.907 ≤ t ≤ 0.941). Note that the probability of extracting 10-sample sub-cohorts S′ satisfying is about 0.03%, but when the constraints are relaxed the probability of detecting artificially high correlation values |ρ_S′(hsa.mir.010b.precNo1,hsa.mir.016b.chr3)| > 0.75 raises to about 6.2% for this dataset. Situations like those shown in this example regularly occur when building coexpression networks, supporting the need for an algorithmic solution to the problem of hard thresholding in the small sample size setup.

Methods

Distribution of PCC

Given a real number 0 < p < 1, define the function (1) where x and y are two independent normal vectors of length n and ρ is the PCC.

The results in [75, 86–92] and the symmetry of ρ yield that F(n, p) has the following close form: (2) where Γ(x) is the Gamma function $Γ (x) = \int_{0}^{+ \infty} t^{x - 1} e^{- t} d t$ . In Text A in S1 Text we also propose a novel proof of Eq (2) by means of an analytic argument stemming from the work of Bevington in [76, Ch. 7]. Moreover, Eq (2) is also a good approximation in the case of a non-normal distribution of x and y when data skewness can be bounded [93], because of the generalization shown in [89, 94–96]. Non-Gaussian asymmetric distributions can occasionally be detected in some array studies [78]: however, techniques for reducing the skewness are routinely applied during preprocessing [77], and thus the aforementioned results can be safely used in the microarray framework.

Coexpression network and threshold selection

The results derived in the previous section are used here to construct a null model for the correlation network, thus yielding a threshold for the inference of a coexpression network from the nodes’ data. As mentioned in the Introduction, these correlation networks are subject to the hypothesis of independence between genes, so they are not detecting higher-order relations for which a multivariate method is needed. In Text E in S1 Text we show a few synthetic and -omics examples of relations among genes that are not captured by coexpression networks.

In the general situation, we are measuring the signal (expression) of m probes (genes), in n different instances/conditions (samples), to infer the corresponding correlation networks on m nodes.

Formally, let $𝒳 = {x_{i}}_{i = 1}^{m}$ be a set such that x_i ∈ ℝⁿ ∀i = 1, …, m. Then the coexpression p-graph 𝒢_p = {V, E_p} is the graph where

The first result characterizes the coexpression graphs in terms of null models:

Proposition 1 If the vectors x_i are sampled from the uniform distribution, the graph 𝒢_p is an Erdös-Rényi model [97] with m nodes and probability F(n, p) as in Eq (2).

The proof follows immediately from the definition of 𝒢_p and Eq (2).

Definition Using the results in the previous section, the secure threshold $\overline{p}$ is defined as follows, for an arbitrarily small ɛ > 0: (3) for m nodes measured on n samples. As a consequence of [98], stating that the median of the binomial distribution is the integer part of the mean when F(n, p) < 1 − log 2, the secure threshold $\overline{p}$ is the minimum value of p such that the corresponding random coexpression network is on average an empty graph E_m, i.e., $P (𝒢_{\overline{p}} = E_{m}) \geq \frac{1}{2}$ .

The underlying hypothesis for Eq (3) is the assumption that in a random dataset no edge is expected, since no relation should occur between nodes. Due to its definition, the secure threshold $\overline{p}$ is biased towards avoiding the false positive links, paying a price in terms of false negatives. In Table 1 a collection of values of $\overline{p}$ is listed for different m and n, while in Fig 2 the contour plot of the function $\overline{p} (n, m)$ is shown first on a large range of values and then zooming on the small sample size area.

Download:

Table 1. A subset of values of the secure threshold

\overline{p}

for different number of samples n and genes m.

https://doi.org/10.1371/journal.pone.0128115.t001

Download:

Fig 2. Contour plot of the function

\overline{p} (n, m)

in the Samples × Genes space on (a) a wide (n, m) range and (b) zoomed on the small sample size area.

https://doi.org/10.1371/journal.pone.0128115.g002

As a first example, we compare the secure threshold $\overline{p}$ with the threshold derived by applying the Bonferroni correction procedure [99, 100] to the p-value of PCC. Let now Φ_F and Φ_{t, n − 2} be the cumulative distribution functions for the Fisher transformation and the Student’s t-distribution with n − 2 degrees of freedom, respectively. For a given PCC ρ computed between vectors in ℝⁿ, let (4) be the functions computing the p-value associated to ρ following either the Fisher approximation FT_n [75, 89, 101] or the Student’s t-distribution tT_n [102–106]. Set now the global familywise error rate to 5% significance level (FWER < 0.05): for a network with m nodes, by Bonferroni correction, this yields a p-value at most $p_{m} = \frac{0.1}{n (n - 1)}$ for each link in the graph. By using the inverse functions $B_{5, F} = {F T}_{n}^{- 1} (p_{m})$ and $B_{5, t} = {t T}_{n}^{- 1} (p_{m})$ we obtain the corresponding thresholds (Fisher and Student’s t, respectively) attaining the desired significance level. In Table 2 the values of B_{5, F} and B_{5, t} are reported for a selection of pairs (n, m), together with the corresponding secure threshold $\overline{p}$ , while in Fig 3 the surface plot of $\overline{p}$ is compared to B_{5, t}.

Download:

Table 2. The secure threshold and the Bonferroni correction: comparison for different number of samples n and genes m among the secure threshold

\overline{p}

and two Bonferroni-derived thresholds B_{5, F} and B_{5, t}.

https://doi.org/10.1371/journal.pone.0128115.t002

Download:

Fig 3. Comparison of the threshold functions

\overline{p} (n, m)

(yellow-red gradient) and B_{5, t} (blue gradient) in the Samples × Genes space; darker colors correspond to larger threshold values.

The relation $\overline{p} < B_{5, t}$ consistently holds.

https://doi.org/10.1371/journal.pone.0128115.g003

In all cases, the relation $\overline{p} < B_{5, F} < B_{5, t}$ holds, and when n < 10 both B_{5, F} and B_{5, t} cannot be defined, since the 5% significance level is not reached by any threshold. Both these results do not come unexpected, since the Bonferroni correction is known to be a very conservative procedure, especially when the number of tests is large (see for instance [107–109]) because it ignores existing relations between measurements. This results in a large False Negative Rate. Thus, in many situations in general and in -omics studies in particular, the Bonferroni correction should be substituted by more refined multiple testing approaches [110].

We show now in Table 3 the comparison on a set of synthetic and array datasets of the secure threshold $\overline{p}$ with other well known hard thresholding methods, the clustering coefficient-based threshold C* [67] and the statistical thresholds based on the adjusted p-values of 0.01, 0.05 or 0.1, while in Table 4 we compare the secure threshold $\overline{p}$ with the optimal threshold O [73] on a collection of array datasets on brain tissues. The threshold O is optimized for each dataset (i.e., sample size, data quality and biological structure), considering also the effects induced by the influence of subpopulations on network generation, and comparing to permutated data. Note that for a given pair (n, m), the threshold $\overline{p}$ is constant, while the threshold O depends on the particular dataset. In almost all cases, the threshold $\overline{p}$ is the strictest.

Download:

Table 3. Comparison of the secure threshold

\overline{p}

with the clustering coefficient-based threshold C* [67] and the statistical thresholds based on the adjusted p-values B0.01, B0.05 or B0.1 on a collection of synthetic and array datasets.

https://doi.org/10.1371/journal.pone.0128115.t003

Download:

Table 4. Comparison of the secure threshold

\overline{p}

with the optimal threshold O [73] on a collection of array datasets on brain tissues.

Note that for a given pair (n, m), the threshold p is constant, while the threshold O depends on the particular dataset.

https://doi.org/10.1371/journal.pone.0128115.t004

As shown in the previous section, for a not very skewed distribution, the good approximation provided by the exact formula for F(n, p) given in Eq (2) guarantees the effectiveness of the secure threshold $\overline{p}$ in detecting actual links between nodes. Nonetheless, whenever a stricter threshold is needed, it is still possible to follow the construction proposed, with the following refinement: the edge-creation process in the Erdös-Rényi model follows a binomial distribution, where n is the number of trials and p the probability associated with the success of a trial. The mean np of this distribution is one of the contributing terms in the definition Eq (3) of the secure threshold. To further restrict the number of falsely detected links, the variance term (np(1 − p) for the binomial distribution) can be added to the original formula through Chebyshev’s inequality: where μ and σ are the mean and the standard deviation of X. Thus, the definition of secure threshold can be sharpened to ${\tilde{p}}_{k}$ as follows, for an arbitrarily small ɛ > 0:

For instance, the binomial distribution, for large values of n, can be approximated as a normal distribution for which 95.45% of the values lie in the interval (μ − 2σ, μ + 2σ). Moreover, the Chebyshev’s inequality implies that at least 96% of the values lie in the interval (μ − 5σ, μ + 5σ). Finally, in Text D of S1 Text we show the analogue of Table 1 for ${\tilde{p}}_{2}$ and ${\tilde{p}}_{5}$ , respectively.

Results and Discussion

To conclude, we show the application of the secure threshold $\overline{p}$ in two datasets—synthetic and array—demonstrating its behaviour as a function of data subsampling.

Synthetic dataset

In order to construct a simulated microarray dataset 𝒢, we first created a correlation matrix M_𝒢 on 20 genes G₁, …G₂₀, together with a dataset 𝒢 of the corresponding expression $G_{i}^{1000}$ across 1000 synthetic samples, so that $M_{𝒢} (i, j) = | ρ (G_{i}^{1000}, G_{j}^{1000}) |$ is the absolute PCC between the expression of the genes G_i and G_j from 𝒢. In particular, M_𝒢 has two 10 × 10 blocks highly correlated on the main diagonal, and two 10 × 10 poorly correlated blocks on the minor diagonal, as shown in Fig 4(a).

Download:

Fig 4. Synthetic dataset

: level plot of the structure of the correlation matrix

(a) and heatmap of the dataset

(b).

The generating gene expression vectors $G_{1}^{1000}$ and $G_{11}^{1000}$ are marked with *.

https://doi.org/10.1371/journal.pone.0128115.g004

These blocks are derived from the following generating rule, given uncorrelated starting element $G_{1}^{1000}$ and $G_{11}^{1000}$ : Outside the two main blocks, all correlation values range between 0.002 and 0.074. In order to use M_𝒢 as the ground truth in what follows, all values outside the two blocks on the diagonal are thresholded to zero (M_𝒢)_{i 10 + j} = (M_𝒢)_{i 10 + j} = 0 for 1 ≤ i, j ≤ 10, while entries in the two main blocks are considered as real numbers when evaluating HIM distance and binarized to one when evaluating True/False Positive/Negative links.

In Fig 4(b) we also show the heatmap of the gene expression dataset 𝒢. Then a subset of n_s samples is selected from the starting 1000, and the corresponding coexpression networks is built, for the 100 hard threshold values 0.01j, for 1 ≤ j ≤ 100. The secure threshold for these cases are respectively 0.799, 0.596 and 0.389. This procedure is repeated 500 times for each value n_s = 10,20,50. The same experiment is then repeated adding a 20% and a 40% level of Gaussian noise to the original data: for a given signal s, we build s + ɛ with ɛ ∈ 𝒩(0, α ⋅ (max(s) − min(s))) for α = 0.2,0.4 respectively.

Using M_𝒢 as the ground truth, for each hard threshold 0.01j we evaluate the ratio of False Positive links (i.e., the quotient between the number of False Positive links and 190, the number of all possible links in a complete undirected network on 20 nodes), the ratio of False Negative links and the Hamming-Ipsen-Mikhailov (HIM) distance from the gold standard. The HIM distance [111, 112] is a metric between networks having the same nodes, ranging from 0 (distance between two identical networks) to 1 (comparison between the complete and the empty graph). The graphs summarizing the experiments are displayed in Fig 5.

Download:

Fig 5. Synthetic dataset

.

Coexpression inference of the network from random subsampling of the dataset, without noise (a,b,c), with 20% Gaussian noise (d,e,f) and with 40% Gaussian noise (g,h,i), on 10 (a,d,g), 20 (b,e,h) and 50 (c,f,i) samples. Solid lines indicate the mean over 500 replicated instances of HIM distance (black), ratio of False Positive (blue) and ratio of False Negative (red); dotted lines of the same color indicate confidence bars (+/-σ), while grey vertical dashed lines correspond to the secure threshold $\overline{p}$ .

https://doi.org/10.1371/journal.pone.0128115.g005

In all cases, the secure threshold $\overline{p}$ corresponds to the strictest value yielding a coexpression network with no false positive links included, which is its characterizing property. Moreover, in almost all displayed situations, thresholding at $\overline{p}$ still guarantees an acceptable HIM distance from the ground truth, and a false negative ratio that is always less than 0.4.

Ovarian cancer

The aforementioned results obtained in the synthetic case are then tested here in a large array study on 285 patients of ovarian cancer at different stages [80], recently used in a comparative study on conservation of coexpressed modules across different pathologies [79]. In detail, the authors profiled the gene expression of 285 predominately high-grade and advanced stage serous cancers of the ovary, fallopian tube, and peritoneum; the samples were hybridized on the Affymetrix Human Genome HG-U133 Plus 2.0 Array, including 54,621 probes. The goal of the original study was to identify novel molecular subtypes of ovarian cancer by gene expression profiling with linkage to clinical and pathologic features. As a major result, the authors presented two ranked gene lists supporting their claim that molecular subtypes show distinct survival characteristics. The two gene lists characterize the Progression Free Survival (PSF) and the Overall Survival (OS) patients, respectively. In each list genes are ranked according to a score weighting their association to target phenotype (PSF or OS): association is stronger for larger absolute values of the score, while the score’s sign is negative for associations with good outcome and positive for associations with poor outcome.

Following the procedure of the previous, synthetic example, first we individuate the sample subset corresponding to the homogeneous cohort of 161 patients with grade three cancer and a set T of 20 genes, with 11 genes strongly associated with good PSF or good OS (EDG7, LOC649242, SCGB1D2, CYP4B1, NQO1, MYCL1, PRSS21, MGC13057, PPP1R1B, KIAA1324, LOC646769) and 9 genes strongly associated to poor PFS or poor OS (THBS2, SFRP2, DPSG3, COL11A1, COL10A1, COL8A1, FAP, FABP4, POSTN), thus generating a dataset 𝒪_T of dimension 161 samples and 20 features. The corresponding absolute PCC matrix O_T is then used as the ground truth for the subsampling experiments, thresholding to zero all values smaller than 0.1: the levelplot of O_T and the heatmap of 𝒪_T are displayed in Fig 6.

Download:

Fig 6. Ovarian cancer dataset

.

Level plot of the structure of the correlation matrix O_T (a) and heatmap of the Ovarian dataset restricted to the set of 20 selected genes T (b). Solid lines separate the group of good and poor PFS/OS top genes.

https://doi.org/10.1371/journal.pone.0128115.g006

In these experiments, a random subdataset of n_s samples is extracted from 𝒪_T, and the corresponding absolute PCC coexpression network on the nodes T is built, for increasing threshold values. In Fig 7 we report the HIM and the ratio of False Positive and False Negative links for 500 runs of the experiments, separately for n_s = 5,10,20 and 50. Again, the secure threshold $\overline{p}$ corresponds to the smallest PCC value warranting that no false positive links are included. Finally, on average, for threshold values greater than $\overline{p}$ , the derivative of HIM distance is larger than before $\overline{p}$ , while the false negative rate remains under 0.8.

Download:

Fig 7. Ovarian cancer dataset

.

Coexpression inference of the coexpression network from subsampling of , on 5 (a), 10 (b), 20 (c) and 50 (d) samples. Solid lines indicate the mean over 500 replicated instances of HIM distance (black), ratio of False Positive (blue) and ratio of False Negative (red); dotted lines of the same color indicate confidence bars (+/-σ), while grey vertical dashed lines correspond to the secure threshold $\overline{p}$ .

https://doi.org/10.1371/journal.pone.0128115.g007

Conclusions

We have proposed a simple a priori, theoretical and non-parametric method for the selection of a hard threshold for the construction of correlation networks. This model is based on the requirements of filtering random data due to noise and reducing the number of false positives, and it is implemented by means of analytic properties of PCC. This new approach can be especially useful where there is a small sample size and when the task requires minimising the number of false positive links, probably the most common situation in profiling studies in functional genomics. Finally, when the number of samples increases, coupling this method with soft thresholding approaches, can help recovering false negative links neglected by overly strict thresholds.

Supporting Information

S1 Text. Supplementary Text and Figures.

https://doi.org/10.1371/journal.pone.0128115.s001

(PDF)

Author Contributions

Conceived and designed the experiments: AG GJ. Performed the experiments: AG GJ. Analyzed the data: AG GJ. Wrote the paper: AG GJ.

References

1. Barabási AL. The network takeover. Nature Physics. 2012;8:14–16.
- View Article
- Google Scholar
2. Szederkenyi G, Banga J, Alonso A. Inference of complex biological networks: distinguishability issues and optimization-based solutions. BMC Systems Biology. 2011;5(1):177. pmid:22034917
- View Article
- PubMed/NCBI
- Google Scholar
3. He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: Principles, assumptions, and limitations of present methods and future perspectives. Journal of Biotechnology. 2009;144:190–203. pmid:19631244
- View Article
- PubMed/NCBI
- Google Scholar
4. Meyer P, Alexopoulos LG, Bonk T, Califano A, Cho CR, de la Fuente A, et al. Verification of systems biology research in the age of collaborative competition. Nature Biotechnology. 2011;29(9):811–815. pmid:21904331
- View Article
- PubMed/NCBI
- Google Scholar
5. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE. 2010 02;5(2):e9202. pmid:20186320
- View Article
- PubMed/NCBI
- Google Scholar
6. De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nature Reviews Microbiology. 2010;8(10):717–729. pmid:20805835
- View Article
- PubMed/NCBI
- Google Scholar
7. Liang M, Zhang F, Jin G, Zhu J. FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks. PLoS ONE. 2015;10(1):e0116776. pmid:25602758
- View Article
- PubMed/NCBI
- Google Scholar
8. Song Q, Zhao C, Ou S, Meng Z, Kang P, Fan L, et al. Co-expression analysis of differentially expressed genes in hepatitis C virus-induced hepatocellular carcinoma. Molecular Medicine Reports. 2015;11(1):21–28. pmid:25339452
- View Article
- PubMed/NCBI
- Google Scholar
9. Wang S, Pandis I, Johnson D, Emam I, Guitton F, Oehmichen A, et al. Optimising parallel R correlation matrix calculations on gene expression data using MapReduce. BMC Bioinformatics. 2014;15:35.
- View Article
- Google Scholar
10. Rotival M, Petretto E. Leveraging gene co-expression networks to pinpoint the regulation of complex traits and disease, with a focus on cardiovascular traits. Briefings in Functional Genomics. 2014;13(1):66–78. pmid:23960099
- View Article
- PubMed/NCBI
- Google Scholar
11. Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, et al. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Research. 2015;43(D1):D82–D86. pmid:25392420
- View Article
- PubMed/NCBI
- Google Scholar
12. Yang Y, Han L, Yuan Y, Li J, Hei N, Liang H. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nature Communications. 2014;5:Article 3231.
- View Article
- Google Scholar
13. López-Kleine L, Leal L, López C. Biostatistical approaches for the reconstruction of gene co-expression networks based on transcriptomic data. Briefings in Functional Genomics. 2013;12(5):457–467. pmid:23407269
- View Article
- PubMed/NCBI
- Google Scholar
14. Rau A, Maugis-Rabusseau C, Martin-Magniette ML, Celeux G. Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics. 2015;2015 Jan 5 [Epub ahead of print]:btu845.
- View Article
- Google Scholar
15. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression Analysis of Human Genes Across Many Microarray Data Sets. Genome Research. 2004;14(6):1085–1094. pmid:15173114
- View Article
- PubMed/NCBI
- Google Scholar
16. Lavi O, Dror G, Shamir R. Network-Induced Classification Kernels for Gene Expression Profile Analysis. Journal of Computational Biology. 2012;19(6):694–709. pmid:22697242
- View Article
- PubMed/NCBI
- Google Scholar
17. Rapaport F, Zinovyev A, Dutreix M, Barillot E, Vert JP. Classification of microarray data using gene networks. BMC Bioinformatics. 2007;p. 35. pmid:17270037
- View Article
- PubMed/NCBI
- Google Scholar
18. Jansen R, Greenbaum D, Gerstein M. Relating Whole-Genome Expression Data with Protein-Protein Interactions. Genome Research. 2002;12(1):376.
- View Article
- Google Scholar
19. Zucknick M, Richardson S, Stronach EA. Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods. Statistical Applications in Genetics and Molecular Biology. 2008;7(1):Article 7. pmid:18312212
- View Article
- PubMed/NCBI
- Google Scholar
20. Khodakarim S, AlaviMajd H, Zayeri F, Rezaei-Tavirani M, Dehghan-Nayeri N, Tabatabaee SM, et al. Comparison of Univariate and Multivariate Gene Set Analysis in Acute Lymphoblastic Leukemia. Asian Pacific Journal of Cancer Prevention. 2013;14(3):1629–1633.
- View Article
- Google Scholar
21. Rahmatallah Y, Emmert-Streib F, Glazko G. Gene Sets Net Correlations Analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics. 2014;30(3):360–368. pmid:24292935
- View Article
- PubMed/NCBI
- Google Scholar
22. Zhi W, Minturn J, Rappaport E, Brodeur G, Li H. Network-based Analysis of Multivariate Gene Expression Data. In: Statistical Methods for Microarray Data Analysis. vol. 972 of Methods in molecular biology. Springer; 2013. p. 121–139.
23. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences. 1998;95(25):14863–14868.
- View Article
- Google Scholar
24. Kumari S, Nie J, Chen HS, Ma H, Stewart R, Li X, et al. Evaluation of Gene Association Methods for Coexpression Network Construction and Biological Knowledge Discovery. PLoS ONE. 2012;7(11):e50411. pmid:23226279
- View Article
- PubMed/NCBI
- Google Scholar
25. Allen JD, Xie Y, Chen M, Girard L, Xiao G. Comparing Statistical Methods for Constructing Large Scale Gene Networks. PLoS ONE. 2012;7(1):e29348. pmid:22272232
- View Article
- PubMed/NCBI
- Google Scholar
26. Song L, Langfelder P, Horvath S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics. 2012;13:328. pmid:23217028
- View Article
- PubMed/NCBI
- Google Scholar
27. Madhamshettiwar P, Maetschke S, Davis M, Reverter A, Ragan M. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Medicine. 2012;4(5):41. pmid:22548828
- View Article
- PubMed/NCBI
- Google Scholar
28. Baralla A, Mentzen WI, de la Fuente A. Inferring Gene Networks: Dream or Nightmare? Annals of the New York Academy of Science. 2009;1158:246–256. pmid:19348646
- View Article
- PubMed/NCBI
- Google Scholar
29. Carter SL, Brechbühler CM, Griffin M, Bond AT. Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics. 2004;20(14):2242–2250. pmid:15130938
- View Article
- PubMed/NCBI
- Google Scholar
30. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, et al. The Genomic Landscapes of Human Breast and Colorectal Cancers. Science. 2007;318(5853):1108–1113. pmid:17932254
- View Article
- PubMed/NCBI
- Google Scholar
31. Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson S. Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genomics. 2006;7(1):40. pmid:16515682
- View Article
- PubMed/NCBI
- Google Scholar
32. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, et al. Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes. Cell. 2012;148(6):1293–1307. pmid:22424236
- View Article
- PubMed/NCBI
- Google Scholar
33. Friedman J, Alm EJ. Inferring Correlation Networks from Genomic Survey Data. PLoS Computational Biology. 2012;8(9):e1002687. pmid:23028285
- View Article
- PubMed/NCBI
- Google Scholar
34. Furlotte NA, Kang HM, Ye C, Eskin E. Mixed-model coexpression: calculating gene coexpression while accounting for expression heterogeneity. Bioinformatics. 2011;27(13):i288–i294. pmid:21685083
- View Article
- PubMed/NCBI
- Google Scholar
35. Rider AK, Milenković T, Siwo GH, Pinapati RS, Emrich SJ, Ferdig MT, et al. Networks’ characteristics are important for systems biology. Network Science. 2014;2(02):139–161.
- View Article
- Google Scholar
36. Butte AJ, Kohane IS. Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pacific Symposium on Biocomputing. 2000;5:415–426.
- View Article
- Google Scholar
37. Wang HQ, Tsai CJ. CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis. PLoS ONE. 2013;8(10):e77429. pmid:24194884
- View Article
- PubMed/NCBI
- Google Scholar
38. Cho DY, Kim YA, Przytycka TM. Chapter 5: Network Biology Approach to Complex Diseases. PLoS Computational Biology. 2012;8(12):e1002820. pmid:23300411
- View Article
- PubMed/NCBI
- Google Scholar
39. Zhang B, Horvath S. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 2005;4(1):Article 17. pmid:16646834
- View Article
- PubMed/NCBI
- Google Scholar
40. Zhang J, Lu K, Xiang Y, Islam M, Kotian S, Kais Z, et al. Weighted Frequent Gene Co-expression Network Mining to Identify Genes Involved in Genome Stability. PLoS Computational Biology. 2012;8(8):e1002656. pmid:22956898
- View Article
- PubMed/NCBI
- Google Scholar
41. Gibbs D, Baratt A, Baric R, Kawaoka Y, Smith R, Orwoll E, et al. Protein co-expression network analysis (ProCoNA). Journal of Clinical Bioinformatics. 2013;3(1):11. pmid:23724967
- View Article
- PubMed/NCBI
- Google Scholar
42. de Solla Price DJ. Networks of Scientific Papers. Science. 1965;149(3683):510–515.
- View Article
- Google Scholar
43. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. pmid:10521342
- View Article
- PubMed/NCBI
- Google Scholar
44. Davidson GS, Wylie BN, Boyack KW. Cluster Stability and the Use of Noise in Interpretation of Clustering. In: Proceedings of the IEEE Symposium on Information Visualization 2001 INFOVIS’01. IEEE Computer Society; 2001. p. 23.
45. Chen H. Clustering and Network Analysis with Single Nucleotide Polymorphism (SNP) [Ph.D. Thesis]. Stony Brook University; 2011.
46. Numata J, Ebenhöh O, Knapp EW. Measuring correlations in metabolomic networks with mutual information. Genome Informatics. 2008;20:112–122. pmid:19425127
- View Article
- PubMed/NCBI
- Google Scholar
47. Fukushima A. DiffCorr: An R package to analyze and visualize differential correlations in biological networks. Gene. 2013;518(1):209–214. pmid:23246976
- View Article
- PubMed/NCBI
- Google Scholar
48. Prieto C, Risueño A, Fontanillo C, De Las Rivas J. Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles. PLoS ONE. 2008;3(12):e3911. pmid:19081792
- View Article
- PubMed/NCBI
- Google Scholar
49. Opgen-Rhein R, Strimmer K. From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology. 2007;1:37. pmid:17683609
- View Article
- PubMed/NCBI
- Google Scholar
50. Khanin R, Wit E. Construction of Malaria Gene Expression Network Using Partial Correlations. In: McConnell P, Lin SM, Hurban P, editors. Methods of Microarray Data Analysis V. Springer US; 2007. p. 75–88.
51. Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Research. 2008;36(suppl 1):D77–D82. pmid:17932064
- View Article
- PubMed/NCBI
- Google Scholar
52. Ruan J, Dean A, Zhang W. A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC Systems Biology. 2010;4(1):8. pmid:20122284
- View Article
- PubMed/NCBI
- Google Scholar
53. Mistry M, Gillis J, Pavlidis P. Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls. BMC Neuroscience. 2013;14(1):105. pmid:24070017
- View Article
- PubMed/NCBI
- Google Scholar
54. Luo F, Yang Y, Zhong J, Gao H, Khan L, Thompson D, et al. Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory. BMC Bioinformatics. 2007;8(1):299. pmid:17697349
- View Article
- PubMed/NCBI
- Google Scholar
55. Zhu D, Hero AO, Qin ZS, Swaroop A. High throughput screening of co-expressed gene pairs with controlled false discovery rate (FDR) and minimum acceptable strength (MAS). Journal of Computational Biology. 2005;12(7):1029–1045. pmid:16201920
- View Article
- PubMed/NCBI
- Google Scholar
56. Scholz M. Approaches to analyse and interpret biological profile data [Ph.D. Thesis]. Potsdam University; 2006.
57. Ma C, Wang X. Application of the Gini Correlation Coefficient to Infer Regulatory Relationships in Transcriptome Analysis. Plant Physiology. 2012;160(1):192–203. pmid:22797655
- View Article
- PubMed/NCBI
- Google Scholar
58. Caraiani P. Using Complex Networks to Characterize International Business Cycles. PLoS ONE. 2013;8(3):e58109. pmid:23483979
- View Article
- PubMed/NCBI
- Google Scholar
59. Inouye M, Silander K, Hamalainen E, Salomaa V, Harald K, Jousilahti P, et al. An immune response network associated with blood lipid levels. PLoS Genetics. 2010;6(9):e1001113. pmid:20844574
- View Article
- PubMed/NCBI
- Google Scholar
60. Giorgi FM. Expression-based Reverse Engineering of Plant Transcriptional Networks [Ph.D. Thesis]. Potsdam University; 2011.
61. Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, et al. Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant, Cell & Environment. 2009;32(12):1633–1651.
- View Article
- Google Scholar
62. Yuan A, Yue Q, Apprey V, Bonney GE. Global pattern of pairwise relationship in genetic network. Journal of Biomedical Science and Engineering. 2010;3:977–985. pmid:21804923
- View Article
- PubMed/NCBI
- Google Scholar
63. Bassel GW, Lan H, Glaab E, Gibbs DJ, Gerjets T, Krasnogor N, et al. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National Academy of Sciences. 2011;108(23):9709–9714.
- View Article
- Google Scholar
64. Zheng ZL, Zhao Y. Transcriptome comparison and gene coexpression network analysis provide a systems view of citrus response to “Candidatus Liberibacter asiaticus” infection. BMC Genomics. 2013;14:27. pmid:23324561
- View Article
- PubMed/NCBI
- Google Scholar
65. Stöckel J, Welsh EA, Liberton M, Kunnvakkam R, Aurora R, Pakrasi HB. Global transcriptomic analysis of Cyanothece 51142 reveals robust diurnal oscillation of central metabolic processes. Proceedings of the National Academy of Sciences. 2008;105(16):6156–6161.
- View Article
- Google Scholar
66. Dempsey K, Bonasera S, Bastola D, Ali H. A Novel Correlation Networks Approach for the Identification of Gene Targets. In: Proceedings of the 44th Hawaii International Conference on System Sciences—HICSS 2011. IEEE; 2011. p. 1–8.
67. Elo LL, Järvenpää H, Orešič M, Lahesmaa R, Aittokallio T. Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics. 2007;23(16):2096–2103. pmid:17553854
- View Article
- PubMed/NCBI
- Google Scholar
68. Gibson SM, Ficklin SP, Isaacson S, Luo F, Feltus FA, Smith MC. Massive-Scale Gene Co-Expression Network Construction and Robustness Testing Using Random Matrix Theory. PLoS ONE. 2013;8(2):e55871. pmid:23409071
- View Article
- PubMed/NCBI
- Google Scholar
69. Feltus FA, Ficklin SP, Gibson SM, Smith MC. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study. BMC Systems Biology. 2013;7:44. pmid:23738693
- View Article
- PubMed/NCBI
- Google Scholar
70. Perkins AD, Langston MA. Threshold selection in gene co-expression networks using spectral graph theory techniques. BMC Bioinformatics. 2009;10(Suppl 11):1–11.
- View Article
- Google Scholar
71. Stathias V, Pastori C, Griffin TZ, Komotar R, Clarke J, Zhang M, et al. Identifying Glioblastoma Gene Networks Based on Hypergeometric Test Analysis. PLoS ONE. 2014;9(12):e115842. pmid:25551752
- View Article
- PubMed/NCBI
- Google Scholar
72. Borate B, Chesler E, Langston M, Saxton A, Voy B. Comparison of threshold selection methods for microarray gene co-expression matrices. BMC Research Notes. 2009;2(1):240. pmid:19954523
- View Article
- PubMed/NCBI
- Google Scholar
73. Gaiteri C, Sibille E. Differentially expressed genes in major depression reside on the periphery of resilient gene coexpression networks. Frontiers in Neuroscience. 2011;5:Article 95. pmid:21922000
- View Article
- PubMed/NCBI
- Google Scholar
74. Good P. Permutation Tests. Springer; 2000.
75. Fisher RA. Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika. 1915;10(4):507–521.
- View Article
- Google Scholar
76. Bevington PR. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill; 1969.
77. Zhang A. Advanced Analysis of Gene Expression Microarray Data. World Scientific; 2006.
78. Casellas J, Varona L. Modeling Skewness in Human Transcriptomes. PLoS ONE. 2012;7(6):e38919. pmid:22701729
- View Article
- PubMed/NCBI
- Google Scholar
79. Doig T, Hume D, Theocharidis T, Goodlad J, Gregory C, Freeman T. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment. BMC Genomics. 2013;14(1):469. pmid:23845084
- View Article
- PubMed/NCBI
- Google Scholar
80. Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, et al. Novel Molecular Subtypes of Serous and Endometrioid Ovarian Cancer Linked to Clinical Outcome. Clinical Cancer Research. 2008;14(16):5198–5208. pmid:18698038
- View Article
- PubMed/NCBI
- Google Scholar
81. Budhu A, Jia HL, Forgues M, Liu CG, Goldstein D, Lam A, et al. Identification of Metastasis-Related MicroRNAs in Hepatocellular Carcinoma. Hepatology. 2008;47(3):897–907. pmid:18176954
- View Article
- PubMed/NCBI
- Google Scholar
82. Ji J, Shi J, Budhu A, Yu Z, Forgues M, Roessler S, et al. MicroRNA Expression, Survival, and Response to Interferon in Liver Cancer. New England Journal of Medicine. 2009;361:1437–1447. pmid:19812400
- View Article
- PubMed/NCBI
- Google Scholar
83. Troyanskaya OG, Cantor M, Sherlock G, Brown PO, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–525. pmid:11395428
- View Article
- PubMed/NCBI
- Google Scholar
84. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Research. 2011;39(suppl 1):D152–D157. pmid:21037258
- View Article
- PubMed/NCBI
- Google Scholar
85. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215(3):403–410. pmid:2231712
- View Article
- PubMed/NCBI
- Google Scholar
86. Olkin I, Pratt JW. Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics. 1958;29:201–211.
- View Article
- Google Scholar
87. Pearson ES, Adyanthāya NK. The distribution of frequency constants in small samples from non-normal symmetrical and skew populations. Biometrika. 1929;21(1/4):259–286.
- View Article
- Google Scholar
88. Rider PR. On the distribution of the correlation coefficient in small samples. Biometrika. 1932;24(3/4):382–403.
- View Article
- Google Scholar
89. Gayen AK. The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of any Size Drawn from Non-Normal Universes. Biometrika. 1951;38(1/2):219–247. pmid:14848124
- View Article
- PubMed/NCBI
- Google Scholar
90. Kenney JF, Keeping ES. Mathematics of Statistics, Part 2. 2nd ed. Van Nostrand; 1951.
91. Kenney JF, Keeping ES. Mathematics of Statistics, Part 1. 3rd ed. Van Nostrand; 1962.
92. Pugh EM, Winslow GH. The Analysis of Physical Measurements. Addison-Wesley; 1966.
93. Kendall MG, Stuart A. The Advanced Theory of Statistics: Distribution theory. Griffin; 1977.
94. Haldane JBS. A note on non-normal correlation. Biometrika. 1949;36:467–468. pmid:15409360
- View Article
- PubMed/NCBI
- Google Scholar
95. Hey GB. A new method for experimental sampling illustrated in certain non-normal populations. Biometrika. 1938;30:68–80.
- View Article
- Google Scholar
96. Kowalski CJ. On the Effects of Non-Normality on the Distribution of the Sample Product-Moment Correlation Coefficient. Journal of the Royal Statistical Society Series C (Applied Statistics). 1972;21(1):1–12.
- View Article
- Google Scholar
97. Erdös P, Rényi A. On Random Graphs. I. Publicationes Mathematicae. 1959;6:290–297.
- View Article
- Google Scholar
98. Hamza K. The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions. Statistics & Probability Letters. 1995;23(1):21–25.
- View Article
- Google Scholar
99. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del Regio Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 1936;8:3–62.
- View Article
- Google Scholar
100. Miller RGJ. Simultaneous Statistical Inference. Springer; 1981.
101. Fisher RA. On the “probable error” of a coefficient of correlation deduced from a small sample. Metron. 1921;1:3–32.
- View Article
- Google Scholar
102. Sealy Gossetm WS. The probable error of a mean. Biometrika. 1908;6(1):1–25.
- View Article
- Google Scholar
103. Soper HE, Young AW, Cave BM, Lee A, Pearson K. On the distribution of the correlation coefficient in small samples. Appendix II to the papers of “Student” and R. A. Fisher. A co-operative study. Biometrika. 1917;11:328–413.
- View Article
- Google Scholar
104. Fisher RA. Applications of “Student’s” distribution. Metron. 1925;5:90–104.
- View Article
- Google Scholar
105. Rahman NA. A Course in Theoretical Statistics. Charles Griffin and Company; 1968.
106. Kendall MG, Stuart A. The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Charles Griffin and Company; 1973.
107. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. British Medical Journal. 1995;310(6973):170. pmid:7833759
- View Article
- PubMed/NCBI
- Google Scholar
108. Perneger TV. What’s wrong with Bonferroni adjustments. British Medical Journal. 1998;316(7139):1236–1238. pmid:9553006
- View Article
- PubMed/NCBI
- Google Scholar
109. Azuaje FJ. Selecting biologically informative genes in co-expression networks with a centrality score. Biology Direct. 2014;9:12. pmid:24947308
- View Article
- PubMed/NCBI
- Google Scholar
110. Dudoit S, van der Laan MJ. Multiple Testing Procedures with Applications to Genomics. Springer; 2008.
111. Jurman G, Visintainer R, Riccadonna S, Filosi M, Furlanello C. The HIM glocal metric and kernel for network comparison and classification; 2013. ArXiv:1201.2931 [math.CO].
112. Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C. Stability Indicators in Network Reconstruction. PLoS ONE. 2014;9(2):e89815. pmid:24587057
- View Article
- PubMed/NCBI
- Google Scholar

[ref1] 1. Barabási AL. The network takeover. Nature Physics. 2012;8:14–16.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Szederkenyi G, Banga J, Alonso A. Inference of complex biological networks: distinguishability issues and optimization-based solutions. BMC Systems Biology. 2011;5(1):177. pmid:22034917
View Article
PubMed/NCBI
Google Scholar

[5] View Article

[6] PubMed/NCBI

[7] Google Scholar

[ref3] 3. He F, Balling R, Zeng AP. Reverse engineering and verification of gene networks: Principles, assumptions, and limitations of present methods and future perspectives. Journal of Biotechnology. 2009;144:190–203. pmid:19631244
View Article
PubMed/NCBI
Google Scholar

[9] View Article

[10] PubMed/NCBI

[11] Google Scholar

[ref4] 4. Meyer P, Alexopoulos LG, Bonk T, Califano A, Cho CR, de la Fuente A, et al. Verification of systems biology research in the age of collaborative competition. Nature Biotechnology. 2011;29(9):811–815. pmid:21904331
View Article
PubMed/NCBI
Google Scholar

[13] View Article

[14] PubMed/NCBI

[15] Google Scholar

[ref5] 5. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE. 2010 02;5(2):e9202. pmid:20186320
View Article
PubMed/NCBI
Google Scholar

[17] View Article

[18] PubMed/NCBI

[19] Google Scholar

[ref6] 6. De Smet R, Marchal K. Advantages and limitations of current network inference methods. Nature Reviews Microbiology. 2010;8(10):717–729. pmid:20805835
View Article
PubMed/NCBI
Google Scholar

[21] View Article

[22] PubMed/NCBI

[23] Google Scholar

[ref7] 7. Liang M, Zhang F, Jin G, Zhu J. FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks. PLoS ONE. 2015;10(1):e0116776. pmid:25602758
View Article
PubMed/NCBI
Google Scholar

[25] View Article

[26] PubMed/NCBI

[27] Google Scholar

[ref8] 8. Song Q, Zhao C, Ou S, Meng Z, Kang P, Fan L, et al. Co-expression analysis of differentially expressed genes in hepatitis C virus-induced hepatocellular carcinoma. Molecular Medicine Reports. 2015;11(1):21–28. pmid:25339452
View Article
PubMed/NCBI
Google Scholar

[29] View Article

[30] PubMed/NCBI

[31] Google Scholar

[ref9] 9. Wang S, Pandis I, Johnson D, Emam I, Guitton F, Oehmichen A, et al. Optimising parallel R correlation matrix calculations on gene expression data using MapReduce. BMC Bioinformatics. 2014;15:35.
View Article
Google Scholar

[33] View Article

[34] Google Scholar

[ref10] 10. Rotival M, Petretto E. Leveraging gene co-expression networks to pinpoint the regulation of complex traits and disease, with a focus on cardiovascular traits. Briefings in Functional Genomics. 2014;13(1):66–78. pmid:23960099
View Article
PubMed/NCBI
Google Scholar

[36] View Article

[37] PubMed/NCBI

[38] Google Scholar

[ref11] 11. Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, et al. COXPRESdb in 2015: coexpression database for animal species by DNA-microarray and RNAseq-based expression data with multiple quality assessment systems. Nucleic Acids Research. 2015;43(D1):D82–D86. pmid:25392420
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref12] 12. Yang Y, Han L, Yuan Y, Li J, Hei N, Liang H. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nature Communications. 2014;5:Article 3231.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref13] 13. López-Kleine L, Leal L, López C. Biostatistical approaches for the reconstruction of gene co-expression networks based on transcriptomic data. Briefings in Functional Genomics. 2013;12(5):457–467. pmid:23407269
View Article
PubMed/NCBI
Google Scholar

[47] View Article

[48] PubMed/NCBI

[49] Google Scholar

[ref14] 14. Rau A, Maugis-Rabusseau C, Martin-Magniette ML, Celeux G. Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models. Bioinformatics. 2015;2015 Jan 5 [Epub ahead of print]:btu845.
View Article
Google Scholar

[51] View Article

[52] Google Scholar

[ref15] 15. Lee HK, Hsu AK, Sajdak J, Qin J, Pavlidis P. Coexpression Analysis of Human Genes Across Many Microarray Data Sets. Genome Research. 2004;14(6):1085–1094. pmid:15173114
View Article
PubMed/NCBI
Google Scholar

[54] View Article

[55] PubMed/NCBI

[56] Google Scholar

[ref16] 16. Lavi O, Dror G, Shamir R. Network-Induced Classification Kernels for Gene Expression Profile Analysis. Journal of Computational Biology. 2012;19(6):694–709. pmid:22697242
View Article
PubMed/NCBI
Google Scholar

[58] View Article

[59] PubMed/NCBI

[60] Google Scholar

[ref17] 17. Rapaport F, Zinovyev A, Dutreix M, Barillot E, Vert JP. Classification of microarray data using gene networks. BMC Bioinformatics. 2007;p. 35. pmid:17270037
View Article
PubMed/NCBI
Google Scholar

[62] View Article

[63] PubMed/NCBI

[64] Google Scholar

[ref18] 18. Jansen R, Greenbaum D, Gerstein M. Relating Whole-Genome Expression Data with Protein-Protein Interactions. Genome Research. 2002;12(1):376.
View Article
Google Scholar

[66] View Article

[67] Google Scholar

[ref19] 19. Zucknick M, Richardson S, Stronach EA. Comparing the Characteristics of Gene Expression Profiles Derived by Univariate and Multivariate Classification Methods. Statistical Applications in Genetics and Molecular Biology. 2008;7(1):Article 7. pmid:18312212
View Article
PubMed/NCBI
Google Scholar

[69] View Article

[70] PubMed/NCBI

[71] Google Scholar

[ref20] 20. Khodakarim S, AlaviMajd H, Zayeri F, Rezaei-Tavirani M, Dehghan-Nayeri N, Tabatabaee SM, et al. Comparison of Univariate and Multivariate Gene Set Analysis in Acute Lymphoblastic Leukemia. Asian Pacific Journal of Cancer Prevention. 2013;14(3):1629–1633.
View Article
Google Scholar

[73] View Article

[74] Google Scholar

[ref21] 21. Rahmatallah Y, Emmert-Streib F, Glazko G. Gene Sets Net Correlations Analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics. 2014;30(3):360–368. pmid:24292935
View Article
PubMed/NCBI
Google Scholar

[76] View Article

[77] PubMed/NCBI

[78] Google Scholar

[ref22] 22. Zhi W, Minturn J, Rappaport E, Brodeur G, Li H. Network-based Analysis of Multivariate Gene Expression Data. In: Statistical Methods for Microarray Data Analysis. vol. 972 of Methods in molecular biology. Springer; 2013. p. 121–139.

[ref23] 23. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences. 1998;95(25):14863–14868.
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref24] 24. Kumari S, Nie J, Chen HS, Ma H, Stewart R, Li X, et al. Evaluation of Gene Association Methods for Coexpression Network Construction and Biological Knowledge Discovery. PLoS ONE. 2012;7(11):e50411. pmid:23226279
View Article
PubMed/NCBI
Google Scholar

[84] View Article

[85] PubMed/NCBI

[86] Google Scholar

[ref25] 25. Allen JD, Xie Y, Chen M, Girard L, Xiao G. Comparing Statistical Methods for Constructing Large Scale Gene Networks. PLoS ONE. 2012;7(1):e29348. pmid:22272232
View Article
PubMed/NCBI
Google Scholar

[88] View Article

[89] PubMed/NCBI

[90] Google Scholar

[ref26] 26. Song L, Langfelder P, Horvath S. Comparison of co-expression measures: mutual information, correlation, and model based indices. BMC Bioinformatics. 2012;13:328. pmid:23217028
View Article
PubMed/NCBI
Google Scholar

[92] View Article

[93] PubMed/NCBI

[94] Google Scholar

[ref27] 27. Madhamshettiwar P, Maetschke S, Davis M, Reverter A, Ragan M. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Medicine. 2012;4(5):41. pmid:22548828
View Article
PubMed/NCBI
Google Scholar

[96] View Article

[97] PubMed/NCBI

[98] Google Scholar

[ref28] 28. Baralla A, Mentzen WI, de la Fuente A. Inferring Gene Networks: Dream or Nightmare? Annals of the New York Academy of Science. 2009;1158:246–256. pmid:19348646
View Article
PubMed/NCBI
Google Scholar

[100] View Article

[101] PubMed/NCBI

[102] Google Scholar

[ref29] 29. Carter SL, Brechbühler CM, Griffin M, Bond AT. Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics. 2004;20(14):2242–2250. pmid:15130938
View Article
PubMed/NCBI
Google Scholar

[104] View Article

[105] PubMed/NCBI

[106] Google Scholar

[ref30] 30. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, et al. The Genomic Landscapes of Human Breast and Colorectal Cancers. Science. 2007;318(5853):1108–1113. pmid:17932254
View Article
PubMed/NCBI
Google Scholar

[108] View Article

[109] PubMed/NCBI

[110] Google Scholar

[ref31] 31. Carlson M, Zhang B, Fang Z, Mischel P, Horvath S, Nelson S. Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genomics. 2006;7(1):40. pmid:16515682
View Article
PubMed/NCBI
Google Scholar

[112] View Article

[113] PubMed/NCBI

[114] Google Scholar

[ref32] 32. Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, et al. Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes. Cell. 2012;148(6):1293–1307. pmid:22424236
View Article
PubMed/NCBI
Google Scholar

[116] View Article

[117] PubMed/NCBI

[118] Google Scholar

[ref33] 33. Friedman J, Alm EJ. Inferring Correlation Networks from Genomic Survey Data. PLoS Computational Biology. 2012;8(9):e1002687. pmid:23028285
View Article
PubMed/NCBI
Google Scholar

[120] View Article

[121] PubMed/NCBI

[122] Google Scholar

[ref34] 34. Furlotte NA, Kang HM, Ye C, Eskin E. Mixed-model coexpression: calculating gene coexpression while accounting for expression heterogeneity. Bioinformatics. 2011;27(13):i288–i294. pmid:21685083
View Article
PubMed/NCBI
Google Scholar

[124] View Article

[125] PubMed/NCBI

[126] Google Scholar

[ref35] 35. Rider AK, Milenković T, Siwo GH, Pinapati RS, Emrich SJ, Ferdig MT, et al. Networks’ characteristics are important for systems biology. Network Science. 2014;2(02):139–161.
View Article
Google Scholar

[128] View Article

[129] Google Scholar

[ref36] 36. Butte AJ, Kohane IS. Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements. Pacific Symposium on Biocomputing. 2000;5:415–426.
View Article
Google Scholar

[131] View Article

[132] Google Scholar

[ref37] 37. Wang HQ, Tsai CJ. CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis. PLoS ONE. 2013;8(10):e77429. pmid:24194884
View Article
PubMed/NCBI
Google Scholar

[134] View Article

[135] PubMed/NCBI

[136] Google Scholar

[ref38] 38. Cho DY, Kim YA, Przytycka TM. Chapter 5: Network Biology Approach to Complex Diseases. PLoS Computational Biology. 2012;8(12):e1002820. pmid:23300411
View Article
PubMed/NCBI
Google Scholar

[138] View Article

[139] PubMed/NCBI

[140] Google Scholar

[ref39] 39. Zhang B, Horvath S. A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applications in Genetics and Molecular Biology. 2005;4(1):Article 17. pmid:16646834
View Article
PubMed/NCBI
Google Scholar

[142] View Article

[143] PubMed/NCBI

[144] Google Scholar

[ref40] 40. Zhang J, Lu K, Xiang Y, Islam M, Kotian S, Kais Z, et al. Weighted Frequent Gene Co-expression Network Mining to Identify Genes Involved in Genome Stability. PLoS Computational Biology. 2012;8(8):e1002656. pmid:22956898
View Article
PubMed/NCBI
Google Scholar

[146] View Article

[147] PubMed/NCBI

[148] Google Scholar

[ref41] 41. Gibbs D, Baratt A, Baric R, Kawaoka Y, Smith R, Orwoll E, et al. Protein co-expression network analysis (ProCoNA). Journal of Clinical Bioinformatics. 2013;3(1):11. pmid:23724967
View Article
PubMed/NCBI
Google Scholar

[150] View Article

[151] PubMed/NCBI

[152] Google Scholar

[ref42] 42. de Solla Price DJ. Networks of Scientific Papers. Science. 1965;149(3683):510–515.
View Article
Google Scholar

[154] View Article

[155] Google Scholar

[ref43] 43. Barabasi AL, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512. pmid:10521342
View Article
PubMed/NCBI
Google Scholar

[157] View Article

[158] PubMed/NCBI

[159] Google Scholar

[ref44] 44. Davidson GS, Wylie BN, Boyack KW. Cluster Stability and the Use of Noise in Interpretation of Clustering. In: Proceedings of the IEEE Symposium on Information Visualization 2001 INFOVIS’01. IEEE Computer Society; 2001. p. 23.

[ref45] 45. Chen H. Clustering and Network Analysis with Single Nucleotide Polymorphism (SNP) [Ph.D. Thesis]. Stony Brook University; 2011.

[ref46] 46. Numata J, Ebenhöh O, Knapp EW. Measuring correlations in metabolomic networks with mutual information. Genome Informatics. 2008;20:112–122. pmid:19425127
View Article
PubMed/NCBI
Google Scholar

[163] View Article

[164] PubMed/NCBI

[165] Google Scholar

[ref47] 47. Fukushima A. DiffCorr: An R package to analyze and visualize differential correlations in biological networks. Gene. 2013;518(1):209–214. pmid:23246976
View Article
PubMed/NCBI
Google Scholar

[167] View Article

[168] PubMed/NCBI

[169] Google Scholar

[ref48] 48. Prieto C, Risueño A, Fontanillo C, De Las Rivas J. Human Gene Coexpression Landscape: Confident Network Derived from Tissue Transcriptomic Profiles. PLoS ONE. 2008;3(12):e3911. pmid:19081792
View Article
PubMed/NCBI
Google Scholar

[171] View Article

[172] PubMed/NCBI

[173] Google Scholar

[ref49] 49. Opgen-Rhein R, Strimmer K. From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Systems Biology. 2007;1:37. pmid:17683609
View Article
PubMed/NCBI
Google Scholar

[175] View Article

[176] PubMed/NCBI

[177] Google Scholar

[ref50] 50. Khanin R, Wit E. Construction of Malaria Gene Expression Network Using Partial Correlations. In: McConnell P, Lin SM, Hurban P, editors. Methods of Microarray Data Analysis V. Springer US; 2007. p. 75–88.

[ref51] 51. Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Research. 2008;36(suppl 1):D77–D82. pmid:17932064
View Article
PubMed/NCBI
Google Scholar

[180] View Article

[181] PubMed/NCBI

[182] Google Scholar

[ref52] 52. Ruan J, Dean A, Zhang W. A general co-expression network-based approach to gene expression analysis: comparison and applications. BMC Systems Biology. 2010;4(1):8. pmid:20122284
View Article
PubMed/NCBI
Google Scholar

[184] View Article

[185] PubMed/NCBI

[186] Google Scholar

[ref53] 53. Mistry M, Gillis J, Pavlidis P. Meta-analysis of gene coexpression networks in the post-mortem prefrontal cortex of patients with schizophrenia and unaffected controls. BMC Neuroscience. 2013;14(1):105. pmid:24070017
View Article
PubMed/NCBI
Google Scholar

[188] View Article

[189] PubMed/NCBI

[190] Google Scholar

[ref54] 54. Luo F, Yang Y, Zhong J, Gao H, Khan L, Thompson D, et al. Constructing gene co-expression networks and predicting functions of unknown genes by random matrix theory. BMC Bioinformatics. 2007;8(1):299. pmid:17697349
View Article
PubMed/NCBI
Google Scholar

[192] View Article

[193] PubMed/NCBI

[194] Google Scholar

[ref55] 55. Zhu D, Hero AO, Qin ZS, Swaroop A. High throughput screening of co-expressed gene pairs with controlled false discovery rate (FDR) and minimum acceptable strength (MAS). Journal of Computational Biology. 2005;12(7):1029–1045. pmid:16201920
View Article
PubMed/NCBI
Google Scholar

[196] View Article

[197] PubMed/NCBI

[198] Google Scholar

[ref56] 56. Scholz M. Approaches to analyse and interpret biological profile data [Ph.D. Thesis]. Potsdam University; 2006.

[ref57] 57. Ma C, Wang X. Application of the Gini Correlation Coefficient to Infer Regulatory Relationships in Transcriptome Analysis. Plant Physiology. 2012;160(1):192–203. pmid:22797655
View Article
PubMed/NCBI
Google Scholar

[201] View Article

[202] PubMed/NCBI

[203] Google Scholar

[ref58] 58. Caraiani P. Using Complex Networks to Characterize International Business Cycles. PLoS ONE. 2013;8(3):e58109. pmid:23483979
View Article
PubMed/NCBI
Google Scholar

[205] View Article

[206] PubMed/NCBI

[207] Google Scholar

[ref59] 59. Inouye M, Silander K, Hamalainen E, Salomaa V, Harald K, Jousilahti P, et al. An immune response network associated with blood lipid levels. PLoS Genetics. 2010;6(9):e1001113. pmid:20844574
View Article
PubMed/NCBI
Google Scholar

[209] View Article

[210] PubMed/NCBI

[211] Google Scholar

[ref60] 60. Giorgi FM. Expression-based Reverse Engineering of Plant Transcriptional Networks [Ph.D. Thesis]. Potsdam University; 2011.

[ref61] 61. Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, et al. Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant, Cell & Environment. 2009;32(12):1633–1651.
View Article
Google Scholar

[214] View Article

[215] Google Scholar

[ref62] 62. Yuan A, Yue Q, Apprey V, Bonney GE. Global pattern of pairwise relationship in genetic network. Journal of Biomedical Science and Engineering. 2010;3:977–985. pmid:21804923
View Article
PubMed/NCBI
Google Scholar

[217] View Article

[218] PubMed/NCBI

[219] Google Scholar

[ref63] 63. Bassel GW, Lan H, Glaab E, Gibbs DJ, Gerjets T, Krasnogor N, et al. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National Academy of Sciences. 2011;108(23):9709–9714.
View Article
Google Scholar

[221] View Article

[222] Google Scholar

[ref64] 64. Zheng ZL, Zhao Y. Transcriptome comparison and gene coexpression network analysis provide a systems view of citrus response to “Candidatus Liberibacter asiaticus” infection. BMC Genomics. 2013;14:27. pmid:23324561
View Article
PubMed/NCBI
Google Scholar

[224] View Article

[225] PubMed/NCBI

[226] Google Scholar

[ref65] 65. Stöckel J, Welsh EA, Liberton M, Kunnvakkam R, Aurora R, Pakrasi HB. Global transcriptomic analysis of Cyanothece 51142 reveals robust diurnal oscillation of central metabolic processes. Proceedings of the National Academy of Sciences. 2008;105(16):6156–6161.
View Article
Google Scholar

[228] View Article

[229] Google Scholar

[ref66] 66. Dempsey K, Bonasera S, Bastola D, Ali H. A Novel Correlation Networks Approach for the Identification of Gene Targets. In: Proceedings of the 44th Hawaii International Conference on System Sciences—HICSS 2011. IEEE; 2011. p. 1–8.

[ref67] 67. Elo LL, Järvenpää H, Orešič M, Lahesmaa R, Aittokallio T. Systematic construction of gene coexpression networks with applications to human T helper cell differentiation process. Bioinformatics. 2007;23(16):2096–2103. pmid:17553854
View Article
PubMed/NCBI
Google Scholar

[232] View Article

[233] PubMed/NCBI

[234] Google Scholar

[ref68] 68. Gibson SM, Ficklin SP, Isaacson S, Luo F, Feltus FA, Smith MC. Massive-Scale Gene Co-Expression Network Construction and Robustness Testing Using Random Matrix Theory. PLoS ONE. 2013;8(2):e55871. pmid:23409071
View Article
PubMed/NCBI
Google Scholar

[236] View Article

[237] PubMed/NCBI

[238] Google Scholar

[ref69] 69. Feltus FA, Ficklin SP, Gibson SM, Smith MC. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study. BMC Systems Biology. 2013;7:44. pmid:23738693
View Article
PubMed/NCBI
Google Scholar

[240] View Article

[241] PubMed/NCBI

[242] Google Scholar

[ref70] 70. Perkins AD, Langston MA. Threshold selection in gene co-expression networks using spectral graph theory techniques. BMC Bioinformatics. 2009;10(Suppl 11):1–11.
View Article
Google Scholar

[244] View Article

[245] Google Scholar

[ref71] 71. Stathias V, Pastori C, Griffin TZ, Komotar R, Clarke J, Zhang M, et al. Identifying Glioblastoma Gene Networks Based on Hypergeometric Test Analysis. PLoS ONE. 2014;9(12):e115842. pmid:25551752
View Article
PubMed/NCBI
Google Scholar

[247] View Article

[248] PubMed/NCBI

[249] Google Scholar

[ref72] 72. Borate B, Chesler E, Langston M, Saxton A, Voy B. Comparison of threshold selection methods for microarray gene co-expression matrices. BMC Research Notes. 2009;2(1):240. pmid:19954523
View Article
PubMed/NCBI
Google Scholar

[251] View Article

[252] PubMed/NCBI

[253] Google Scholar

[ref73] 73. Gaiteri C, Sibille E. Differentially expressed genes in major depression reside on the periphery of resilient gene coexpression networks. Frontiers in Neuroscience. 2011;5:Article 95. pmid:21922000
View Article
PubMed/NCBI
Google Scholar

[255] View Article

[256] PubMed/NCBI

[257] Google Scholar

[ref74] 74. Good P. Permutation Tests. Springer; 2000.

[ref75] 75. Fisher RA. Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika. 1915;10(4):507–521.
View Article
Google Scholar

[260] View Article

[261] Google Scholar

[ref76] 76. Bevington PR. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill; 1969.

[ref77] 77. Zhang A. Advanced Analysis of Gene Expression Microarray Data. World Scientific; 2006.

[ref78] 78. Casellas J, Varona L. Modeling Skewness in Human Transcriptomes. PLoS ONE. 2012;7(6):e38919. pmid:22701729
View Article
PubMed/NCBI
Google Scholar

[265] View Article

[266] PubMed/NCBI

[267] Google Scholar

[ref79] 79. Doig T, Hume D, Theocharidis T, Goodlad J, Gregory C, Freeman T. Coexpression analysis of large cancer datasets provides insight into the cellular phenotypes of the tumour microenvironment. BMC Genomics. 2013;14(1):469. pmid:23845084
View Article
PubMed/NCBI
Google Scholar

[269] View Article

[270] PubMed/NCBI

[271] Google Scholar

[ref80] 80. Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, et al. Novel Molecular Subtypes of Serous and Endometrioid Ovarian Cancer Linked to Clinical Outcome. Clinical Cancer Research. 2008;14(16):5198–5208. pmid:18698038
View Article
PubMed/NCBI
Google Scholar

[273] View Article

[274] PubMed/NCBI

[275] Google Scholar

[ref81] 81. Budhu A, Jia HL, Forgues M, Liu CG, Goldstein D, Lam A, et al. Identification of Metastasis-Related MicroRNAs in Hepatocellular Carcinoma. Hepatology. 2008;47(3):897–907. pmid:18176954
View Article
PubMed/NCBI
Google Scholar

[277] View Article

[278] PubMed/NCBI

[279] Google Scholar

[ref82] 82. Ji J, Shi J, Budhu A, Yu Z, Forgues M, Roessler S, et al. MicroRNA Expression, Survival, and Response to Interferon in Liver Cancer. New England Journal of Medicine. 2009;361:1437–1447. pmid:19812400
View Article
PubMed/NCBI
Google Scholar

[281] View Article

[282] PubMed/NCBI

[283] Google Scholar

[ref83] 83. Troyanskaya OG, Cantor M, Sherlock G, Brown PO, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001;17(6):520–525. pmid:11395428
View Article
PubMed/NCBI
Google Scholar

[285] View Article

[286] PubMed/NCBI

[287] Google Scholar

[ref84] 84. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Research. 2011;39(suppl 1):D152–D157. pmid:21037258
View Article
PubMed/NCBI
Google Scholar

[289] View Article

[290] PubMed/NCBI

[291] Google Scholar

[ref85] 85. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215(3):403–410. pmid:2231712
View Article
PubMed/NCBI
Google Scholar

[293] View Article

[294] PubMed/NCBI

[295] Google Scholar

[ref86] 86. Olkin I, Pratt JW. Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics. 1958;29:201–211.
View Article
Google Scholar

[297] View Article

[298] Google Scholar

[ref87] 87. Pearson ES, Adyanthāya NK. The distribution of frequency constants in small samples from non-normal symmetrical and skew populations. Biometrika. 1929;21(1/4):259–286.
View Article
Google Scholar

[300] View Article

[301] Google Scholar

[ref88] 88. Rider PR. On the distribution of the correlation coefficient in small samples. Biometrika. 1932;24(3/4):382–403.
View Article
Google Scholar

[303] View Article

[304] Google Scholar

[ref89] 89. Gayen AK. The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of any Size Drawn from Non-Normal Universes. Biometrika. 1951;38(1/2):219–247. pmid:14848124
View Article
PubMed/NCBI
Google Scholar

[306] View Article

[307] PubMed/NCBI

[308] Google Scholar

[ref90] 90. Kenney JF, Keeping ES. Mathematics of Statistics, Part 2. 2nd ed. Van Nostrand; 1951.

[ref91] 91. Kenney JF, Keeping ES. Mathematics of Statistics, Part 1. 3rd ed. Van Nostrand; 1962.

[ref92] 92. Pugh EM, Winslow GH. The Analysis of Physical Measurements. Addison-Wesley; 1966.

[ref93] 93. Kendall MG, Stuart A. The Advanced Theory of Statistics: Distribution theory. Griffin; 1977.

[ref94] 94. Haldane JBS. A note on non-normal correlation. Biometrika. 1949;36:467–468. pmid:15409360
View Article
PubMed/NCBI
Google Scholar

[314] View Article

[315] PubMed/NCBI

[316] Google Scholar

[ref95] 95. Hey GB. A new method for experimental sampling illustrated in certain non-normal populations. Biometrika. 1938;30:68–80.
View Article
Google Scholar

[318] View Article

[319] Google Scholar

[ref96] 96. Kowalski CJ. On the Effects of Non-Normality on the Distribution of the Sample Product-Moment Correlation Coefficient. Journal of the Royal Statistical Society Series C (Applied Statistics). 1972;21(1):1–12.
View Article
Google Scholar

[321] View Article

[322] Google Scholar

[ref97] 97. Erdös P, Rényi A. On Random Graphs. I. Publicationes Mathematicae. 1959;6:290–297.
View Article
Google Scholar

[324] View Article

[325] Google Scholar

[ref98] 98. Hamza K. The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions. Statistics & Probability Letters. 1995;23(1):21–25.
View Article
Google Scholar

[327] View Article

[328] Google Scholar

[ref99] 99. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del Regio Istituto Superiore di Scienze Economiche e Commerciali di Firenze. 1936;8:3–62.
View Article
Google Scholar

[330] View Article

[331] Google Scholar

[ref100] 100. Miller RGJ. Simultaneous Statistical Inference. Springer; 1981.

[ref101] 101. Fisher RA. On the “probable error” of a coefficient of correlation deduced from a small sample. Metron. 1921;1:3–32.
View Article
Google Scholar

[334] View Article

[335] Google Scholar

[ref102] 102. Sealy Gossetm WS. The probable error of a mean. Biometrika. 1908;6(1):1–25.
View Article
Google Scholar

[337] View Article

[338] Google Scholar

[ref103] 103. Soper HE, Young AW, Cave BM, Lee A, Pearson K. On the distribution of the correlation coefficient in small samples. Appendix II to the papers of “Student” and R. A. Fisher. A co-operative study. Biometrika. 1917;11:328–413.
View Article
Google Scholar

[340] View Article

[341] Google Scholar

[ref104] 104. Fisher RA. Applications of “Student’s” distribution. Metron. 1925;5:90–104.
View Article
Google Scholar

[343] View Article

[344] Google Scholar

[ref105] 105. Rahman NA. A Course in Theoretical Statistics. Charles Griffin and Company; 1968.

[ref106] 106. Kendall MG, Stuart A. The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Charles Griffin and Company; 1973.

[ref107] 107. Bland JM, Altman DG. Multiple significance tests: the Bonferroni method. British Medical Journal. 1995;310(6973):170. pmid:7833759
View Article
PubMed/NCBI
Google Scholar

[348] View Article

[349] PubMed/NCBI

[350] Google Scholar

[ref108] 108. Perneger TV. What’s wrong with Bonferroni adjustments. British Medical Journal. 1998;316(7139):1236–1238. pmid:9553006
View Article
PubMed/NCBI
Google Scholar

[352] View Article

[353] PubMed/NCBI

[354] Google Scholar

[ref109] 109. Azuaje FJ. Selecting biologically informative genes in co-expression networks with a centrality score. Biology Direct. 2014;9:12. pmid:24947308
View Article
PubMed/NCBI
Google Scholar

[356] View Article

[357] PubMed/NCBI

[358] Google Scholar

[ref110] 110. Dudoit S, van der Laan MJ. Multiple Testing Procedures with Applications to Genomics. Springer; 2008.

[ref111] 111. Jurman G, Visintainer R, Riccadonna S, Filosi M, Furlanello C. The HIM glocal metric and kernel for network comparison and classification; 2013. ArXiv:1201.2931 [math.CO].

[ref112] 112. Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C. Stability Indicators in Network Reconstruction. PLoS ONE. 2014;9(2):e89815. pmid:24587057
View Article
PubMed/NCBI
Google Scholar

[362] View Article

[363] PubMed/NCBI

[364] Google Scholar