Exploring Massive, Genome Scale Datasets with the GenometriCorr Package

doi:10.1371/journal.pcbi.1002529

Figure 3

A schematic of the various tests implemented in the software package, showing when certain tests are most useful.

(A) depicts the intervals created in silico, and (B) shows how the query distances are evaluated within the intervals. (C) depicts a random distribution of query versus reference intervals; here the observed and expected distances are the same for both the absolute and relative tests. (D) shows a relationship best uncovered by the absolute distance test. Especially useful for small genomes, this test determines whether the query and reference are often separated by a fixed distance. In (E), the query points are consistently far away from the reference points, so the relative distance test will be significant, while the absolute distance test is not significant in this case. Interestingly, the query intervals are variable enough in size that, even though the query and reference points are usually separated, the absolute distances between them vary widely and include some fairly small distances. (F) demonstrates the projection test, which evaluates whether pointwise data fall consistently inside or outside a set of intervals. Finally, (G) shows the Jaccard test, which looks for significant overlaps between datasets by evaluating the ratio of the intersection of the datasets (dark grey) to the union of the datasets (light grey). Perfect correlation gives a ratio of 1, and perfect anticorrelation gives a ratio of 0.
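As an illustration of the quantities behind panels (B), (E) and (G), the following is a minimal Python sketch, not the GenometriCorr implementation. It assumes half-open intervals given as (start, end) pairs on a single chromosome, and it assumes that the relative distance of a query point is its distance to the nearer flanking reference point divided by the distance between the two flanking reference points. All function names are hypothetical, and the significance testing that the package performs is omitted.

```python
from bisect import bisect_right


def merge(intervals):
    """Merge overlapping half-open intervals [start, end) into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Sum of the lengths of a disjoint interval list."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Number of positions covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = 0
    covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def jaccard(query, reference):
    """Ratio of intersection to union, as in panel (G): 1 for identical sets, 0 for disjoint sets."""
    inter = intersection_length(query, reference)
    union = total_length(merge(query)) + total_length(merge(reference)) - inter
    return inter / union if union else 0.0


def relative_distances(query_points, reference_points):
    """Relative distance of each query point between its two flanking reference points.

    Values range from 0 (on a reference point) to 0.5 (midway between two).
    Query points without a reference point on both sides are skipped (an
    assumption of this sketch).
    """
    refs = sorted(reference_points)
    result = []
    for q in query_points:
        k = bisect_right(refs, q)
        if k == 0 or k == len(refs):
            continue
        left, right = refs[k - 1], refs[k]
        result.append(min(q - left, right - q) / (right - left))
    return result
```

In this sketch, a query set whose relative distances cluster near 0.5 corresponds to the situation in panel (E), where query points sit consistently far from the reference points regardless of how widely the absolute distances vary.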

doi: https://doi.org/10.1371/journal.pcbi.1002529.g003