Robust chromatin state annotation

  1. Maxwell W. Libbrecht
  1. School of Computing Science, Simon Fraser University, Burnaby, British Columbia V51 1S6, Canada
  • Corresponding author: maxwl{at}sfu.ca
  • Abstract

    With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%–69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.278343.123.

    • Freely available online through the Genome Research Open Access option.

    • Received July 28, 2023.
    • Accepted March 19, 2024.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

    | Table of Contents
    OPEN ACCESS ARTICLE

    Preprint Server