Abstract
Proper cell fate determination is largely orchestrated by complex gene regulatory networks centered around transcription factors. However, experimental elucidation of the key transcription factors that drive cellular identity is often intractable. Here, we present ANANSE (ANalysis Algorithm for Networks Specified by Enhancers), a network-based method that exploits enhancer-encoded regulatory information to identify the key transcription factors in cell fate determination. As cell type-specific transcription factors predominantly bind to enhancers, we use regulatory networks based on enhancer properties to prioritize transcription factors. First, we predict genome-wide binding profiles of transcription factors in various cell types using enhancer activity and transcription factor binding motifs. Subsequently, using these inferred binding profiles, we construct cell type-specific gene regulatory networks, and then predict key transcription factors controlling cell fate transitions using differential networks between cell types. This method outperforms existing approaches in correctly predicting major transcription factors previously identified to be sufficient for trans-differentiation. Finally, we apply ANANSE to define an atlas of key transcription factors in 18 normal human tissues. In conclusion, we present a ready-to-implement computational tool for efficiently predicting key transcription factors in cell fate determination and for studying transcription factor-mediated regulatory mechanisms. ANANSE is freely available at https://github.com/vanheeringen-lab/ANANSE.
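To make the idea of a differential network concrete, the following is a minimal sketch and not the ANANSE implementation. It assumes each cell type-specific network is given as a dictionary mapping (transcription factor, target gene) pairs to an interaction weight. The names `differential_network` and `rank_tfs`, the toy labels (`TF_A`, `g1`, and so on), the weights, and the simple summed score are all hypothetical illustrations; the scoring used by ANANSE itself is not specified in this abstract.

```python
from collections import defaultdict


def differential_network(source, target):
    """Keep only edges whose weight is higher in the target cell type.

    Both arguments map (tf, gene) tuples to a weight. Edges absent from
    the source network are treated as having weight 0.0 there.
    """
    diff = {}
    for edge, w_target in target.items():
        delta = w_target - source.get(edge, 0.0)
        if delta > 0:
            diff[edge] = delta
    return diff


def rank_tfs(diff):
    """Rank TFs by the summed weight gain of their outgoing edges.

    This is an illustrative score only (an assumption of this sketch).
    """
    scores = defaultdict(float)
    for (tf, _gene), delta in diff.items():
        scores[tf] += delta
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy networks with made-up weights for a source and a target cell type.
source = {("TF_A", "g1"): 0.2, ("TF_B", "g2"): 0.9, ("TF_C", "g3"): 0.1}
target = {
    ("TF_A", "g1"): 0.9,
    ("TF_A", "g4"): 0.8,
    ("TF_B", "g2"): 0.3,
    ("TF_C", "g3"): 0.4,
}

ranking = rank_tfs(differential_network(source, target))
for tf, score in ranking:
    print(f"{tf}\t{score:.2f}")
```

In this toy case, `TF_B` drops out because its only edge is weaker in the target cell type, so only transcription factors that gain regulatory weight are ranked.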
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
First, we have improved the prediction of transcription factor binding by using a more detailed model: 1) We now use ATAC-seq as well as H3K27ac ChIP-seq data for the model. This results in improved performance (see Figure 3A in the revised manuscript). Although the combined data work best, a model based on either ATAC-seq alone or H3K27ac ChIP-seq alone can also be used. This means that the approach is more widely applicable, even when only one of the two data types is available. 2) We have trained transcription factor-specific models based on all available TF ChIP-seq data (237 TFs) in the ReMap project for 6 commonly used cell lines. A non-specific model (trained on all TFs, comparable to the model in the previous ANANSE version) is available as a fallback for TFs that had no ChIP-seq data available for training. Binding prediction performance is now estimated on these 237 TFs in the 6 cell lines. In addition, the ROC AUC and PR AUC metrics are computed on held-out chromosomes, in held-out cell types. Second, we have added inferred motif activity to the network inference. This, in combination with the improved binding prediction, results in greatly improved GRN inference performance. We have revisited all benchmarks. We added DoRothEA and TF perturbation references, and now include two additional state-of-the-art reference methods: GRNBoost2 (a GENIE3-like approach) and networks from GRNdb that were inferred on single-cell RNA-seq using SCENIC (Figure 4A and B).
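As an illustration of evaluating on held-out chromosomes, the sketch below computes ROC AUC and a PR AUC estimate only on regions from chromosomes excluded from training. It is a minimal example under stated assumptions, not the evaluation code used for the manuscript: the record layout, the choice of `chr1` and `chr8` as held-out chromosomes, and all scores are hypothetical. Average precision is used here as one common estimator of PR AUC, and holding out cell types (also mentioned above) is not shown.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as 0.5. Assumes at least
    one positive and one negative example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Assumes at least one positive example.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_score, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical records: (chromosome, bound label, predicted binding score).
records = [
    ("chr1", 1, 0.9),
    ("chr1", 0, 0.8),
    ("chr1", 1, 0.7),
    ("chr8", 0, 0.2),
    ("chr2", 1, 0.6),  # training chromosome, excluded from evaluation
    ("chr3", 0, 0.4),  # training chromosome, excluded from evaluation
]
held_out = {"chr1", "chr8"}  # assumed split, for illustration only

test = [(y, s) for chrom, y, s in records if chrom in held_out]
labels = [y for y, _ in test]
scores = [s for _, s in test]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC AUC = {auc:.3f}, average precision = {ap:.3f}")
```

Restricting the metrics to chromosomes never seen during training keeps the reported performance from being inflated by regions the model was fitted on.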