Transcription factor–DNA binding: beyond binding site motifs
Introduction
Sequence-specific transcription factors (TFs) are key regulators of biological processes; they bind to transcriptional regulatory regions (e.g., promoters, enhancers) to control the expression of their target genes. Each TF typically recognizes a collection of similar DNA sequences, which can be represented as binding site motifs using models such as position weight matrices (PWMs) (reviewed in Ref. [1]; see Box 1). Characterizing motifs is an important first step toward understanding the regulatory functions of TFs, which in turn shape gene regulatory networks.
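To make the PWM representation concrete, the following minimal sketch builds a log2-odds matrix from a handful of aligned sites and slides it along a sequence. It assumes equal-length, uppercase ACGT sites, a uniform background, and a pseudocount of 0.5; the helper names and the example sites are hypothetical and are not taken from any study cited here.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from equal-length, aligned, uppercase ACGT sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform base composition
    pwm = []
    for i in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores of a window as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_hit(pwm, seq):
    """Slide the PWM along the forward strand; return (offset, score) of the best window."""
    width = len(pwm)
    hits = [(i, score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)]
    return max(hits, key=lambda hit: hit[1])

if __name__ == "__main__":
    # Hypothetical aligned sites, not data from any cited study.
    sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
    pwm = build_pwm(sites)
    offset, best = best_hit(pwm, "GGCTGACTCAGG")
    print(offset, round(best, 2))
```

Each column scores a base by how much more often it occurs at that position than the background predicts, so summing columns treats positions as independent, which is the simplification that the features discussed later in this review go beyond.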
Technological developments over the last decade have facilitated the characterization of DNA binding preferences for many TFs. Indeed, multiple large-scale studies in recent years have collectively elucidated motifs for thousands of TFs from a wide range of organisms [2, 3, 4, 5, 6, 7•]. Most of these studies have highlighted the evolutionary conservation of TF binding specificity, which allows the binding preferences of TFs lacking directly measured specificity data to be inferred from highly similar, characterized TFs [4, 6, 7•]. Nevertheless, the current catalog of TF binding site motifs remains incomplete: for over 40% of the approximately 1400 sequence-specific TFs encoded in the human genome, binding preferences have been neither experimentally determined nor computationally inferred [3, 7•, 8, 9, 10, 11]. Several TF families (e.g., those with high mobility group (HMG) box or Cys2His2 zinc finger (C2H2-zf) DNA binding domains (DBDs)) also have disproportionately many uncharacterized members. Motif coverage of model organism TFs is similarly sparse [7•], with the exception of Saccharomyces cerevisiae TFs [12]. Completing motif catalogs remains a priority for bridging the gap between TFs and their regulatory targets.
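Inferring a motif from a highly similar characterized TF can be pictured as a nearest-neighbor lookup on DBD sequences. The sketch below is a simplification under stated assumptions: DBD sequences are already aligned to equal length, similarity is plain percent identity over ungapped positions, and the 0.7 cutoff, TF names, and sequences are all hypothetical. In practice, identity thresholds are typically calibrated separately for each DBD family rather than set globally.

```python
def percent_identity(aligned_a, aligned_b):
    """Fraction of identical residues over aligned positions where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def infer_motif(query_dbd, characterized, threshold=0.7):
    """Transfer the motif of the most similar characterized TF if its DBD identity
    meets the (hypothetical) threshold; otherwise return None.

    characterized maps TF name -> (aligned DBD sequence, motif).
    """
    best = None
    for name, (dbd, motif) in characterized.items():
        identity = percent_identity(query_dbd, dbd)
        if best is None or identity > best[2]:
            best = (name, motif, identity)
    if best is not None and best[2] >= threshold:
        return best
    return None

if __name__ == "__main__":
    # Hypothetical, pre-aligned DBD fragments and motifs.
    characterized = {
        "tf_a": ("ACDEFGHIKL", "TAATTA"),
        "tf_b": ("MNPQRSTVWY", "GACGC"),
    }
    print(infer_motif("ACDEFGHIKM", characterized))
```

Returning None below the threshold mirrors the situation described above, where a TF has no sufficiently similar characterized relative and its binding preferences remain unknown.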
Recent high-throughput studies have highlighted that there is more to TF–DNA binding than primary nucleotide sequence preferences. Accumulating evidence supports a widespread role for sequence context, including flanking sequences and DNA shape, in modulating sequence recognition. Interacting cofactors and TFs can also alter sequence preference [13]. Together with differential TF expression and chromatin accessibility, these additional features that impact TF–DNA recognition are improving our understanding of what determines condition-specific TF binding [14]. In this review, we discuss methods for identifying TF binding site motifs, emerging knowledge of additional features that influence TF–DNA recognition, and novel approaches for characterizing the in vivo functional consequences of TF binding. Because of space restrictions, we refer readers to recent reviews for discussions of TF binding site accessibility, mapping of regulatory elements to target genes, functional roles of low-affinity binding sites, and structural modeling of TF binding specificity [15, 16, 17, 18].
Methods to identify TF binding site motifs
Methods to characterize TF–DNA binding preferences can be broadly categorized into in vivo and in vitro approaches. In vivo approaches can reveal TF binding events that occur in particular biological conditions (e.g., cell type, treatment, time point), while in vitro methods are well suited for large-scale characterization of intrinsic TF binding sequence preferences.
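Intrinsic preferences measured in vitro (for example, sequencing reads recovered after SELEX-style selection) can be summarized by how over-represented each short k-mer is among bound sequences relative to a background pool. The sketch below computes a simple pseudocount-smoothed frequency ratio and collapses each k-mer with its reverse complement; it is a simplified illustration, not the statistic used by any particular platform, and the reads and helper names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Represent a k-mer and its reverse complement by whichever sorts first,
    since a double-stranded site can be read from either strand."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)

def count_kmers(reads, k):
    """Count canonical k-mers over every k-length window of every read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return counts

def kmer_enrichment(bound, background, k=6, pseudocount=1.0):
    """Rank k-mers by the ratio of their frequency in bound reads to their
    frequency in background reads; the pseudocount keeps unseen k-mers finite."""
    fg = count_kmers(bound, k)
    bg = count_kmers(background, k)
    kmers = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(kmers)
    bg_total = sum(bg.values()) + pseudocount * len(kmers)
    ratios = {
        kmer: ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in kmers
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical reads: "bound" reads share an E-box-like CACGTG core.
    bound = ["ACACGTGT", "GCACGTGA", "TCACGTGC"]
    background = ["ACGATCGA", "TTGACCAT", "GGATCCTA"]
    for kmer, ratio in kmer_enrichment(bound, background)[:3]:
        print(kmer, round(ratio, 2))
```

Top-ranked k-mers from such a table are a common starting point for seeding or validating a PWM, although real data require far more reads and a background that matches the library design.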
A widely used in vivo method is chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) (reviewed in
Multiple specificities intrinsic to individual transcription factors
While most TFs exhibit a single binding specificity, some TFs have the intrinsic capability to recognize multiple motifs [13]. Nakagawa et al. identified several forkhead TFs that each bind two apparently unrelated sequence motifs (5′-RYAAAYA and 5′-GACGC); they found that this multiple binding specificity arose independently in at least two different evolutionary lineages [45]. Siggers et al. found that certain paralogous yeast C2H2-zf TFs in the Msn2 family recognize a core
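The two forkhead motifs mentioned above are written in IUPAC degenerate notation (R = A or G, Y = C or T). A minimal way to locate such consensus sites is to translate each code into a regular-expression character class and scan both strands, as sketched below. The helper names and the test sequence are hypothetical, and consensus matching is a coarse, all-or-nothing stand-in for PWM scoring.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def iupac_to_regex(motif):
    return "".join(IUPAC[code] for code in motif.upper())

def find_sites(motif, seq):
    """Return sorted (start, strand) pairs for every match of an IUPAC motif on
    either strand. Starts are 0-based forward-strand coordinates of the site's
    leftmost base; overlapping matches are all reported."""
    seq = seq.upper()
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    width = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rev = seq.translate(COMPLEMENT)[::-1]
    # A hit at offset p on the reverse complement covers forward positions
    # len(seq) - p - width through len(seq) - p - 1.
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rev)]
    return sorted(hits)

if __name__ == "__main__":
    seq = "CCGTAAACACCGACGCTT"  # hypothetical sequence
    for motif in ("RYAAAYA", "GACGC"):
        print(motif, find_sites(motif, seq))
```

Scanning the same sequence with both motifs illustrates how a single TF with two specificities could, in principle, be assigned two largely non-overlapping sets of candidate sites.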
Characterizing the functional consequences of TF binding
In parallel with emerging knowledge of features that modulate TF recognition, naturally occurring genetic variants have been informative in assessing the functional roles of TF binding site motifs. A handful of disease-associated variants that disrupt or introduce TF binding site motifs have been studied in detail, providing mechanistic insights into pathogenesis (reviewed in Ref. [64]). Over 70% of the thousands of noncoding variants found to be associated with common diseases or traits in
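Whether a variant disrupts or introduces a motif is often assessed by comparing motif scores for the reference and alternative alleles. The sketch below takes the best PWM score over both strands for each allele and reports the difference; the count matrix, sequence, variant positions, and helper names are hypothetical, and a score change alone does not establish a functional effect in vivo.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_pwm(count_matrix, pseudocount=0.5, background=0.25):
    """Convert per-position base counts (one dict per position) into log2-odds
    scores against a uniform background."""
    pwm = []
    for counts in count_matrix:
        total = sum(counts.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((counts.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def max_score(pwm, seq):
    """Best window score for the PWM on either strand of seq."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            best = max(best, sum(col[base] for col, base in zip(pwm, window)))
    return best

def variant_delta(pwm, ref_seq, pos, alt):
    """Change in best motif score (alt minus ref) for a single-nucleotide variant
    at 0-based position pos; negative values suggest a weakened site, positive
    values a strengthened or newly created one."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return max_score(pwm, alt_seq) - max_score(pwm, ref_seq)

if __name__ == "__main__":
    # Hypothetical per-position counts for a TGACTCA-like motif.
    counts = [{"T": 9, "A": 1}, {"G": 9, "T": 1}, {"A": 10}, {"C": 7, "G": 3},
              {"T": 10}, {"C": 10}, {"A": 10}]
    pwm = log_odds_pwm(counts)
    ref = "GGTGACTCAGG"
    print(variant_delta(pwm, ref, 4, "C"))  # A>C inside the site
    print(variant_delta(pwm, ref, 0, "A"))  # G>A outside the site
```

Taking the maximum over both strands lets a variant that creates a new site on either strand register as a positive change, not only variants that weaken an existing site.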
Conclusions and perspectives
Recent large-scale efforts to elucidate TF binding specificities have made great headway in linking TFs to binding sites, yet the catalog of binding specificities remains incomplete. Emerging research has highlighted the involvement of numerous features beyond sequence motifs, including DNA shape and flanking sequences, which modulate binding site recognition. These features complicate TF specificity determination but have been consistently shown to improve predictions of binding sites [61•, 62
Conflict of interest statement
Nothing declared.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
This work was funded by National Institutes of Health/National Human Genome Research Institute grant # R01 HG003985 (M.L.B.) and an A*STAR National Science Scholarship (K.H.K.). We apologize to authors whose work we could not cite because of space restrictions. We thank Julia Rogers and Luca Mariani for critical reading of this manuscript.
References (105)
DNA-binding specificities of human transcription factors
Cell (2013)
Determination and inference of eukaryotic transcription factor sequence specificity
Cell (2014)
Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes
Crit Rev Biochem Mol Biol (2015)
Structure-based modeling of protein: DNA specificity
Brief Funct Genom (2015)
ChIP-nexus enables improved detection of in vivo transcription factor binding footprints
Nat Biotechnol (2015)
A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors
Nat Biotechnol (2005)
Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities
Genome Res (2010)
Determining the specificity of protein–DNA interactions
Nat Rev Genet (2010)
An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins
Nucleic Acids Res (2014)
DNA methylation presents distinct binding sites for human transcription factors
eLife (2013)
CG methylated microarrays identify a novel methylated sequence bound by the CEBPB|ATF4 heterodimer that is active in vivo
Genome Res
A widespread role of the motif environment in transcription factor binding across diverse protein families
Genome Res
Pooled ChIP-seq links variation in transcription factor binding to complex disease risk
Cell
Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants
Genome Res
Massively parallel cis-regulatory analysis in the mammalian central nervous system
Genome Res
Quantification of cooperativity in heterodimer–DNA binding improves the accuracy of binding specificity models
J Biol Chem
APPRIS: annotation of principal and alternative splice isoforms
Nucleic Acids Res
TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes
Nucleic Acids Res
Modeling the specificity of protein–DNA interactions
Quant Biol
DNA-binding specificities of plant transcription factors and their potential to define target genes
Proc Natl Acad Sci U S A
UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions
Nucleic Acids Res
Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities
eLife
Conservation of transcription factor binding specificities across 600 million years of bilateria evolution
eLife
A census of human transcription factors: function, expression and evolution
Nat Rev Genet
Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium
Nucleic Acids Res
HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models
Nucleic Acids Res
JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles
Nucleic Acids Res
Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights
Genome Biol
Protein–DNA binding: complexities and multi-protein codes
Nucleic Acids Res
Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors
Genome Res
Absence of a simple code: how transcription factors read the genome
Trends Biochem Sci
The soft touch: low-affinity transcription factor binding sites in development and evolution
Curr Top Dev Biol
ChIP-seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions
Nat Rev Genet
Deep and wide digging for binding motifs in ChIP-Seq data
Bioinformatics
MEME-ChIP: motif analysis of large DNA datasets
Bioinformatics
An integrated encyclopedia of DNA elements in the human genome
Nature
Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution
Cell
High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells
Genome Res
Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position
Nat Methods
Global mapping of protein–DNA interactions in vivo by digital genomic footprinting
Nat Methods
Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity
Genome Res
FAIRE (formaldehyde-assisted isolation of regulatory elements) isolates active regulatory elements from human chromatin
Genome Res
Genome-wide footprinting: ready for prime time?
Nat Methods
Genomic footprinting
Nat Methods
Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification
Nat Methods
DNase footprint signatures are dictated by factor dynamics and DNA sequence
Mol Cell
De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis
Nat Biotechnol
A systems approach to measuring the binding energy landscapes of transcription factors
Science
Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities
Nat Biotechnol
Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape
Cell Rep
* These authors contributed equally to this work.