Transcription factor–DNA binding: beyond binding site motifs

https://doi.org/10.1016/j.gde.2017.02.007

Sequence-specific transcription factors (TFs) regulate gene expression by binding to cis-regulatory elements in promoter and enhancer DNA. While studies of TF–DNA binding have focused on TFs’ intrinsic preferences for primary nucleotide sequence motifs, recent studies have elucidated additional layers of complexity that modulate TF–DNA binding. In this review, we discuss technological developments for identifying TF binding preferences and highlight recent discoveries that elaborate how TF interactions, local DNA structure, and genomic features influence TF–DNA binding. We highlight novel approaches for characterizing functional binding site motifs that promise to inform our understanding of how TF binding controls gene expression and ultimately contributes to phenotype.

Introduction

Sequence-specific transcription factors (TFs) are key regulators of biological processes that function by binding to transcriptional regulatory regions (e.g., promoters, enhancers) to control the expression of their target genes. Each TF typically recognizes a collection of similar DNA sequences, which can be represented as binding site motifs using models such as position weight matrices (PWMs) (reviewed in Ref. [1]; see Box 1). The characterization of motifs is an important first step in understanding the regulatory functions of TFs that consequently shape gene regulatory networks.
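As a minimal sketch of how a PWM represents a collection of similar binding sites, the Python below builds a log-odds matrix from a few aligned sites and uses it to score candidate sequences. The example sites, the pseudocount, the uniform background frequency of 0.25, and the function names `build_pwm`, `score`, and `best_match` are all illustrative assumptions, not data or methods from any study cited here.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        # log2 ratio of the smoothed base frequency to the background frequency
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of the PWM's width."""
    return sum(col[base] for col, base in zip(pwm, seq))

def best_match(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites, made up for illustration only.
example_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
example_pwm = build_pwm(example_sites)
top_score, top_start = best_match(example_pwm, "CCCTGACTCAGGG")
print(f"best window starts at {top_start} with score {top_score:.2f}")
```

A log-odds form is used here so that scores add across positions; this assumes positions contribute independently, which is the simplification that the additional features discussed in this review (flanking sequence, DNA shape, cofactors) are meant to address.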

Technological developments over the last decade have facilitated the characterization of DNA binding preferences for many TFs. Indeed, multiple large-scale studies in recent years have collectively elucidated motifs for thousands of TFs from a wide range of organisms [2, 3, 4, 5, 6, 7•]. Most of these studies have highlighted the evolutionary conservation of TF binding specificity, allowing the binding preferences of TFs lacking directly measured specificity data to be inferred from highly similar, characterized TFs [4, 6, 7•]. Nevertheless, the current catalog of TF binding site motifs remains incomplete: binding preferences remain unknown (neither experimentally determined nor computationally inferred) for over 40% of the approximately 1400 sequence-specific TFs encoded in the human genome [3, 7•, 8, 9, 10, 11], and several TF families (e.g., those with high mobility group box or Cys2His2 zinc finger (C2H2-zf) DNA binding domains (DBDs)) contain a disproportionate number of uncharacterized TFs. Motif coverage of model organism TFs is similarly sparse [7•], with the exception of Saccharomyces cerevisiae TFs [12]. The completion of motif catalogs remains a priority for bridging the gap between TFs and their regulatory targets.

Recent high-throughput studies have highlighted that there is more to TF–DNA binding than primary nucleotide sequence preferences. Accumulating evidence supports the widespread contributions of sequence context, including flanking sequences and DNA shape, in modulating sequence recognition. Interacting cofactors and TFs can also alter sequence preference [13]. Such additional features that impact TF–DNA recognition, together with differential TF expression and chromatin accessibility, are contributing to our understanding of what determines condition-specific TF binding [14]. In this review, we discuss methods for identifying TF binding site motifs, emerging knowledge of additional features that influence TF–DNA recognition, and novel approaches for characterizing the in vivo functional consequences of TF binding. Because of space restrictions, we refer readers to recent reviews for discussions of TF binding site accessibility, mapping of regulatory elements to target genes, functional roles of low-affinity binding sites, and structural modeling of TF binding specificity [15, 16, 17, 18].

Methods to identify TF binding site motifs

Methods to characterize TF–DNA binding preferences can be broadly categorized into in vivo and in vitro approaches. In vivo approaches can reveal TF binding events that occur in particular biological conditions (e.g., cell type, treatment, time point), while in vitro methods are well suited for large-scale characterization of intrinsic TF binding sequence preferences.

A widely used in vivo method is chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) (reviewed in

Multiple specificities intrinsic to individual transcription factors

While the majority of TFs demonstrate singular binding specificities, some TFs have the intrinsic capability to recognize multiple motifs [13]. Nakagawa et al. identified several forkhead TFs that each bind two apparently unrelated sequence motifs (5′-RYAAAYA and 5′-GACGC); they found that this multiple binding specificity arose independently in at least two different evolutionary lineages [45]. Siggers et al. found that certain paralogous yeast C2H2-zf TFs in the Msn2 family recognize a core
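The two forkhead consensus sequences quoted above are written in IUPAC degenerate notation (R = A or G, Y = C or T). As a rough illustration of what such a consensus means in practice, the sketch below expands an IUPAC string into a regular expression and reports matches on both strands. The function names `find_sites` and `reverse_complement` and the test sequence are hypothetical, and exact consensus matching is a deliberate simplification compared with the quantitative models used in the cited studies.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(consensus, sequence):
    """Return sorted (start, strand, matched_site) tuples for an IUPAC consensus.

    Start positions are 0-based coordinates on the forward strand; the matched
    site is reported as read 5' to 3' on the strand where it was found.
    """
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in consensus) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(sequence)]
    n, w = len(sequence), len(consensus)
    for m in pattern.finditer(reverse_complement(sequence)):
        # Convert the reverse-strand offset back to a forward-strand start.
        hits.append((n - m.start() - w, "-", m.group(1)))
    return sorted(hits)

# Made-up sequence containing one instance of each forkhead motif.
example = "TTGTAAACAGGGACGCTT"
for motif in ("RYAAAYA", "GACGC"):
    print(motif, find_sites(motif, example))
```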

Characterizing the functional consequences of TF binding

In parallel with emerging knowledge on features that modulate TF recognition, naturally occurring genetic variants have been informative in assessing the functional roles of TF binding site motifs. A handful of disease-associated variants that disrupt or introduce TF binding site motifs have been studied in detail, providing mechanistic insights into pathogenesis (reviewed in Ref. [64]). Over 70% of the thousands of noncoding variants found to be associated with common diseases or traits in

Conclusions and perspectives

Recent large-scale efforts to elucidate TF binding specificities have made great headway in linking TFs to binding sites, yet the catalog of binding specificities remains incomplete. Emerging research has highlighted the involvement of numerous features beyond sequence motifs, including DNA shape and flanking sequences, which modulate binding site recognition. These features complicate TF specificity determination but have been consistently shown to improve predictions of binding sites [61•, 62

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

This work was funded by National Institutes of Health/National Human Genome Research Institute grant # R01 HG003985 (M.L.B.) and an A*STAR National Science Scholarship (K.H.K.). We apologize to authors whose work we could not cite because of space restrictions. We thank Julia Rogers and Luca Mariani for critical reading of this manuscript.

References (105)

  • I.K. Mann et al.

    CG methylated microarrays identify a novel methylated sequence bound by the CEBPB|ATF4 heterodimer that is active in vivo

    Genome Res

    (2013)
  • I. Dror et al.

    A widespread role of the motif environment in transcription factor binding across diverse protein families

    Genome Res

    (2015)
  • A.K. Tehranchi et al.

    Pooled ChIP-seq links variation in transcription factor binding to complex disease risk

    Cell

    (2016)
  • I. Mogno et al.

    Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants

    Genome Res

    (2013)
  • S.Q. Shen et al.

    Massively parallel cis-regulatory analysis in the mammalian central nervous system

    Genome Res

    (2016)
  • A. Isakova et al.

    Quantification of cooperativity in heterodimer–DNA binding improves the accuracy of binding specificity models

    J Biol Chem

    (2016)
  • J.M. Rodriguez et al.

    APPRIS: annotation of principal and alternative splice isoforms

    Nucleic Acids Res

    (2013)
  • V. Matys et al.

    TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes

    Nucleic Acids Res

    (2006)
  • G.D. Stormo

    Modeling the specificity of protein–DNA interactions

    Quant Biol

    (2013)
  • J.M. Franco-Zorrilla et al.

    DNA-binding specificities of plant transcription factors and their potential to define target genes

    Proc Natl Acad Sci U S A

    (2014)
  • M.A. Hume et al.

    UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions

    Nucleic Acids Res

    (2015)
  • K. Narasimhan et al.

    Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities

    eLife

    (2015)
  • K.R. Nitta et al.

    Conservation of transcription factor binding specificities across 600 million years of bilateria evolution

    eLife

    (2015)
  • J.M. Vaquerizas et al.

    A census of human transcription factors: function, expression and evolution

    Nat Rev Genet

    (2009)
  • J. Wang et al.

    Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium

    Nucleic Acids Res

    (2013)
  • I.V. Kulakovskiy et al.

    HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models

    Nucleic Acids Res

    (2016)
  • A. Mathelier et al.

    JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles

    Nucleic Acids Res

    (2016)
  • R. Gordân et al.

    Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights

    Genome Biol

    (2011)
  • T. Siggers et al.

    Protein–DNA binding: complexities and multi-protein codes

    Nucleic Acids Res

    (2014)
  • J. Wang et al.

    Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors

    Genome Res

    (2012)
  • M. Slattery et al.

    Absence of a simple code: how transcription factors read the genome

    Trends Biochem Sci

    (2014)
  • J. Crocker et al.

    The soft touch: low-affinity transcription factor binding sites in development and evolution

    Curr Top Dev Biol

    (2016)
  • T.S. Furey

    ChIP-seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions

    Nat Rev Genet

    (2012)
  • I.V. Kulakovskiy et al.

    Deep and wide digging for binding motifs in ChIP-Seq data

    Bioinformatics

    (2010)
  • P. Machanick et al.

    MEME-ChIP: motif analysis of large DNA datasets

    Bioinformatics

    (2011)
  • ENCODE Project Consortium

    An integrated encyclopedia of DNA elements in the human genome

    Nature

    (2012)
  • H.S. Rhee et al.

    Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution

    Cell

    (2011)
  • A.P. Boyle et al.

    High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells

    Genome Res

    (2011)
  • J.D. Buenrostro et al.

    Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position

    Nat Methods

    (2013)
  • J.R. Hesselberth et al.

    Global mapping of protein–DNA interactions in vivo by digital genomic footprinting

    Nat Methods

    (2009)
  • L. Song et al.

    Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity

    Genome Res

    (2011)
  • P.G. Giresi et al.

    FAIRE (formaldehyde-assisted isolation of regulatory elements) isolates active regulatory elements from human chromatin

    Genome Res

    (2007)
  • M.H. Sung et al.

    Genome-wide footprinting: ready for prime time?

    Nat Methods

    (2016)
  • J. Vierstra et al.

    Genomic footprinting

    Nat Methods

    (2016)
  • H.H. He et al.

    Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification

    Nat Methods

    (2014)
  • M.H. Sung et al.

    DNase footprint signatures are dictated by factor dynamics and DNA sequence

    Mol Cell

    (2014)
  • P.M. Fordyce et al.

    De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis

    Nat Biotechnol

    (2010)
  • S.J. Maerkl et al.

    A systems approach to measuring the binding energy landscapes of transcription factors

    Science

    (2007)
  • M.F. Berger et al.

    Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities

    Nat Biotechnol

    (2006)
  • R. Gordân et al.

    Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape

    Cell Rep

    (2013)
* These authors contributed equally to this work.
