ABSTRACT
We demonstrate a computational process for elucidating transcription factor binding sites from genome-wide expression and binding profiles. The profiles direct us to the intergenic regions most likely to contain the promoters targeted by a given factor. These sequences are subjected to local multiple alignment, which yields an anchor motif from which further characterization proceeds. We state the bases for, and our assumptions about, the variability within these motifs. These assumptions can yield more accurate motifs, capture complex binding sites built upon the basis motif, and remove the constraints of currently used promoter search protocols. We also present a measure of motif quality based on how often the putative motifs occur in regions observed to contain the binding sites. Together, the assumptions, motif generation, quality assessment and comparison give users as much control as their a priori knowledge allows.
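As a rough illustration of the pipeline outlined in the abstract, the sketch below selects candidate regions from binding and expression profiles, builds a log-odds matrix from already aligned sites, and scores motif quality as the hit rate in bound regions. It is a minimal sketch under stated assumptions, not the authors' method. The local multiple alignment step is assumed to have already produced equal-length, ungapped, uppercase sites. The function names, the p-value and fold-change thresholds, the pseudocount, the uniform background, and the comparison against unbound regions are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def select_regions(profiles, binding_p_max=0.001, min_fold_change=2.0):
    """Hypothetical selection step: keep intergenic regions whose binding
    p-value is at most binding_p_max and whose downstream gene changes
    expression by at least min_fold_change in either direction.

    profiles maps region name -> (sequence, binding_p, fold_change), where
    fold_change is a positive expression ratio (assumption).
    """
    selected = {}
    for name, (seq, p_value, fold) in profiles.items():
        if p_value <= binding_p_max and max(fold, 1.0 / fold) >= min_fold_change:
            selected[name] = seq
    return selected


def log_odds_matrix(aligned_sites, pseudocount=0.5):
    """Per-position log2-odds scores from equal-length, ungapped aligned
    sites (uppercase ACGT), against a uniform 0.25 background (assumption)."""
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("aligned sites must all have the same length")
    total = len(aligned_sites) + 4 * pseudocount
    matrix = []
    for i in range(width):
        column = [site[i] for site in aligned_sites]
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                       for b in BASES})
    return matrix


def best_score(matrix, seq):
    """Highest window score on either strand; windows containing a base
    other than A, C, G or T (e.g. N) are skipped."""
    width = len(matrix)
    best = float("-inf")
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_complement):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if all(b in BASES for b in window):
                score = sum(col[b] for col, b in zip(matrix, window))
                best = max(best, score)
    return best


def motif_quality(matrix, bound, unbound, threshold):
    """Fraction of bound regions containing at least one hit scoring
    >= threshold, alongside the same fraction for unbound regions."""
    def hit_rate(seqs):
        return sum(best_score(matrix, s) >= threshold for s in seqs) / len(seqs)
    return hit_rate(bound), hit_rate(unbound)


if __name__ == "__main__":
    # Toy, made-up sequences purely for illustration.
    sites = ["TTGACA", "TTGACA", "TTGACT", "TTGACA"]
    matrix = log_odds_matrix(sites)
    bound = ["GGGTTGACAGGG", "GGTGTCAAGG"]  # second hit is on the reverse strand
    unbound = ["GGGGGGGGGGGG", "CCCCCCCCCCCC"]
    print(motif_quality(matrix, bound, unbound, threshold=6.0))  # prints (1.0, 0.0)
```

A high hit rate in bound regions paired with a low rate in unbound regions suggests a motif that discriminates the factor's targets; reporting both numbers, rather than the bound-region rate alone, is one possible way to keep a permissive threshold from looking artificially good.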
Index Terms
- A hypothesis driven approach to condition specific transcription factor binding site characterization in S.c.