Information and communication theory in molecular biology

  • Original Paper
Electrical Engineering

Abstract

The DNA sequencing efforts of recent years, together with rapid progress in sequencing technology, have generated a huge amount of sequence data that is available in public molecular databases. This development makes it statistically feasible to apply universal concepts from Shannon’s information theory to problems in molecular biology, e.g. to use mutual information for gene mapping and phylogenetic classification. In addition, the genetic information in the cell is continuously subject to mutations, yet it has to be passed from generation to generation with high fidelity. This raises the question of whether error protection and correction mechanisms exist that are similar to those used in technical communication systems. Finally, a better understanding of genetic information processing at the molecular level in the cell can be gained by looking for parallels to well-established models in communication theory; for example, there are analogies between gene expression and frame synchronization.

Author information

Corresponding author

Correspondence to Pavol Hanus.

About this article

Cite this article

Hanus, P., Goebel, B., Dingel, J. et al. Information and communication theory in molecular biology. Electr Eng 90, 161–173 (2007). https://doi.org/10.1007/s00202-007-0062-6

  • DOI: https://doi.org/10.1007/s00202-007-0062-6