  • Review Article

High-throughput electronic biology: mining information for drug discovery

A Corrigendum to this article was published on 01 June 2007

Key Points

  • In the past 10 years, the life sciences have seen a proliferation of electronic data, emerging from sources such as databases, text-mining technologies, high-throughput techniques and 'omics' platforms (for example, DNA microarrays).

  • In this article, we review some of the recent developments in the field of electronic biology (eBiology), which uses these resources as a substrate for new drug discovery. As extensive reviews on data sets and tools already exist, we highlight how these resources can be applied directly to solve bottlenecks in the industry.

  • We identify a number of approaches to the application of eBiology in drug discovery, ranging from deep 'systems biology' to 'project-specific' analyses, and focus on high-throughput techniques.

  • A set of examples is given, illustrating the power of applying multiple resources simultaneously to build layers of evidence that end with the identification of novel drug targets. In these scenarios, the expert in the disease area is a key partner, without whom the exercise is unlikely to succeed.

  • Although there is an increasing number of examples of target mining in the literature, there is also a need to consider how these biological hypotheses are then translated into drug discovery programmes. To this end, we consider the data sources and techniques that can be used to apply a business focus to the results of this mining.

  • As hypotheses turn into real programmes, we consider further workflows to support these later stages of discovery, looking at issues such as druggability, selectivity and understanding the action of a compound both in vitro and in vivo. Although these areas are traditionally addressed by computational chemists, there is much to be gained by the use of techniques and resources familiar to eBiologists.

  • Finally, we discuss future needs in eBiology and argue that the area must be led primarily by scientific creativity, rather than by technical considerations.
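
The idea in the key points of applying multiple resources simultaneously to build layers of evidence can be sketched in a few lines. This is a minimal illustration only, not the workflow used in the article: the resource names, the candidate labels (`GENE_A` and so on) and the helper `layer_evidence` are all hypothetical, and ranking by a simple count of supporting resources is an assumption made for the example.

```python
from collections import defaultdict


def layer_evidence(sources):
    """Rank candidates by how many independent resources support them.

    `sources` maps a resource name to the set of candidates it flags.
    Returns (candidate, sorted list of supporting resources) pairs,
    most-supported first, with ties broken alphabetically.
    """
    support = defaultdict(set)
    for source_name, candidates in sources.items():
        for candidate in candidates:
            support[candidate].add(source_name)
    return sorted(
        ((candidate, sorted(names)) for candidate, names in support.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical resources and candidate labels, for illustration only.
sources = {
    "literature_mining": {"GENE_A", "GENE_B"},
    "expression_data": {"GENE_A", "GENE_C"},
    "genetic_association": {"GENE_A", "GENE_B"},
}

ranked = layer_evidence(sources)
for candidate, names in ranked:
    print(candidate, len(names), names)
```

A ranked list of this kind is only a starting point; as the key points stress, interpretation by an expert in the disease area is what turns it into a testable hypothesis.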

Abstract

The vast range of in silico resources available in life sciences research holds much promise for aiding the drug discovery process. To fully realize this opportunity, computational scientists must consider the practical issues of data integration and identify how best to apply these resources scientifically. In this article, we describe in silico approaches that are directed towards the identification of testable laboratory hypotheses; we also address common challenges in the field. We focus on flexible, high-throughput techniques, which may be initiated independently of 'wet-lab' experimentation and applied to multiple disease areas. The utility of these approaches in drug discovery highlights the contribution that in silico techniques can make and emphasizes the need for collaboration between disease research and computational science.

Figure 1: Breadth of drug discovery data available to eBiology.
Figure 2: Systematically scanning the human proteome for disease association.
Figure 3: Addressing compound selectivity.

Acknowledgements

The authors would like to thank T. Turi, J. Lanfear, S. Campbell, I. Harrow, J. Keeling and the Pfizer PharmaMatrix team for their valuable insight and support while preparing this article. We also gratefully acknowledge the input and suggestions from the reviewers. Finally, we wish to acknowledge the contribution of those who develop and maintain a vast landscape of in silico resources, and apologize to those whom we have been unable to cite in this article.

Author information

Corresponding author

Correspondence to William Loging.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

OMIM

Alzheimer disease

Crohn disease

Marfan syndrome

Parkinson disease

type 2 diabetes

FURTHER INFORMATION

BioCarta

Biomarkers.org

Ensembl

Gene Expression Omnibus

Gene Ontology

Mouse Genome Informatics database

Nature Biotechnology Community Consultation

Online Mendelian Inheritance in Man

Protein Data Bank

Simplified Molecular Input Line Entry Specification

Glossary

In silico

A term used to describe experiments or experimental results that are held electronically.

Unified medical language system

A controlled compendium of vocabularies spanning the life sciences.

Medical subject heading

The US National Library of Medicine's controlled vocabulary used for indexing articles for Medline.

Single nucleotide polymorphism

A specific location in a DNA sequence at which different people can have a different DNA base. Differences in a single base could change the protein sequence, leading to disease (for example, sickle-cell disease), or have no known consequences.
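
As a small illustration of the definition above, the sketch below classifies a single-base change within one codon as either altering the encoded amino acid or leaving it unchanged. The sickle-cell example uses the well-known GAG to GTG change in the β-globin gene (glutamate to valine); the helper name `snp_effect` and the deliberately partial codon table are assumptions made for this example.

```python
# Partial codon table: only the codons needed for this sketch.
CODON_TO_AA = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val"}


def snp_effect(ref_codon, alt_codon):
    """Describe the protein-level effect of a single-base change in a codon."""
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one base")
    ref_aa = CODON_TO_AA[ref_codon]
    alt_aa = CODON_TO_AA[alt_codon]
    if ref_aa == alt_aa:
        return f"synonymous ({ref_aa} unchanged)"
    return f"missense ({ref_aa} -> {alt_aa})"


print(snp_effect("GAG", "GTG"))  # the sickle-cell substitution
print(snp_effect("GAG", "GAA"))  # a change that leaves the protein unaltered
```

A full treatment would use the complete genetic code and account for SNPs outside coding regions, which this sketch does not attempt.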

Translational medicine

The testing of novel therapeutic strategies (in humans) that were developed through basic laboratory experimentation. Observations taken 'from the bedside to the bench' also constitute translational medicine.

RSS

RSS is a family of data formats used to publish frequently updated digital content, such as news feeds and alerts from many web sites on the internet.

Quantitative structure–activity relationships (QSAR)

Mathematical relationships linking chemical structure and pharmacological activity in a quantitative manner for a series of compounds. Methods that can be used in QSAR include various regression and pattern-recognition techniques.
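
To make the regression case concrete, here is a minimal sketch of a one-descriptor QSAR model fitted by ordinary least squares. The descriptor and activity values are invented for illustration and do not come from the article; real QSAR models typically use many descriptors and more robust methods, so this should be read as a toy example under those assumptions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical compound series: one structural descriptor per compound
# (for example, a lipophilicity measure) and a measured activity value.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(descriptor, activity)

# Predict the activity of an untested compound from its descriptor value.
predicted = slope * 2.5 + intercept
print(f"activity = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"predicted activity at descriptor 2.5: {predicted:.2f}")
```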

Ames positivity

A positive result in the Ames test, a biological assay that assesses DNA damage (mutagenicity) caused by small-molecule compounds in bacterial cells.

About this article

Cite this article

Loging, W., Harland, L. & Williams-Jones, B. High-throughput electronic biology: mining information for drug discovery. Nat Rev Drug Discov 6, 220–230 (2007). https://doi.org/10.1038/nrd2265
