Key Points
- In the past 10 years, the life sciences have seen a proliferation of electronic data, emerging from systems such as databases, text-mining technologies, high-throughput techniques and 'omics' platforms (for example, DNA microarrays).
- In this article, we review some of the recent developments in the field of electronic biology (eBiology), which uses these resources as a substrate for new drug discovery. As extensive reviews of data sets and tools already exist, we highlight how these resources can be applied directly to solve bottlenecks in the industry.
- A number of approaches to the application of eBiology in drug discovery are identified, ranging from deep 'systems biology' to 'project-specific' analyses, with a focus on high-throughput techniques.
- A set of examples is given that shows the power of applying multiple resources simultaneously to build layers of evidence, ending with the identification of novel drug targets. In these scenarios, the expert in the disease area is a key partner, without whom the exercise is unlikely to succeed.
- Although there is an increasing number of examples of target mining in the literature, there is also a need to consider how one then translates these biological hypotheses into drug discovery programmes. To this end, we consider the data sources and techniques that can be used to apply a business focus to the results of this mining.
- As hypotheses turn into real programmes, we consider further workflows to support these later stages of discovery, looking at issues such as druggability, selectivity and understanding the action of a compound both in vitro and in vivo. Although these areas are traditionally addressed by computational chemists, there is much to be gained by the use of techniques and resources familiar to eBiologists.
- Last, we discuss the future needs in eBiology and how the area must be led primarily by scientific creativity, rather than by technical considerations.
Abstract
The vast range of in silico resources available in life sciences research holds much promise for aiding the drug discovery process. To fully realize this opportunity, computational scientists must consider the practical issues of data integration and identify how best to apply these resources scientifically. In this article, we describe in silico approaches that are directed towards the identification of testable laboratory hypotheses; we also address common challenges in the field. We focus on flexible, high-throughput techniques, which may be initiated independently of 'wet-lab' experimentation and which may be applied to multiple disease areas. The utility of these approaches in drug discovery highlights the contribution that in silico techniques can make and emphasizes the need for collaboration between the areas of disease research and computational science.
Acknowledgements
The authors would like to thank T. Turi, J. Lanfear, S. Campbell, I. Harrow, J. Keeling and the Pfizer PharmaMatrix team for their valuable insight and support while preparing this article. We also gratefully acknowledge the input and suggestions from the reviewers. Finally, we wish to acknowledge the contribution of those who develop and maintain a vast landscape of in silico resources and apologize to those whom we have been unable to cite in this article.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
DATABASES
OMIM
FURTHER INFORMATION
Mouse Genome Informatics database
Nature Biotechnology Community Consultation
Glossary
- In silico: A term used to describe experiments or experimental results that are held electronically.
- Unified medical language system: A controlled compendium of vocabularies across the spectrum of life sciences.
- Medical subject heading: The US National Library of Medicine's controlled vocabulary used for indexing articles for Medline.
- Single nucleotide polymorphism: A specific location in a DNA sequence at which different people can have a different DNA base. Differences in a single base could change the protein sequence, leading to disease (for example, sickle-cell disease), or have no known consequences.
- Translational medicine: The testing of novel therapeutic strategies (in humans) that were developed through basic laboratory experimentation. Observations taken 'from the bedside to the bench' also constitute translational medicine.
- RSS: A family of data formats used to publish frequently updated digital content, such as news feeds and alerts from many web sites on the internet.
- Quantitative structure–activity relationships (QSAR): Mathematical relationships linking chemical structure and pharmacological activity in a quantitative manner for a series of compounds. Methods that can be used in QSAR include various regression and pattern-recognition techniques.
- Ames positivity: A positive result in the Ames test, a biological assay that assesses DNA damage caused by small-molecule drugs in bacterial cells.
About this article
Cite this article
Loging, W., Harland, L. & Williams-Jones, B. High-throughput electronic biology: mining information for drug discovery. Nat Rev Drug Discov 6, 220–230 (2007). https://doi.org/10.1038/nrd2265
DOI: https://doi.org/10.1038/nrd2265