The Cinderella of Biological Data Integration: Addressing Some of the Challenges of Entity and Relationship Mining from Patent Sources

Suriyawongkul, Ithipol; Southan, Christopher; Muresan, Sorel

doi:10.1007/978-3-642-15120-0_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6254)

Abstract

Most of the global corpus of medicinal chemistry data is published only in patents. However, extracting these data from patent documents and then integrating them with literature and database sources poses unique challenges. This work presents an investigation of an extensive full-text patent resource, which includes automated name-to-chemical-structure conversion and is licensed by AstraZeneca via a consortium arrangement with IBM. Our initial focus was identifying protein targets in patent titles linked to extracted bioactive compounds. We benchmarked target recognition strategies against target-assay-compound relationships manually curated from patents by GVKBIO. By analysing word frequencies and protein names, we assessed the false-negative problem of targets not specified in titles and the false-positive problem of non-target proteins appearing in titles. We also examined the time signals for selected target and non-target names by year of patent publication. Our results exemplify problems, and some solutions, for extracting data from this source.
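
The evaluation outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the stop-word list, the function names, the patent identifiers, the titles, the target dictionary and the curated sets below are all hypothetical, and plain whole-phrase matching stands in for whatever recognition strategies the paper actually benchmarked. The sketch counts title word frequencies, matches dictionary names in titles, and tallies false negatives (curated targets missing from the title) and false positives (non-target proteins named in the title).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption; the paper's list is not reproduced here).
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "or", "the", "their", "use", "with"}


def title_word_frequencies(titles):
    """Count non-stop-word tokens across an iterable of patent titles."""
    counts = Counter()
    for title in titles:
        for token in re.findall(r"[a-z0-9][a-z0-9\-]*", title.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts


def find_targets(title, target_names):
    """Return the dictionary names that occur as whole phrases in a title."""
    lowered = title.lower()
    return {
        name
        for name in target_names
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered)
    }


def score_titles(titles_by_patent, curated_targets, target_names):
    """Compare title-derived targets with curated targets, patent by patent."""
    tp = fp = fn = 0
    for patent_id, title in titles_by_patent.items():
        predicted = find_targets(title, target_names)
        gold = curated_targets.get(patent_id, set())
        tp += len(predicted & gold)   # target named in the title and curated
        fp += len(predicted - gold)   # protein in the title that is not a curated target
        fn += len(gold - predicted)   # curated target not specified in the title
    return {"tp": tp, "fp": fp, "fn": fn}


def mentions_by_year(records, name):
    """Count titles mentioning a name per publication year; records are (year, title) pairs."""
    return Counter(year for year, title in records if find_targets(title, {name}))


# Hypothetical example data, for illustration only.
TITLES = {
    "P1": "Pyrimidine derivatives as renin inhibitors",
    "P2": "Novel heterocyclic compounds and their use",
    "P3": "Albumin formulations of thrombin inhibitors",
}
CURATED = {"P1": {"renin"}, "P2": {"cathepsin K"}, "P3": {"thrombin"}}
TARGET_NAMES = {"renin", "thrombin", "albumin", "cathepsin K"}

frequencies = title_word_frequencies(TITLES.values())
scores = score_titles(TITLES, CURATED, TARGET_NAMES)
renin_years = mentions_by_year([(2005, TITLES["P1"]), (2006, TITLES["P2"])], "renin")
```

In this toy data, P2 shows the false-negative case (the curated target never appears in the title) and P3 the false-positive case (albumin is a protein in the title but not the curated target).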

Author information

Authors and Affiliations

Chalmers University of Technology, Gothenburg, Sweden
Ithipol Suriyawongkul
Computational Chemistry Section, Global Compound Sciences, DECS, AstraZeneca R&D, Mölndal, Sweden
Ithipol Suriyawongkul, Christopher Southan & Sorel Muresan

Editor information

Editors and Affiliations

Department of Computer and Information Science, Linköpings universitet, 581 83, Linköping, Sweden
Patrick Lambrix
Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, 412 96, Gothenburg, Sweden
Graham Kemp

About this paper

Cite this paper

Suriyawongkul, I., Southan, C., Muresan, S. (2010). The Cinderella of Biological Data Integration: Addressing Some of the Challenges of Entity and Relationship Mining from Patent Sources. In: Lambrix, P., Kemp, G. (eds) Data Integration in the Life Sciences. DILS 2010. Lecture Notes in Computer Science, vol 6254. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15120-0_9

DOI: https://doi.org/10.1007/978-3-642-15120-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15119-4
Online ISBN: 978-3-642-15120-0
eBook Packages: Computer Science, Computer Science (R0)