Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions

Flamm, Christoph; Merkle, Daniel; Stadler, Peter F.; Thorsen, Uffe

doi:10.1007/978-3-319-40530-8_13

Christoph Flamm^{16,22},
Daniel Merkle^{15},
Peter F. Stadler^{16,17,18,19,20,21} &
…
Uffe Thorsen^{15}

Conference paper
First Online: 22 June 2016

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9761)

Abstract

Graph transformation systems have the potential to be realistic models of chemistry, provided a comprehensive collection of reaction rules can be extracted from the body of chemical knowledge. A first key step for rule learning is the computation of atom-atom mappings, i.e., the atom-wise correspondence between products and educts of all published chemical reactions. This can be phrased as a maximum common edge subgraph problem with the constraint that transition states must have cyclic structure. We describe a search tree method well suited for small edit distance and an integer linear program best suited for general instances and demonstrate that it is feasible to compute atom-atom maps at large scales using a manually curated database of biochemical reactions as an example. In this context we address the network completion problem.

Acknowledgments

This work was supported in part by the Volkswagen Stiftung (proj. no. I/82719), by the COST-Action CM1304 “Systems Chemistry”, and by the Danish Council for Independent Research, Natural Sciences.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Southern Denmark, DK-5230, Odense, Denmark
Daniel Merkle & Uffe Thorsen
Institute for Theoretical Chemistry, University of Vienna, 1090, Wien, Austria
Christoph Flamm & Peter F. Stadler
Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, 04107, Leipzig, Germany
Peter F. Stadler
Max Planck Institute for Mathematics in the Sciences, 04103, Leipzig, Germany
Peter F. Stadler
Fraunhofer Institute for Cell Therapy and Immunology, 04103, Leipzig, Germany
Peter F. Stadler
Center for Non-coding RNA in Technology and Health, University of Copenhagen, DK-1870, Frederiksberg, Denmark
Peter F. Stadler
Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
Peter F. Stadler
Research Network Chemistry Meets Microbiology, University of Vienna, 1090, Wien, Austria
Christoph Flamm

Corresponding author

Correspondence to Daniel Merkle.

Editor information

Editors and Affiliations

Laboratoire LIG, Université Grenoble Alpes, Grenoble, France
Rachid Echahed
Universität der Bundeswehr München, Neubiberg, Germany
Mark Minas

Appendices

A Statistical Analysis of Rhea

Of the \(M=3786\) non-isomorphic molecular graphs in RHEA, 2204 are identified uniquely by their sum formula. While 2030 of the molecules appear only in a minimum of 4 reactions, some compounds take part in a very large fraction of all reactions in RHEA. For example, \({\mathrm{H}}^+\) participates in 11,1147 reactions; some of these are different descriptions of similar reactions in which only the direction of the reaction differs, and 5055 of them are truly distinct. Adenosine di- and triphosphate (and their derivatives), water, and dioxide each participate in more than 2000 reactions (depicted as red dots in Fig. 4 (right)). The maximum number of isomers (i.e., compounds that have the same sum formula but non-isomorphic graph representations) is 63; the corresponding sum formula is \(\mathrm{C}_{15}\mathrm{H}_{24}\). Interestingly, most of the large sets of isomers in RHEA are terpenes, condensates of identical five-carbon building blocks. The terpenes form a combinatorial class of polycyclic ring systems via complex sequences of cyclization and isomerization reactions. Figure 4 (left) summarizes the results (terpenes marked in red).
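The isomer statistics above amount to grouping molecular graphs by sum formula. The following is a minimal sketch of that bookkeeping, not the procedure used for the paper. It assumes each molecule is given as a hypothetical (sum formula, canonical graph identifier) pair, and the toy data are invented for illustration and are not taken from RHEA.

```python
from collections import defaultdict

def isomer_classes(molecules):
    """Group non-isomorphic molecular graphs by sum formula.

    `molecules` is an iterable of (sum_formula, graph_id) pairs. graph_id is
    assumed to be a canonical identifier, so equal ids mean isomorphic graphs.
    """
    classes = defaultdict(set)
    for formula, graph_id in molecules:
        classes[formula].add(graph_id)
    return classes

def summarize(classes):
    """Return (number of formulas with a single graph, largest isomer set size)."""
    unique = sum(1 for ids in classes.values() if len(ids) == 1)
    largest = max((len(ids) for ids in classes.values()), default=0)
    return unique, largest

# Invented toy input: two C15H24 isomers (one listed twice) and water.
toy = [("C15H24", "g1"), ("C15H24", "g2"), ("H2O", "g3"), ("C15H24", "g2")]
toy_classes = isomer_classes(toy)
toy_summary = summarize(toy_classes)
```

On real data the hard part is the canonical identifier, which requires a graph canonicalization or isomorphism test that this sketch takes as given.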

B Analysis of Runtime

As we are mainly interested in single-step reactions, we restricted our algorithms to look only for connected, vertex-disjoint transition states during the comparison. Figure 5 shows the fraction of instances for which AltCyc, ILP2, and ILP4 (a naïve ILP implementation with \(O(n^4)\) constraints) are able to enumerate all non-equivalent atom-atom mappings for different instance size categories, as well as the absolute number of instances solved, broken down by solution size.

Only very few instances that are not completely solved within the first 60 s are solved within a reasonable time (one hour), so there seems to be a sharp divide between easy and hard instances. In the plot in Fig. 5 (left) of the fraction of instances solved fast, we observe an exponential decline in the ratio of solved instances. This corresponds well with the expected exponential runtime of the algorithms.

As we restricted the solution set, certain instances are proven infeasible by ILP2, while AltCyc will continue searching for solutions until the parameter k, the number of weight changes, gets arbitrarily high. We chose to deem instances for which AltCyc found no solutions for \(k\le 10\) infeasible and to terminate the search. These two classes of solutions are marked in the rightmost column in Fig. 5. Note that the performance of AltCyc on the infeasible class of instances depends heavily on the somewhat arbitrary choice of the maximum k.

Both ILP models are implemented using CPLEX, an efficient state-of-the-art MIP solver. AltCyc and ILP2 have been tested on a total of 4295 Rhea instances, while ILP4 has been tested only on a subset of 250 of these.

C Algorithmic Details

For completeness we include pseudo-code for the sub-procedures used in the paper.

Pseudo-code for WeightAlongPath: In AltCyc\(^*\) (see Algorithm 2) we need to find all previous changes to an edge \(\{i,j\}\) currently under examination, \(w_P(\{i,j\})\).

In Algorithm 3 we show how to do this in time O(|P|), where \(|P|\in O(k)\). It is possible to find \(w_P(e)\) in constant time, but this would require much more complicated data structures or changes to the graphs we work on. As k is very small in practice, the linear-time method is preferred.

To find \(w_P(e)\) for a list of paths, add \(w_P(e)\) for all paths in the list.
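The linear scan described above can be sketched as follows. This is an illustrative sketch rather than Algorithm 3 itself: the encoding of a path as a list of ((u, v), delta) entries, the interpretation of \(w_P(e)\) as the sum of the recorded changes, and the function names are all assumptions made for this example.

```python
def weight_along_path(path, i, j):
    """Sum the weight changes that `path` records for the undirected edge {i, j}.

    `path` is assumed to be a list of ((u, v), delta) entries, one per edge
    changed so far (a hypothetical encoding). A single pass gives O(|P|) time.
    """
    target = frozenset((i, j))
    return sum(delta for (u, v), delta in path if frozenset((u, v)) == target)

def weight_along_paths(paths, i, j):
    """For a list of paths, add up the contribution of every path."""
    return sum(weight_along_path(p, i, j) for p in paths)

# Invented toy paths: edge {1, 2} is changed once in each of them.
p1 = [((1, 2), +1), ((2, 3), -1), ((3, 1), +1)]
p2 = [((2, 1), +1), ((4, 5), -1)]
```

Using `frozenset` makes the lookup independent of the orientation in which an edge was recorded, so (1, 2) and (2, 1) refer to the same edge.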

Pseudo-code for Complete: When a transition state candidate \(\psi '\) is found, we need to ensure that it can be extended into a complete atom-atom mapping. This can be done as described in Algorithm 4. Note that the two graphs \(G_1\) and \(G_2\) are assumed to be implicitly known. The algorithm works both when P is a single path and when P represents a list of paths.

The only non-trivial detail in Algorithm 4 is that it is not correct to remove all edges in the induced subgraph on the domain of \(\psi '\): the weight change needs to be sufficient, and there may be unchanged chords to consider.

Finding 2-to-2 Candidates in \(O(n^2\log n)\) Comparisons. In order to generate all \(O(n^4)\) candidate reactions with no more than two molecules in the educts or products, we use Algorithm 5. A set of molecules, M, is given, as well as a method h to obtain the distribution of atoms and charges of a molecule, in practice some implementation of sparse vectors. We assume that we keep pointers to the original molecules that resulted in each distribution and that we can retrieve these with the function mol.

The algorithm is dominated by one of two things: either the sorting of the length-\(n^2\) array H (where \(n = |M|\)) or the time to output the k candidates, where \(k\in O(n^4)\). The resulting runtime is then \(O(n^2\log n + k)\).
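The sort-and-group idea behind this bound can be sketched as follows. This is a minimal illustration under stated assumptions, not Algorithm 5 itself: molecules are represented by hypothetical string names, `h` is assumed to return a plain dict from atom label (plus a "charge" entry) to count, and the names `side_key` and `two_to_two_candidates` are invented for this example.

```python
from itertools import combinations, combinations_with_replacement, groupby

def side_key(side, h):
    """Canonical, sortable form of the summed atom/charge distribution of a side."""
    total = {}
    for mol in side:
        for label, count in h(mol).items():
            total[label] = total.get(label, 0) + count
    # Drop zero entries so that cancelled charges do not distinguish two sides.
    return tuple(sorted((l, c) for l, c in total.items() if c != 0))

def two_to_two_candidates(molecules, h):
    """Yield pairs of sides (1 or 2 molecules each) with equal distributions."""
    sides = [(m,) for m in molecules]
    sides += list(combinations_with_replacement(molecules, 2))
    # The O(n^2) sides are sorted by distribution; equal keys become adjacent.
    keyed = sorted(((side_key(s, h), s) for s in sides), key=lambda ks: ks[0])
    for _, run in groupby(keyed, key=lambda ks: ks[0]):
        group = [s for _, s in run]
        yield from combinations(group, 2)

# Invented toy distributions, with charge stored as an ordinary entry.
toy_h = {
    "H2O": {"H": 2, "O": 1},
    "H+": {"H": 1, "charge": 1},
    "OH-": {"H": 1, "O": 1, "charge": -1},
}
toy_candidates = list(two_to_two_candidates(list(toy_h), toy_h.get))
```

Here the sides are kept directly next to their keys, which plays the role of the function mol in the text. The sketch also reports candidates in which the same molecule occurs on both sides; whether such candidates should be filtered out is left open.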

About this paper

Cite this paper

Flamm, C., Merkle, D., Stadler, P.F., Thorsen, U. (2016). Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions. In: Echahed, R., Minas, M. (eds) Graph Transformation. ICGT 2016. Lecture Notes in Computer Science, vol 9761. Springer, Cham. https://doi.org/10.1007/978-3-319-40530-8_13

DOI: https://doi.org/10.1007/978-3-319-40530-8_13
Published: 22 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40529-2
Online ISBN: 978-3-319-40530-8
eBook Packages: Computer Science, Computer Science (R0)