Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9761)

Abstract

Graph transformation systems have the potential to be realistic models of chemistry, provided a comprehensive collection of reaction rules can be extracted from the body of chemical knowledge. A first key step for rule learning is the computation of atom-atom mappings, i.e., the atom-wise correspondence between products and educts of all published chemical reactions. This can be phrased as a maximum common edge subgraph problem with the constraint that transition states must have cyclic structure. We describe a search tree method well suited for small edit distance and an integer linear program best suited for general instances and demonstrate that it is feasible to compute atom-atom maps at large scales using a manually curated database of biochemical reactions as an example. In this context we address the network completion problem.

Acknowledgments

This work was supported in part by the Volkswagen Stiftung proj. no. I/82719, and the COST-Action CM1304 “Systems Chemistry” and by the Danish Council for Independent Research, Natural Sciences.

Author information

Corresponding author

Correspondence to Daniel Merkle.

Appendices

A Statistical Analysis of Rhea

Of the \(M=3786\) non-isomorphic molecular graphs in Rhea, 2204 are identified uniquely by their sum formula. While 2030 of the molecules appear only in a minimum of 4 reactions, some compounds take part in a very large fraction of all reactions in Rhea. For example, \({\mathrm{H}}^+\) participates in 11,1147 reactions, some of which are different descriptions of similar reactions where only the direction of the reaction differs; 5055 of these are truly distinct. Adenosine di- and tri-phosphate (and its derivatives), water, and dioxide each participate in more than 2000 reactions (depicted as red dots in Fig. 4 (right)). The maximum number of isomers (i.e., compounds that have the same sum formula but non-isomorphic graph representations) is 63; the corresponding sum formula is \(\mathrm{C}_{15}\mathrm{H}_{24}\). Interestingly, most of the large sets of isomers in Rhea are terpenes, condensates of identical five-carbon building blocks. The terpenes form a combinatorial class of polycyclic ring systems via complex sequences of cyclisation and isomerization reactions. Figure 4 (left) summarizes the results (terpenes marked in red).

Fig. 4. Distribution of isomers and frequency of participation in reactions in Rhea. The left plot shows that a few sets of isomers are very large, while most compounds in Rhea are unique up to sum formula. The right plot shows the frequency with which a compound participates in reactions. (Color figure online)
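The isomer statistics above amount to grouping molecular graphs by sum formula and inspecting the group sizes. The following is a minimal sketch of that bookkeeping, assuming each non-isomorphic molecular graph is already available as an identifier paired with its sum formula; the function names and the toy data are illustrative and are not taken from Rhea or from the paper.

```python
from collections import defaultdict

def isomer_classes(molecules):
    """Group molecule identifiers by sum formula.

    `molecules` is an iterable of (molecule_id, sum_formula) pairs, where each
    molecule_id stands for one non-isomorphic molecular graph (assumed input).
    """
    classes = defaultdict(list)
    for mol_id, formula in molecules:
        classes[formula].append(mol_id)
    return dict(classes)

def summarize(classes):
    """Return (#molecules unique up to sum formula, size of largest isomer set)."""
    unique = sum(1 for members in classes.values() if len(members) == 1)
    largest = max((len(members) for members in classes.values()), default=0)
    return unique, largest

# Toy data for illustration only, not taken from Rhea.
toy = [
    ("water", "H2O"),
    ("ethanol", "C2H6O"),
    ("dimethyl ether", "C2H6O"),
    ("glucose", "C6H12O6"),
    ("fructose", "C6H12O6"),
    ("galactose", "C6H12O6"),
]
classes = isomer_classes(toy)
print(summarize(classes))  # (1, 3)
```

On the toy data, one molecule is identified uniquely by its sum formula and the largest isomer set has three members; the corresponding figures reported for Rhea are 2204 and 63.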

B Analysis of Runtime

As we are mainly interested in single-step reactions, we restricted our algorithms to look only for connected, vertex-disjoint transition states during the comparison. Figure 5 shows the fraction of instances for which AltCyc, ILP2, and ILP4 (a naïve ILP implementation with \(O(n^4)\) constraints) are able to enumerate all non-equivalent atom-atom mappings, for different instance size categories, as well as the absolute number of instances solved, broken down by solution size.

Only very few instances that are not completely solved within the first 60 s are solved within a reasonable time (one hour), so there seems to be a sharp divide between easy and hard instances. In the plot of the fraction of instances solved fast in Fig. 5 (left), we observe an exponential decline in the ratio of solved instances. This corresponds well with the expected exponential runtime of the algorithms.

Fig. 5. Fraction and number of instances where all optimal atom-atom maps are found in 60 s (user time) by instance size and optimal solution cost for AltCyc (magenta), ILP2 (cyan) and ILP4 (gray). (Color figure online)

As we restricted the solution set, certain instances are proven infeasible by ILP2, while AltCyc will continue searching for solutions until the parameter \(k\), the number of weight changes, gets arbitrarily high. We chose to deem instances where AltCyc found no solutions for \(k\le 10\) infeasible and to terminate the search. These two classes of solutions are marked in the rightmost column in Fig. 5. Note that the performance of AltCyc on the infeasible class of instances depends heavily on the somewhat arbitrary choice of the maximum \(k\).
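The cutoff rule described above can be sketched as a bounded loop over \(k\). This is only an illustration of the termination policy, not the AltCyc search itself: `find_solutions` is a hypothetical callback standing in for one search pass with exactly \(k\) weight changes, and starting the loop at \(k=1\) is an assumption.

```python
def search_with_cutoff(find_solutions, max_k=10):
    """Try k = 1, 2, ..., max_k and stop at the first k that yields solutions.

    `find_solutions(k)` is a hypothetical callback returning the list of
    transition-state candidates that use exactly k weight changes.
    Returns (k, solutions), or (None, []) if no solution is found for
    k <= max_k, in which case the instance is deemed infeasible.
    """
    for k in range(1, max_k + 1):
        solutions = find_solutions(k)
        if solutions:
            return k, solutions
    return None, []

# Toy callbacks: one instance with a solution at k = 4, one without any.
solvable = lambda k: ["candidate"] if k == 4 else []
unsolvable = lambda k: []

print(search_with_cutoff(solvable))    # (4, ['candidate'])
print(search_with_cutoff(unsolvable))  # (None, [])
```

For an instance without solutions, the loop always performs `max_k` full passes, which is why the measured cost on the infeasible class depends so strongly on the chosen bound.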

Both ILP models are implemented using CPLEX, an efficient state-of-the-art MIP solver. AltCyc and ILP2 have been tested on a total of 4295 Rhea instances, while ILP4 has only been tested on a subset of 250 of these.

C Algorithmic Details

For completeness we include pseudo-code for the sub-procedures used in the paper.

Pseudo-code for WeightAlongPath: In AltCyc\(^*\) (see Algorithm 2) we need to find all previous changes to an edge \(\{i,j\}\) currently under examination, \(w_P(\{i,j\})\).

[Algorithm 3: pseudo-code for WeightAlongPath]

In Algorithm 3 we show how to do this in time \(O(|P|)\), where \(|P|\in O(k)\). It is possible to find \(w_P(e)\) in constant time, but this would require much more complicated data structures or changes to the graphs we work on. As \(k\) is very small in practice, the linear-time method is preferred.

To find \(w_P(e)\) for a list of paths, add \(w_P(e)\) for all paths in the list.
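A linear scan of this kind can be sketched as follows. The representation is an assumption made for illustration: a path is stored as a list of `((u, v), delta)` entries, one per weight change already made, and edges are treated as undirected. The function names are hypothetical and do not reproduce the paper's Algorithm 3.

```python
def weight_along_path(path, i, j):
    """Sum of all weight changes already applied to edge {i, j} along `path`.

    `path` is a list of ((u, v), delta) entries (an assumed representation).
    Every entry is inspected once, so the scan takes O(|P|) time.
    """
    edge = frozenset((i, j))
    return sum(delta for (u, v), delta in path if frozenset((u, v)) == edge)

def weight_along_paths(paths, i, j):
    """Accumulated change on edge {i, j} over a list of paths."""
    return sum(weight_along_path(p, i, j) for p in paths)

# Toy path: edge {1, 2} is changed twice, in opposite directions.
path = [((1, 2), +1), ((2, 3), -1), ((3, 1), +1), ((2, 1), -1)]
print(weight_along_path(path, 1, 2))                        # 0
print(weight_along_path(path, 2, 3))                        # -1
print(weight_along_paths([path, [((3, 2), -1)]], 2, 3))     # -2
```

Using `frozenset` for the comparison makes `(1, 2)` and `(2, 1)` refer to the same undirected edge without modifying the underlying graphs.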

Pseudo-code for Complete: When a transition state candidate \(\psi '\) is found, we need to ensure that it can be extended into a complete atom-atom mapping. This can be done as described in Algorithm 4. Note that the two graphs \(G_1\) and \(G_2\) are assumed to be implicitly known. The algorithm works both when P is a single path and when P represents a list of paths.

The only non-trivial detail in Algorithm 4 is that it is not correct to remove all edges in the induced subgraph on the domain of \(\psi '\): the weight change needs to be sufficient, and there may be unchanged chords to consider.

[Algorithm 4: pseudo-code for Complete]

Finding 2-to-2 Candidates in \(O(n^2\log n)\) Comparisons. In order to generate all \(O(n^4)\) candidate reactions with no more than two molecules in the educts or products, we use Algorithm 5. A set of molecules, M, is given, as well as a method h to obtain the distribution of atoms and charges of a molecule; in practice, this is some implementation of sparse vectors. We assume that we keep pointers to the original molecules that resulted in each distribution, and we obtain these with the function mol.

[Algorithm 5: pseudo-code for finding 2-to-2 candidates]

The algorithm is dominated by one of two things: either the sorting of the array H of length \(n^2\) (where \(n = |M|\)), or the time to output the \(k\in O(n^4)\) candidates. The resulting runtime is then \(O(n^2\log n + k)\).
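The sort-and-group idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 5: `h` is modelled as a function returning a `Counter` (a simple sparse vector) with a hypothetical `"charge"` entry, a reaction side is a tuple of one or two molecules (repetition allowed), and the molecule names and toy data are invented for the example.

```python
from collections import Counter
from itertools import combinations_with_replacement, groupby

def side_key(side, h):
    """Canonical, sortable key for the summed distribution of one reaction side."""
    total = Counter()
    for mol in side:
        total.update(h(mol))  # update() keeps negative charges, unlike '+'
    return tuple(sorted((k, v) for k, v in total.items() if v != 0))

def two_to_two_candidates(molecules, h):
    """Yield pairs of sides whose atom and charge totals are identical."""
    sides = [(m,) for m in molecules]
    sides += list(combinations_with_replacement(molecules, 2))
    # H holds one (distribution, side) entry per side; sorting it dominates
    # the comparison cost, and the side plays the role of the `mol` pointer.
    H = sorted(((side_key(s, h), s) for s in sides), key=lambda entry: entry[0])
    for _, run in groupby(H, key=lambda entry: entry[0]):
        run_sides = [s for _, s in run]
        # Every pair of sides inside a run of equal keys is a candidate.
        for a in range(len(run_sides)):
            for b in range(a + 1, len(run_sides)):
                yield run_sides[a], run_sides[b]

# Toy molecules for illustration only.
distributions = {
    "H2": Counter(H=2),
    "O2": Counter(O=2),
    "H2O2": Counter(H=2, O=2),
    "H2O": Counter(H=2, O=1),
    "H+": Counter(H=1, charge=1),
    "OH-": Counter(H=1, O=1, charge=-1),
}
candidates = list(two_to_two_candidates(list(distributions), distributions.get))
for educts, products in candidates:
    print(" + ".join(educts), "<->", " + ".join(products))
```

After sorting, sides with equal distributions are adjacent, so candidates are read off from runs of equal keys without comparing every side against every other side. The output step can still produce \(O(n^4)\) pairs when runs are long.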

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Flamm, C., Merkle, D., Stadler, P.F., Thorsen, U. (2016). Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions. In: Echahed, R., Minas, M. (eds) Graph Transformation. ICGT 2016. Lecture Notes in Computer Science, vol 9761. Springer, Cham. https://doi.org/10.1007/978-3-319-40530-8_13

  • DOI: https://doi.org/10.1007/978-3-319-40530-8_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40529-2

  • Online ISBN: 978-3-319-40530-8

  • eBook Packages: Computer Science, Computer Science (R0)
