Abstract
Graph transformation systems have the potential to be realistic models of chemistry, provided a comprehensive collection of reaction rules can be extracted from the body of chemical knowledge. A first key step for rule learning is the computation of atom-atom mappings, i.e., the atom-wise correspondence between products and educts of all published chemical reactions. This can be phrased as a maximum common edge subgraph problem with the constraint that transition states must have cyclic structure. We describe a search tree method well suited for small edit distance and an integer linear program best suited for general instances and demonstrate that it is feasible to compute atom-atom maps at large scales using a manually curated database of biochemical reactions as an example. In this context we address the network completion problem.
Acknowledgments
This work was supported in part by the Volkswagen Stiftung (proj. no. I/82719), the COST-Action CM1304 “Systems Chemistry”, and the Danish Council for Independent Research, Natural Sciences.
Appendices
A Statistical Analysis of Rhea
Of the \(M=3786\) non-isomorphic molecular graphs in Rhea, 2204 are identified uniquely by their sum formula. While 2030 of the molecules appear only in a minimum of 4 reactions, some compounds take part in a very large fraction of all reactions in Rhea. For example, \({\mathrm{H}}^+\) participates in 11,1147 reactions; some of these are different descriptions of similar reactions in which only the direction of the reaction differs, and 5055 of them are truly distinct. Adenosine di- and triphosphate (and their derivatives), water, and dioxide each participate in more than 2000 reactions (depicted as red dots in Fig. 4 (right)). The maximum number of isomers (i.e., compounds that have the same sum formula but non-isomorphic graph representations) is 63. The corresponding sum formula is \(\mathrm{C}_{15}\mathrm{H}_{24}\). Interestingly, most of the large sets of isomers in Rhea are terpenes, condensates of identical five-carbon building blocks. The terpenes form a combinatorial class of polycyclic ring systems via complex sequences of cyclisation and isomerization reactions. Figure 4 (left) summarizes the results (terpenes marked in red).
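The isomer statistics above amount to grouping molecules by sum formula and counting class sizes. The following is a minimal sketch of that bookkeeping, assuming each non-isomorphic molecular graph is available as an (identifier, sum formula) pair; the function names and the toy data are illustrative assumptions and are not taken from Rhea or from the authors' code.

```python
from collections import defaultdict

def isomer_classes(molecules):
    """Group molecule identifiers by sum formula.

    `molecules` is an iterable of (identifier, sum_formula) pairs, one per
    non-isomorphic molecular graph (a hypothetical input format).
    """
    classes = defaultdict(list)
    for mol_id, formula in molecules:
        classes[formula].append(mol_id)
    return dict(classes)

def summarize(classes):
    """Return (number of formulas with a single molecule,
    formula of the largest isomer class, size of that class)."""
    unique = sum(1 for members in classes.values() if len(members) == 1)
    largest = max(classes, key=lambda formula: len(classes[formula]))
    return unique, largest, len(classes[largest])

# Toy data for illustration only (not extracted from Rhea).
toy = [
    ("limonene", "C10H16"),
    ("alpha-pinene", "C10H16"),
    ("myrcene", "C10H16"),
    ("water", "H2O"),
    ("ethanol", "C2H6O"),
    ("dimethyl ether", "C2H6O"),
]
summary = summarize(isomer_classes(toy))
```

On the toy data, one molecule is identified uniquely by its formula, and the largest isomer class is the monoterpene formula with three members.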
B Analysis of Runtime
As we are mainly interested in single-step reactions, we restricted our algorithms to look only for connected, vertex-disjoint transition states during the comparison. Figure 5 shows the fraction of instances for which AltCyc, ILP2, and ILP4 (a naïve ILP implementation with \(O(n^4)\) constraints) are able to enumerate all non-equivalent atom-atom mappings for different instance size categories, as well as the absolute number of instances solved, broken down by solution size.
Only very few instances that are not completely solved within the first 60 s are solved within reasonable time (one hour). So there seems to be a sharp divide between easy and hard instances. The plot in Fig. 5 (left) of the fraction of instances solved quickly shows an exponential decline in the ratio of solved instances with increasing instance size. This corresponds well with the expected exponential runtime of the algorithms.
As we restricted the solution set, certain instances are proven infeasible by ILP2, while AltCyc will continue searching for solutions until the parameter k, the number of weight changes, gets arbitrarily high. We chose to deem instances where AltCyc found no solutions for \(k\le 10\) infeasible and terminate the search. These two classes of solutions are marked in the rightmost column in Fig. 5. Note that the performance of AltCyc on the infeasible class of instances depends heavily on the somewhat arbitrary choice of maximum k.
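The cutoff rule can be illustrated with a small driver loop. This is only a sketch under the assumption that the search is run for increasing values of k; `find_mappings` is a hypothetical callable standing in for the actual AltCyc search, and `k_max=10` mirrors the bound chosen above.

```python
def search_with_cutoff(find_mappings, k_max=10):
    """Try k = 1, 2, ..., k_max and stop at the first k with a solution.

    `find_mappings(k)` is a hypothetical callable that returns the list of
    atom-atom mappings using exactly k weight changes (possibly empty).
    Returns (k, mappings), or None if the instance is deemed infeasible.
    """
    for k in range(1, k_max + 1):
        mappings = find_mappings(k)
        if mappings:
            return k, mappings
    # No solution up to k_max: treat the instance as infeasible.
    return None

# Stub searches for illustration: one succeeds at k = 4, one never does.
found = search_with_cutoff(lambda k: ["mapping"] if k == 4 else [])
not_found = search_with_cutoff(lambda k: [])
```

The second call shows why the infeasible class is expensive for such a search: every value of k up to the cutoff is explored before the instance is given up.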
Both ILP models are implemented using CPLEX, an efficient state-of-the-art MIP solver. AltCyc and ILP2 have been tested on a total of 4295 Rhea instances, while ILP4 has only been tested on a subset of 250 of these.
C Algorithmic Details
For completeness we include pseudo-code for the sub-procedures used in the paper.
Pseudo-code for WeightAlongPath: In AltCyc\(^*\) (see Algorithm 2) we need to find all previous changes to an edge \(\{i,j\}\) currently under examination, \(w_P(\{i,j\})\).
In Algorithm 3 we show how to do this in time \(O(|P|)\), where \(|P|\in O(k)\). It is possible to find \(w_P(e)\) in constant time, but this would require much more complicated data structures or changes to the graphs we work on. As k is very small in practice, the linear-time method is preferred.
To find \(w_P(e)\) for a list of paths, add \(w_P(e)\) for all paths in the list.
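A linear scan of this kind might look as follows. The representation is an assumption made for illustration: a path is taken to be a list of `(u, v, delta)` triples, where `delta` is the weight change applied to the undirected edge \(\{u,v\}\). This is a sketch, not a transcription of Algorithm 3.

```python
def weight_along_path(path, edge):
    """Sum all weight changes that `path` applies to the undirected `edge`.

    `path` is assumed to be a list of (u, v, delta) triples; the scan takes
    O(|P|) time, which is cheap because |P| is in O(k) and k is small.
    """
    target = frozenset(edge)
    return sum(delta for u, v, delta in path if frozenset((u, v)) == target)

def weight_along_paths(paths, edge):
    """For a list of paths, add up the per-path weight changes."""
    return sum(weight_along_path(path, edge) for path in paths)

# Illustrative data: an alternating cycle 1-2-3-1 and a second, short path.
cycle = [(1, 2, +1), (2, 3, -1), (3, 1, +1)]
other = [(1, 2, -1), (2, 5, +1)]
```

Using `frozenset` makes the lookup independent of the orientation in which an edge is stored, so `(2, 1)` and `(1, 2)` refer to the same edge.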
Pseudo-code for Complete: When a transition state candidate \(\psi '\) is found we need to ensure it can be extended into a complete atom-atom mapping. This can be done as described in Algorithm 4. Note that the two graphs \(G_1\) and \(G_2\) are assumed to be implicitly known. The algorithm works both when P is a single path and when P represents a list of paths.
The only non-trivial detail in Algorithm 4 is that it is not correct to remove all edges in the induced subgraph on the domain of \(\psi '\): the weight change needs to be sufficient, and there may be unchanged chords to consider.
Finding 2-to-2 Candidates in \(O(n^2\log n)\) Comparisons. In order to generate all \(O(n^4)\) candidate reactions with no more than two molecules in the educts or products we use Algorithm 5. A set of molecules, M, is given, as well as a method h to obtain the distribution of atoms and charges of each molecule, in practice some implementation of sparse vectors. We assume that we keep pointers to the original molecules that resulted in each distribution, and we retrieve these with the function mol.
The runtime is dominated by one of two things: either the sorting of the length-\(n^2\) array H (where \(n = |M|\)), or the time to output the \(k\in O(n^4)\) candidates. The resulting runtime is \(O(n^2\log n + k)\).
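The sort-and-group idea can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 5: sparse vectors are modelled as plain dictionaries, each entry of H stores the pair of molecules directly (playing the role of mol), and only sides with exactly two molecules are generated. Sides with a single molecule could presumably be covered by adding an empty placeholder molecule to M.

```python
from itertools import combinations_with_replacement, groupby

def add(x, y):
    """Add two sparse vectors (dicts); drop zero entries to keep keys canonical."""
    total = dict(x)
    for entry, value in y.items():
        total[entry] = total.get(entry, 0) + value
    return {entry: value for entry, value in total.items() if value != 0}

def two_to_two_candidates(molecules, h):
    """Yield pairs of molecule pairs with equal atom and charge distribution.

    `molecules` is a collection of names and `h(name)` returns the sparse
    distribution vector of that molecule (both hypothetical interfaces).
    """
    H = []
    for a, b in combinations_with_replacement(sorted(molecules), 2):
        key = tuple(sorted(add(h(a), h(b)).items()))
        H.append((key, (a, b)))
    H.sort()  # O(n^2 log n) comparisons for the length-n^2 array
    for _, run in groupby(H, key=lambda item: item[0]):
        sides = [side for _, side in run]
        # Every pair of sides within a run is one candidate reaction.
        for i in range(len(sides)):
            for j in range(i + 1, len(sides)):
                yield sides[i], sides[j]

# Toy molecules for illustration; "charge" is treated as one more vector entry.
toy = {
    "H2O": {"H": 2, "O": 1},
    "H+": {"H": 1, "charge": 1},
    "OH-": {"H": 1, "O": 1, "charge": -1},
    "H3O+": {"H": 3, "O": 1, "charge": 1},
}
candidates = list(two_to_two_candidates(toy, toy.__getitem__))
```

On the toy set the only balanced 2-to-2 candidate pairs two water molecules with hydronium and hydroxide. After sorting, equal distributions are adjacent in H, so the grouping pass is linear and the remaining cost is the output itself.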
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Flamm, C., Merkle, D., Stadler, P.F., Thorsen, U. (2016). Automatic Inference of Graph Transformation Rules Using the Cyclic Nature of Chemical Reactions. In: Echahed, R., Minas, M. (eds) Graph Transformation. ICGT 2016. Lecture Notes in Computer Science(), vol 9761. Springer, Cham. https://doi.org/10.1007/978-3-319-40530-8_13
DOI: https://doi.org/10.1007/978-3-319-40530-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40529-2
Online ISBN: 978-3-319-40530-8
eBook Packages: Computer Science, Computer Science (R0)