ABSTRACT
This paper investigates a simple yet effective method for regression on graphs, in particular for applications in cheminformatics and for quantitative structure-activity relationships (QSARs). The method combines Locally Weighted Learning (LWL) with Maximum Common Subgraph (MCS) based graph distances. More specifically, we investigate a variant of locally weighted regression on graphs (structures) that uses the maximum common subgraph to determine and weight the neighborhood of a graph, and feature vectors to fit the actual regression model. We show that this combination, LWL-MCS, outperforms other methods that use the local neighborhood of graphs for regression. The performance of this method on graphs suggests it might be useful for other types of structured data as well.
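The combination described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the MCS sizes (and hence the distances) have already been computed by some external routine, it assumes a graph-union normalization for the distance, a Gaussian weighting kernel with the bandwidth set to the largest neighbor distance, and a ridge-stabilized linear model as the local learner. The names `mcs_distance`, `solve`, and `lwl_predict` are hypothetical.

```python
import math


def mcs_distance(size_a, size_b, size_mcs):
    """Distance from precomputed graph sizes (graph-union normalization, assumed)."""
    union = size_a + size_b - size_mcs
    return 1.0 - size_mcs / union


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lwl_predict(query_x, query_dist, train_X, train_y, k=5, ridge=1e-8):
    """Predict one target value with a locally weighted linear model.

    query_dist[i] is the (assumed precomputed) MCS-based distance between the
    query graph and training graph i; train_X holds the feature vectors.
    """
    # Neighborhood: the k training graphs closest to the query graph.
    idx = sorted(range(len(query_dist)), key=lambda i: query_dist[i])[:k]

    # Gaussian weighting kernel; bandwidth = largest neighbor distance (assumption).
    bandwidth = max(query_dist[i] for i in idx) or 1.0
    w = [math.exp(-(query_dist[i] / bandwidth) ** 2) for i in idx]

    # Weighted least squares on the feature vectors, with an intercept column.
    rows = [[1.0] + list(train_X[i]) for i in idx]
    p = len(rows[0])
    A = [[sum(wi * r[a] * r[b] for wi, r in zip(w, rows)) + (ridge if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(wi * r[a] * train_y[i] for wi, r, i in zip(w, rows, idx))
           for a in range(p)]
    beta = solve(A, rhs)
    return sum(bj * xj for bj, xj in zip(beta, [1.0] + list(query_x)))
```

The graph structure enters only through `query_dist`, which selects and weights the neighbors, while the regression itself runs on the feature vectors. Swapping in a different distance normalization or kernel changes only the first two steps of `lwl_predict`.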