Abstract
We consider a class of structured prediction problems for which the assumptions made by state-of-the-art algorithms fail. To deal with exponentially sized output sets, these algorithms assume, for instance, that the best output for a given input can be found efficiently. While this holds for many important real-world problems, there are also many relevant and seemingly simple problems where it does not. In this paper, we consider route prediction, the problem of finding a cyclic permutation of some points of interest, as an example, and show that state-of-the-art approaches cannot guarantee polynomial runtime for this output set. We then present a novel formulation of the learning problem that can be trained efficiently whenever a particular ‘super-structure counting’ problem can be solved efficiently for the output set. We also list several output sets for which this assumption holds and report experimental results.
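To make the hardness claim concrete: exact decoding for route prediction means searching over all cyclic permutations of the points of interest, which grows factorially. The sketch below (not from the paper; `toy_score` is a made-up scoring function standing in for a learned compatibility score) illustrates this brute-force search, which is the step that efficient structured-prediction algorithms assume away.

```python
from itertools import permutations

def best_route(points, score):
    """Brute-force exact decoding over cyclic permutations.

    Fixing the first point removes rotational symmetry, leaving
    (n-1)! candidate cycles -- exact inference is exponential in n,
    which is why polynomial-time decoding cannot be assumed here.
    """
    first, rest = points[0], tuple(points[1:])
    best, best_score = None, float("-inf")
    for perm in permutations(rest):
        route = (first,) + perm
        s = score(route)
        if s > best_score:
            best, best_score = route, s
    return best

# Hypothetical scorer: prefer short cycles over points on a line
# (negated total tour length, so higher is better).
def toy_score(route):
    n = len(route)
    return -sum(abs(route[i] - route[(i + 1) % n]) for i in range(n))

print(best_route([0, 3, 1, 2], toy_score))  # one shortest cycle
```
Even this toy already evaluates (n-1)! cycles; at n = 20 that is about 1.2 × 10^17 candidates, which motivates the paper's alternative formulation based on counting rather than decoding.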
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
Gärtner, T., Vembu, S. On structured output training: hard cases and an efficient alternative. Mach Learn 76, 227–242 (2009). https://doi.org/10.1007/s10994-009-5129-3