Abstract
We consider a class of structured prediction problems for which the assumptions made by state-of-the-art algorithms fail. To deal with exponentially sized output sets, these algorithms assume, for instance, that the best output for a given input can be found efficiently. While this holds for many important real-world problems, there are also many relevant and seemingly simple problems where it does not. In this paper, we consider route prediction, the problem of finding a cyclic permutation of some points of interest, as an example, and show that state-of-the-art approaches cannot guarantee polynomial runtime for this output set. We then present a novel formulation of the learning problem that can be trained efficiently whenever a particular ‘super-structure counting’ problem can be solved efficiently for the output set. We also list several output sets for which this assumption holds and report experimental results.
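To make the hardness claim concrete: exact decoding for route prediction means searching over all cyclic permutations of the points of interest, which grows factorially. The sketch below (not from the paper; `toy_score` is a made-up scoring function standing in for a learned compatibility score) illustrates this brute-force search, which is the step that efficient structured-prediction algorithms assume away.

```python
from itertools import permutations

def best_route(points, score):
    """Brute-force exact decoding over cyclic permutations.

    Fixing the first point removes rotational symmetry, leaving
    (n-1)! candidate cycles -- exact inference is exponential in n,
    which is why polynomial-time decoding cannot be assumed here.
    """
    first, rest = points[0], tuple(points[1:])
    best, best_score = None, float("-inf")
    for perm in permutations(rest):
        route = (first,) + perm
        s = score(route)
        if s > best_score:
            best, best_score = route, s
    return best

# Hypothetical scorer: prefer short cycles over points on a line
# (negated total tour length, so higher is better).
def toy_score(route):
    n = len(route)
    return -sum(abs(route[i] - route[(i + 1) % n]) for i in range(n))

print(best_route([0, 3, 1, 2], toy_score))  # one shortest cycle
```
Even this toy already evaluates (n-1)! cycles; at n = 20 that is about 1.2 × 10^17 candidates, which motivates the paper's alternative formulation based on counting rather than decoding.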
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
Gärtner, T., Vembu, S. On structured output training: hard cases and an efficient alternative. Mach Learn 76, 227–242 (2009). https://doi.org/10.1007/s10994-009-5129-3