The Relaxed Online Maximum Margin Algorithm

Li, Yi; Long, Philip M.

doi:10.1023/A:1012435301888

Published: January 2002

Volume 46, pages 361–387, (2002)

Machine Learning

Yi Li¹ & Philip M. Long²

Abstract

We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe experiments in which ROMMA, and a variant that updates its hypothesis more aggressively, are used as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms are similar to those of the perceptron algorithm, but their generalization is much better. We show that a batch algorithm based on aggressive ROMMA converges to the fixed-threshold SVM hypothesis.
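
The relaxation idea can be illustrated with a minimal sketch. The abstract does not spell out the update rule, so the code below rests on an assumption: after a mistake on example (x, y), the new weight vector is the shortest w satisfying two constraints, namely y(w·x) >= 1 for the new example and w_old·w >= ||w_old||^2 as a single half-space standing in for all earlier constraints. Solving both as equalities with w = c·w_old + d·x gives the coefficients used in the code. The function names (`dot`, `romma_update`, `train`), the tolerance, and the choice to skip degenerate examples are illustrative, not taken from the article. Only the mistake-driven variant is sketched, not the aggressive one.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def romma_update(w: Sequence[float], x: Sequence[float], y: int) -> Vector:
    """One mistake-driven step under the assumed two-constraint relaxation.

    y must be +1 or -1. Returns w unchanged when the example is
    classified correctly (y * (w . x) > 0).
    """
    p = dot(w, x)
    if y * p > 0:
        return list(w)
    xx = dot(x, x)
    ww = dot(w, w)
    if xx == 0.0:
        return list(w)  # zero example carries no information (our choice)
    if ww == 0.0:
        # First update: shortest w with y * (w . x) = 1.
        return [y * xi / xx for xi in x]
    denom = xx * ww - p * p
    if denom <= 1e-12 * xx * ww:
        # x is (nearly) parallel to w: the two equalities have no joint
        # solution. Skipping the example is an assumption of this sketch.
        return list(w)
    # Solve w_new . x = y and w_new . w = ||w||^2 for w_new = c*w + d*x.
    c = (xx * ww - y * p) / denom
    d = ww * (y - p) / denom
    return [c * wi + d * xi for wi, xi in zip(w, x)]


def train(examples: Sequence[Tuple[Sequence[float], int]],
          epochs: int = 10) -> Vector:
    """Cycle through the examples (batch use) until an epoch has no mistakes."""
    w: Vector = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * dot(w, x) <= 0:
                mistakes += 1
                w = romma_update(w, x, y)
        if mistakes == 0:
            break
    return w
```

For instance, `train([([2.0, 0.0], 1), ([1.0, 1.0], -1)])` makes one update per example during the first pass and then stops, since both examples are then classified correctly. Each update costs a handful of inner products, which is in line with the abstract's remark that the cost is similar to that of the perceptron algorithm.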

Author information

Authors and Affiliations

Department of Engineering Mathematics, University of Bristol, Queen's Building, Bristol, BS8 1TR, UK
Yi Li
Department of Computer Science, National University of Singapore, Singapore, 117543, Republic of Singapore
Philip M. Long

About this article

Cite this article

Li, Y., Long, P.M. The Relaxed Online Maximum Margin Algorithm. Machine Learning 46, 361–387 (2002). https://doi.org/10.1023/A:1012435301888

Issue Date: January 2002
DOI: https://doi.org/10.1023/A:1012435301888
