Keywords and Synonyms
Learning with irrelevant attributes
Problem Definition
The basic formulation given here uses the online mistake bound model, which Littlestone [9] used in his seminal work.
Fix a class C of Boolean functions over n variables. To start a learning scenario, a target function \( f_\ast \in C \) is chosen but not revealed to the learning algorithm. Learning then proceeds in a sequence of trials. At trial t, an input \( \boldsymbol{x}_t \in \{0,1\}^n \) is first given to the learning algorithm. The learning algorithm then produces its prediction \( \hat{y}_t \), which is its guess as to the unknown value \( f_\ast(\boldsymbol{x}_t) \). The correct value \( y_t = f_\ast(\boldsymbol{x}_t) \) is then revealed to the learner. If \( y_t \neq \hat{y}_t \), the learning algorithm has made a mistake. The learning algorithm learns C with mistake bound m if the number of mistakes never exceeds m, no matter how many trials are made and how \( f_\ast \) and \( {...
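The trial-by-trial protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class C is taken to be monotone disjunctions, the learner uses a Winnow-style multiplicative update in the spirit of the linear threshold algorithm of [9], and all names (`run_protocol`, `WinnowStyleLearner`, `make_disjunction`, `demo`) as well as the random input distribution are hypothetical choices made for this example.

```python
import random


def run_protocol(learner, target, inputs):
    """Run the online protocol and return the number of mistakes."""
    mistakes = 0
    for x in inputs:
        y_hat = learner.predict(x)   # learner commits to a prediction first
        y = target(x)                # correct value is revealed afterwards
        if y_hat != y:
            mistakes += 1
        learner.update(x, y_hat, y)
    return mistakes


class WinnowStyleLearner:
    """Illustrative multiplicative-update learner (an assumption of this
    sketch): weights start at 1 and the prediction threshold is n."""

    def __init__(self, n):
        self.n = n
        self.w = [1.0] * n

    def predict(self, x):
        total = sum(w for w, xi in zip(self.w, x) if xi)
        return 1 if total >= self.n else 0

    def update(self, x, y_hat, y):
        if y_hat == y:
            return  # weights change only after a mistake
        for i, xi in enumerate(x):
            if xi:
                # false negative: double active weights;
                # false positive: eliminate active weights
                self.w[i] = self.w[i] * 2 if y == 1 else 0.0


def make_disjunction(relevant):
    """Target f*: OR of the attributes whose indices are in `relevant`."""
    return lambda x: int(any(x[i] for i in relevant))


def demo(n=64, relevant=(3, 17, 42), trials=2000, seed=0):
    """Count mistakes on random sparse inputs (hypothetical test data)."""
    rng = random.Random(seed)
    target = make_disjunction(relevant)
    inputs = [[int(rng.random() < 0.05) for _ in range(n)]
              for _ in range(trials)]
    return run_protocol(WinnowStyleLearner(n), target, inputs)


if __name__ == "__main__":
    print("mistakes:", demo())
```

With only 3 of the 64 attributes relevant in `demo`, the mistake count of this learner stays small compared with n however many trials are run, which is the behaviour the term attribute-efficient refers to.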
Recommended Reading
Auer, P., Warmuth, M.K.: Tracking the best disjunction. Mach. Learn. 32(2), 127–150 (1998)
Blum, A., Hellerstein, L., Littlestone, N.: Learning in the presence of finitely or infinitely many irrelevant attributes. J. Comp. Syst. Sci. 50(1), 32–40 (1995)
Bshouty, N., Hellerstein, L.: Attribute‐efficient learning in query and mistake-bound models. J. Comp. Syst. Sci. 56(3), 310–319 (1998)
Dhagat, A., Hellerstein, L.: PAC learning with irrelevant attributes. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, pp 64–74. IEEE Computer Society, Los Alamitos (1994)
Gentile, C., Warmuth, M.K.: Linear hinge loss and average margin. In: Kearns, M.J., Solla, S.A., Cohn, D.A. (eds.) Advances in Neural Information Processing Systems 11, pp 225–231. MIT Press, Cambridge (1999)
Khardon, R., Roth, D., Servedio, R.A.: Efficiency versus convergence of boolean kernels for on-line learning algorithms. J. Artif. Intell. Res. 24, 341–356 (2005)
Kivinen, J., Warmuth, M.K.: Exponentiated gradient versus gradient descent for linear predictors. Inf. Comp. 132(1), 1–64 (1997)
Klivans, A.R., Servedio, R.A.: Toward attribute efficient learning of decision lists and parities. J. Mach. Learn. Res. 7(Apr), 587–602 (2006)
Littlestone, N.: Learning quickly when irrelevant attributes abound: A new linear threshold algorithm. Mach. Learn. 2(4), 285–318 (1988)
Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comp. 108(2), 212–261 (1994)
Martin, R.K., Sethares, W.A., Williamson, R.C., Johnson, Jr., C.R.: Exploiting sparsity in adaptive filters. IEEE Trans. Signal Process. 50(8), 1883–1894 (2002)
Mossel, E., O'Donnell, R., Servedio, R.A.: Learning functions of k relevant variables. J. Comp. Syst. Sci. 69(3), 421–434 (2004)
Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Greiner, R., Schuurmans, D. (eds.) Proceedings of the 21st International Conference on Machine Learning, pp 615–622. The International Machine Learning Society, Princeton (2004)
Vovk, V.: Aggregating strategies. In: Fulk, M., Case, J. (eds.) Proceedings of the 3rd Annual Workshop on Computational Learning Theory, pp 371–383. Morgan Kaufmann, San Mateo (1990)
© 2008 Springer-Verlag
About this entry
Cite this entry
Kivinen, J. (2008). Attribute-Efficient Learning. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_43
DOI: https://doi.org/10.1007/978-0-387-30162-4_43
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30770-1
Online ISBN: 978-0-387-30162-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering