Abstract
This paper connects hard-core set construction, a type of hardness amplification from computational complexity, and boosting, a technique from computational learning theory. Using this connection we give fruitful applications of complexity-theoretic techniques to learning theory and vice versa. We show that the hard-core set construction of Impagliazzo (1995), which establishes the existence of distributions under which boolean functions are highly inapproximable, may be viewed as a boosting algorithm. Using alternate boosting methods we give an improved bound for hard-core set construction which matches known lower bounds from boosting and thus is optimal within this class of techniques. We then show how to apply techniques from Impagliazzo (1995) to give a new version of Jackson's celebrated Harmonic Sieve algorithm for learning DNF formulae under the uniform distribution using membership queries. Our new version has a significant asymptotic improvement in running time. Critical to our arguments is a careful analysis of the distributions which are employed in both boosting and hard-core set constructions.
References
Babai, L., Fortnow, L., Nisan, N., & Wigderson, A. (1993). BPP has subexponential time simulations unless EXPTIME has publishable proofs. Computational Complexity, 3, 307–318.
Blum, A., Furst, M., Jackson, J., Kearns, M., Mansour, Y., & Rudich, S. (1994). Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the Twenty-Sixth Annual Symposium on Theory of Computing (pp. 253–262). ACM.
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36:4, 929–965.
Boneh, D., & Lipton, R. (1993). Amplification of weak learning over the uniform distribution. In Proceedings of the Sixth Annual Workshop on Computational Learning Theory (pp. 347–351). ACM.
Bshouty, N., Jackson, J., & Tamon, C. (1999). More efficient PAC learning of DNF with membership queries under the uniform distribution. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 286–295).
Drucker, H., & Cortes, C. (1996). Boosting decision trees. In Advances in Neural Information Processing Systems 8 (pp. 479–485).
Drucker, H., Cortes, C., Jackel, L. D., Lecun, Y., & Vapnik, V. (1994). Boosting and other ensemble methods. Neural Computation, 6:6, 1289–1301.
Drucker, H., Schapire, R., & Simard, P. (1993a). Boosting performance in neural networks. International Journal of Pattern Recognition and Artificial Intelligence, 7:4, 705–719.
Drucker, H., Schapire, R., & Simard, P. (1993b). Improving performance in neural networks using a boosting algorithm. In Advances in Neural Information Processing Systems 5 (pp. 42–49).
Freund, Y. (1990). Boosting a weak learning algorithm by majority. In Proceedings of the Third Annual Workshop on Computational Learning Theory (pp. 202–216).
Freund, Y. (1992). An improved boosting algorithm and its implications on learning complexity. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (pp. 391–398).
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121:2, 256–285.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119–139.
Goldreich, O., Nisan, N., & Wigderson, A. (1995). On Yao's xor-lemma. Electronic Colloquium on Computational Complexity, TR95-050.
Impagliazzo, R. (1995). Hard-core distributions for somewhat hard problems. In Proceedings of the Thirty-Sixth Annual Symposium on Foundations of Computer Science (pp. 538–545). IEEE.
Impagliazzo, R., & Wigderson, A. (1997). P = BPP unless E has subexponential circuits: Derandomizing the xor lemma. In Proceedings of the Twenty-Ninth Annual Symposium on Theory of Computing (pp. 220–229).
Jackson, J. (1995). The Harmonic Sieve: A novel application of Fourier analysis to machine learning theory and practice. Ph.D. Thesis, Carnegie Mellon University.
Jackson, J. (1997). An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55, 414–440.
Jackson, J. (2002). Personal communication.
Jackson, J., & Craven, M. (1996). Learning sparse perceptrons. In Advances in Neural Information Processing Systems 8 (pp. 654–660).
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM, 41:1, 67–95.
Klivans, A., & Servedio, R. (2001). Learning DNF in time \(2^{\tilde{O}(n^{1/3})}\). In Proceedings of the Thirty-Third Annual Symposium on Theory of Computing (pp. 258–265).
Levin, L. (1986). Average case complete problems. SIAM Journal on Computing, 15:1, 285–286.
Muller, D., & Preparata, F. (1975). Bounds to complexities of networks for sorting and for switching. Journal of the ACM, 22:2, 195–201.
Nisan, N., & Wigderson, A. (1994). Hardness versus randomness. Journal of Computer and System Sciences, 49, 149–167.
Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5:2, 197–227.
Schapire, R., & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 80–91).
Servedio, R. (2001). Smooth boosting and learning with malicious noise. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory (pp. 473–489).
Shaltiel, R. (2001). Towards proving strong direct product theorems. In Proceedings of the Sixteenth Conference on Computational Complexity (pp. 107–117).
Sudan, M., Trevisan, L., & Vadhan, S. (2001). Pseudorandom generators without the xor lemma. Journal of Computer and System Sciences, 62:2, 236–266.
Valiant, L. (1984). A theory of the learnable. Communications of the ACM, 27:11, 1134–1142.
Wigderson, A. (1999). Personal communication.
Cite this article
Klivans, A.R., Servedio, R.A. Boosting and Hard-Core Set Construction. Machine Learning 51, 217–238 (2003). https://doi.org/10.1023/A:1022949332276
DOI: https://doi.org/10.1023/A:1022949332276