Abstract
The ratio of two probability densities can be used for solving various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), feature selection (mutual information), and conditional probability estimation. Several methods of directly estimating the density ratio have recently been developed, e.g., moment matching estimation, maximum-likelihood density-ratio estimation, and least-squares density-ratio fitting. In this paper, we propose a kernelized variant of the least-squares method for density-ratio estimation, which is called kernel unconstrained least-squares importance fitting (KuLSIF). We investigate its fundamental statistical properties including a non-parametric convergence rate, an analytic-form solution, and a leave-one-out cross-validation score. We further study its relation to other kernel-based density-ratio estimators. In experiments, we numerically compare various kernel-based density-ratio estimation methods, and show that KuLSIF compares favorably with other approaches.
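The analytic-form solution mentioned in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical NumPy implementation of kernel-based least-squares density-ratio fitting in the spirit of (K)uLSIF, not the authors' code: the ratio r(x) = p(x)/q(x) is modeled as a Gaussian-kernel expansion centered at the numerator samples, and the regularized least-squares objective admits a closed-form solution via a single linear solve. The function names, the choice of kernel centers, and the default hyperparameters are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Pairwise Gaussian kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kulsif_fit(x_nu, x_de, sigma=1.0, lam=1e-2):
    """Least-squares density-ratio fit with an analytic-form solution (sketch).

    x_nu: samples from the numerator density p(x), shape (n_nu, d)
    x_de: samples from the denominator density q(x), shape (n_de, d)
    Returns a function estimating r(x) = p(x)/q(x), modeled as a kernel
    expansion centered at the numerator samples.
    """
    n_nu, n_de = len(x_nu), len(x_de)
    Phi_de = gaussian_kernel(x_de, x_nu, sigma)   # (n_de, n_nu) design matrix under q
    Phi_nu = gaussian_kernel(x_nu, x_nu, sigma)   # (n_nu, n_nu) design matrix under p
    H = Phi_de.T @ Phi_de / n_de                  # empirical second moment under q
    h = Phi_nu.mean(axis=0)                       # empirical first moment under p
    # Minimizer of (1/2) theta' H theta - h' theta + (lam/2) ||theta||^2:
    theta = np.linalg.solve(H + lam * np.eye(n_nu), h)
    return lambda x: gaussian_kernel(np.atleast_2d(x), x_nu, sigma) @ theta
```

As a sanity check, when the two samples come from the same distribution the fitted ratio should be close to 1 over the bulk of the data; in practice sigma and lam would be chosen by cross-validation, for which the paper derives a leave-one-out score.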
Editor: Massimiliano Pontil
Kanamori, T., Suzuki, T. & Sugiyama, M. Statistical analysis of kernel-based least-squares density-ratio estimation. Mach Learn 86, 335–367 (2012). https://doi.org/10.1007/s10994-011-5266-3