Information Theoretic Learning for Pixel-Based Visual Agents

Gori, Marco; Melacci, Stefano; Lippi, Marco; Maggini, Marco

doi:10.1007/978-3-642-33783-3_62

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7577)

Included in the following conference series: European Conference on Computer Vision

Abstract

In this paper we promote the idea of using pixel-based models not only for low-level vision, but also for extracting high-level symbolic representations. We use a deep architecture whose distinctive property is that its computational units incorporate classic computer vision invariances, in particular scale invariance. The proposed learning algorithm, which is based on information-theoretic principles, develops the parameters of the computational units and, at the same time, makes it possible to detect the optimal scale for each pixel. We give experimental evidence of the feature extraction mechanism at the first level of the hierarchy, which is closely related to SIFT-like features. The comparison clearly shows that, whenever a large amount of training data is available, the proposed model outperforms SIFT.
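The abstract does not spell out the objective, so the following is only an illustrative sketch of what an information-theoretic criterion over per-pixel outputs can look like, not the authors' algorithm. It assumes each pixel is mapped to a discrete probability distribution over a small set of symbols, and it scores a set of pixels by the mutual information between pixels and symbols. The function names (`entropy`, `mutual_information`, `pick_scale`) and the lowest-entropy rule for choosing a scale are hypothetical choices made for this example.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mutual_information(pixel_dists):
    """I(X;Y) = H(Y) - H(Y|X), with every pixel weighted equally.

    pixel_dists holds one symbol distribution per pixel (assumed setup).
    """
    n = len(pixel_dists)
    k = len(pixel_dists[0])
    # Marginal distribution over symbols, averaged across pixels.
    marginal = [sum(p[j] for p in pixel_dists) / n for j in range(k)]
    # Average uncertainty that remains once the pixel is known.
    conditional = sum(entropy(p) for p in pixel_dists) / n
    return entropy(marginal) - conditional


def pick_scale(dists_by_scale):
    """Hypothetical rule: return the index of the candidate scale whose
    symbol distribution has the lowest entropy (most confident output)."""
    return min(range(len(dists_by_scale)),
               key=lambda s: entropy(dists_by_scale[s]))


if __name__ == "__main__":
    confident = [[1.0, 0.0], [0.0, 1.0]]   # pixels map to distinct symbols
    ambiguous = [[0.5, 0.5], [0.5, 0.5]]   # pixels carry no symbol information
    print(mutual_information(confident))
    print(mutual_information(ambiguous))
    print(pick_scale([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]))
```

Under these assumptions the score is highest when each pixel commits to one symbol while the symbols are used evenly across pixels, and it drops to zero when every pixel yields the same distribution.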

Author information

Authors and Affiliations

Department of Information Engineering, University of Siena, Via Roma 56, 53100, Siena, Italy
Marco Gori, Stefano Melacci, Marco Lippi & Marco Maggini

Editor information

Editors and Affiliations

Microsoft Research Ltd., CB3 0FB, Cambridge, UK
Andrew Fitzgibbon
Dept. of Computer Science, University of North Carolina, 27599, Chapel Hill, NC, USA
Svetlana Lazebnik
California Institute of Technology, 91125, Pasadena, CA, USA
Pietro Perona
Institute of Industrial Science, The University of Tokyo, 153-8505, Tokyo, Japan
Yoichi Sato
INRIA, 38330, Montbonnot, France
Cordelia Schmid

About this paper

Cite this paper

Gori, M., Melacci, S., Lippi, M., Maggini, M. (2012). Information Theoretic Learning for Pixel-Based Visual Agents. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33783-3_62

DOI: https://doi.org/10.1007/978-3-642-33783-3_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33782-6
Online ISBN: 978-3-642-33783-3
eBook Packages: Computer Science, Computer Science (R0)
