
Pattern Recognition

Volume 86, February 2019, Pages 201-208

Symmetry-adapted representation learning

https://doi.org/10.1016/j.patcog.2018.07.025

Abstract

In this paper, we propose the use of data symmetries, in the sense of equivalences under signal transformations, as priors for learning symmetry-adapted data representations, i.e., representations that are equivariant to these transformations. We rely on a group-theoretic definition of equivariance and provide conditions for enforcing a learned representation, for example the weights in a neural network layer or the atoms in a dictionary, to have the structure of a group, and specifically the group structure present in the distribution of the input. By reducing the analysis of generic group symmetries to permutation symmetries, we devise a regularization scheme for representation learning algorithms that uses an unlabeled training set. The proposed regularization is intended as a conceptual, theoretical and computational proof of concept for symmetry-adapted representation learning, where the learned data representations are equivariant or invariant to transformations without explicit knowledge of the underlying symmetries in the data.

Introduction

Symmetry is ubiquitous, from subatomic particles to natural patterns, man-made design, art and mathematics. Invariance to symmetries is a long-standing and challenging problem in pattern recognition and computational neuroscience [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. More recently, in the context of machine learning, data symmetries have been used to derive data representations with the properties of equivariance and invariance [13], [14], [15], [16], [17] to unknown, symmetry-generating transformations, for example geometric transformations. These properties are reflected as structure in the representation atoms and can be explicitly used to reduce the complexity of downstream supervised learning. This is achieved, for example, by constructing representations that are invariant to transformations which are irrelevant for the learning task, i.e., which preserve the data distribution and the prediction function [18], [19], [20]. For image classification, for instance, object position and scale are such irrelevant symmetries.

Representations that reflect symmetries inherent in the data distribution define a quotient representation space where points are equivalent up to transformations [21]. In this space, the sample complexity of learning (the size of the labeled training set) [16], [22], [23] can be reduced by pooling over the representation coefficients. Indeed, the pooling operation plays a crucial role in Convolutional Neural Networks (CNNs) in enforcing stability to small, local perturbations [24], [25]. Learning symmetry-adapted representations, on the other hand, is a step toward (a) generalizing CNNs to arbitrary weight-sharing schemes and invariances by learning the symmetry group from the data, and (b) learning, as opposed to hand-designing, network architectures and feature map properties such as locality, connectivity patterns and weight-sharing topologies.

CNNs and Convolutional Sparse Coding schemes [26] have an explicit parameterization for equivariance and robustness to shifts of the input (translations), through convolutions and pooling respectively (see also [27]). However, data symmetries extend to more general transformations depending on the data domain, for example geometric changes such as scaling, rotation or affine maps in the case of images, which are in general unknown or complex to model. A symmetry-blind data representation has to compensate for transformation variability with more parameters, resulting in an increased demand for labeled examples, or in data augmentation and adaptation schemes that assume simple and known transformation models [28], [29]. Extensions of CNNs to known transformations beyond translations were explored with scale-space pooling [30], convolutional maxout networks [31], pooling over neighboring values and similar filters [32], tiled CNNs [33], cyclic weight sharing and pooling [34], and wavelet scattering networks [35], [36] (but see also [37]). In particular, symmetry networks [23] and group-equivariant networks [17], [38] highlighted the complexity gains of incorporating other symmetries in the representations at each layer of CNNs. In [39] the parameters of a neural network are tied to achieve equivariance with respect to a known group. Weight sharing over transformations capturing perceptual changes, such as speaker characteristics, was used for speech representations through group-CNNs [40].

The more general problem of learning symmetries has previously been approached as estimating the infinitesimal generators of the Lie groups generating the data transformations [41], [42], [43], [44]. Symmetries in learning have also been used in the context of categorizing symmetry groups (mirror, roto-translation) in random patterns [45]. The relations between standard regularization schemes (e.g., ℓ1, ℓ2, ℓ∞) and group-based ones, for known groups, have been explored in [46].

The contributions of this work are: (1) Outlining principles for learning symmetries in data, and for learning equivariant representations, without explicit knowledge of the symmetry group; unlike many existing methods that assume known symmetries, we propose learning the symmetries themselves; (2) Reducing the analysis of generic group symmetries to permutation symmetries; (3) Deriving an analytic expression for a regularization term, acting on the representation matrix, that promotes a group structure and specifically the group structure present in an unlabeled observation set.

The rest of the paper is organized as follows. In Section 2, we briefly recall the setting of representations with equivariance or invariance to transformations captured by group symmetries [14], [16], [17], [21], [47]. In Section 3 we formulate the problem of symmetry-adapted representation learning and provide a general principle (Section 4) for designing the regularization term. The main theoretical contributions are stated in Section 5, along with a computable, analytic form for a regularization term. Section 6 provides proof-of-concept results on learning exact, analytic group transformations.

Section snippets

Equivariant and invariant representations

We briefly recall the setting of constructing data representations with equivariance or invariance to transformations using group symmetries [14], [16], [17], [21], [47]. Intuitively, a representation is equivariant with respect to a transformation of the input space if that transformation can be equivalently expressed as some transformation of the representation space. If the induced transformation is the identity, the representation is invariant to the transformation.
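In symbols (a compact restatement of the standard definition used in [14], [17]; the map Φ and the action π below are generic placeholders, not notation from this snippet):

```latex
% Equivariance: acting with g on the input corresponds to an action \pi(g)
% on the representation space.
\Phi(g\,x) \;=\; \pi(g)\,\Phi(x), \qquad \forall\, g \in G,\; x \in \mathcal{X}.
% Invariance: the special case in which \pi(g) is the identity for all g.
\Phi(g\,x) \;=\; \Phi(x), \qquad \forall\, g \in G,\; x \in \mathcal{X}.
```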

Let the input space be a vector space endowed with a dot product …

Problem formulation

In the above sense, deriving equivariant and invariant representations is conditioned upon having access to an orbit set W, or upon learning a set W that reflects the same generating symmetries as the target group. In this paper, we focus on learning such a symmetry set, without supervision, from an observation set for which we make the following simplifying assumption:

Assumption 1

The observation set $S_N = \{x_i\}_{i=1}^{N} \subset \mathcal{X}$ is a finite collection of $Q$ orbits in $\mathcal{X} = \mathbb{R}^d$ w.r.t. a finite group $G$: $S_N = \{x_i\}_{i=1}^{N} = \{g\,x_j,\ g \in G\}_{j=1}^{Q}$.
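For concreteness, a toy instance of such an observation set (our own illustration, not the paper's experimental setup) can be built by taking G to be the cyclic-shift group acting on R^d and collecting the orbits of Q random templates:

```python
import numpy as np

def cyclic_orbit(x):
    """Orbit of x under the cyclic-shift group on R^d: all d circular shifts of x."""
    d = x.shape[0]
    return np.stack([np.roll(x, k) for k in range(d)], axis=0)  # shape (|G|, d), with |G| = d

def observation_set(Q, d, seed=0):
    """Union of Q orbits of random templates, in the spirit of Assumption 1 (illustrative)."""
    rng = np.random.default_rng(seed)
    orbits = [cyclic_orbit(rng.standard_normal(d)) for _ in range(Q)]
    return np.concatenate(orbits, axis=0)  # N = Q * |G| points in R^d

S_N = observation_set(Q=5, d=8)
print(S_N.shape)  # (40, 8)
```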

The Gram matrix of orbits

The simple observation for designing R(W) comes from inspecting the matrix of all inner products of the vectors of an orbit (the so-called Gram matrix). If the columns of $W$ correspond to an orbit of a vector $t \in \mathbb{R}^d$, $W = [g_1 t, \dots, g_{|G|} t]$, then the associated Gram matrix $\mathbf{G} = W^{\top} W \in \mathbb{R}^{|G| \times |G|}$ has entries of the form $(\mathbf{G})_{ij} = \langle g_i t, g_j t \rangle = \langle t, g_i^{*} g_j t \rangle = \nu_t(g_i^{-1} g_j)$, where $g_i^{*}$ is the conjugate transpose of $g_i$ and $\nu_t : G \to \mathbb{R}$ is an injective function that depends on the vector $t$. Assuming a unitary group, i.e., $g^{*} = g^{-1}$, …
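Continuing the cyclic-shift toy example (illustrative only), one can verify numerically that every row of the orbit's Gram matrix contains the same values up to a permutation, because $(\mathbf{G})_{ij}$ depends only on $g_i^{-1} g_j$:

```python
import numpy as np

d = 8
t = np.random.default_rng(1).standard_normal(d)

# Columns of W are the orbit of t under cyclic shifts: W = [g_1 t, ..., g_|G| t].
W = np.stack([np.roll(t, k) for k in range(d)], axis=1)   # shape (d, |G|)

G = W.T @ W                                               # Gram matrix, |G| x |G|

# Since (G)_ij = <g_i t, g_j t> = nu_t(g_i^{-1} g_j), each row holds the same
# multiset of values; for the cyclic group, G is in fact circulant.
rows_sorted = np.sort(G, axis=1)
print(np.allclose(rows_sorted, rows_sorted[0]))           # True
```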

Analytic expressions for regularization

In this section we provide closed-form expressions for two conditions: (a) the permuted-matrix condition of Proposition 1, enforcing a group orbit structure on the representation matrix W, and (b) the same-symmetry condition, penalizing groups different from the generating group G of an observation set $S_N$.
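The closed-form expressions themselves are derived in the paper; purely as an illustration of the first idea (this surrogate is ours, not the authors' regularizer), the permuted-matrix property can be encoded by penalizing how far the rows of the Gram matrix of W are from being permutations of one another, e.g., by comparing their sorted values:

```python
import numpy as np

def permuted_matrix_penalty(W):
    """Illustrative surrogate for the permuted-matrix condition: for a group orbit,
    every row of the Gram matrix W^T W is a permutation of every other row, so the
    row-wise sorted values coincide. The penalty is zero exactly in that case."""
    G = W.T @ W
    rows_sorted = np.sort(G, axis=1)
    return float(np.sum((rows_sorted - rows_sorted.mean(axis=0)) ** 2))
```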

Results on unsupervised orbit learning

As a proof of concept, we pose the following unsupervised learning problem: Given an unlabeled observation set as in (9), namely a union of orbits of the same, finite, unknown group G, learn a single orbit W, of an arbitrary vector $t \in \mathbb{R}^d$, with respect to the (latent) group that generated the data.
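A minimal sketch of such an experiment (again our own toy version, with an assumed sorted-Gram matching objective; the paper's actual objective, constraints and optimizer may differ) fits W by gradient descent so that the row-wise sorted values of W^T W match those of the Gram matrix of one observed orbit:

```python
import torch

torch.manual_seed(0)
d = 8

# Unlabeled data: one orbit of an unknown template under the (latent) cyclic-shift group.
x = torch.randn(d)
X = torch.stack([torch.roll(x, k) for k in range(d)], dim=1)   # columns g_1 x, ..., g_|G| x
target = torch.sort(X.T @ X, dim=1).values                     # sorted rows of the data Gram

# Candidate representation matrix W; the objective constrains only the structure of its Gram.
W = torch.randn(d, d, requires_grad=True)
opt = torch.optim.Adam([W], lr=1e-2)
for step in range(3000):
    opt.zero_grad()
    rows = torch.sort(W.T @ W, dim=1).values
    loss = ((rows - target) ** 2).sum()
    loss.backward()
    opt.step()

# A low residual means the rows of W^T W share the sorted-value structure of the orbit's Gram.
print(loss.item())
```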

Conclusion

We studied the problem of learning data symmetries, in particular group symmetries, as a prior on structure due to transformations in the data. Our motivation was to derive representations that are adapted to symmetries and reduce the sample complexity of downstream supervised learning. In particular, we explored mathematical conditions that can drive, in an unsupervised way, a learned representation to reflect the symmetries in an unlabeled training set. The approach is particularly relevant for …

Acknowledgments

This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216 and the Italian Institute of Technology. The authors gratefully acknowledge Alan Yuille, Maximilian Nickel, Silvia Villa and Carlo Cilliberto, for the insightful discussions and feedback on early versions of this work, and Saverio Salzo for pointing to the proof of Lemma 3.

Fabio Anselmi is a postdoctoral fellow at the Istituto Italiano di Tecnologia and in the Laboratory for Computational and Statistical Learning at MIT, and is part of the Center for Brains, Minds, and Machines.

References (61)

  • J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. (2015)
  • S. Zhang et al., Constructing deep sparse coding network for image classification, Pattern Recognit. (2017)
  • M.I. Khalil et al., Invariant 2D object recognition using the wavelet modulus maxima, Pattern Recognit. Lett. (2000)
  • T.J. Sejnowski et al., Learning symmetry groups with hidden units: beyond the perceptron, Physica D (1986)
  • Y. Eldar, Least-squares inner product shaping, Linear Algebra Appl. (2002)
  • W.S. McCulloch et al., A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. (1943)
  • K. Fukushima, Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybern. (1980)
  • P. Foldiak, Learning invariance from transformation sequences, Neural Comput. (1991)
  • B.A. Olshausen et al., A multiscale dynamic routing circuit for forming size- and position-invariant object representations, J. Comput. Neurosci. (1995)
  • L. Van Gool et al., Vision and Lie's approach to invariance, Image Vis. Comput. (1995)
  • M. Riesenhuber et al., Hierarchical models of object recognition in cortex, Nat. Neurosci. (1999)
  • K. Lenc et al., Understanding image representations by measuring their equivariance and equivalence, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2015)
  • F. Anselmi et al., On invariance and selectivity in representation learning, Inf. Inference (2015)
  • T.S. Cohen et al., Group equivariant convolutional networks, International Conference on Machine Learning (ICML) (2016)
  • A. Achille et al., Emergence of invariance and disentangling in deep representations, Proceedings of the ICML Workshop on Principled Approaches to Deep Learning (2017)
  • S. Soatto, Steps towards a theory of visual information: active perception, signal-to-symbol conversion and the...
  • B. Haasdonk et al., Invariant kernel functions for pattern analysis and machine learning, Mach. Learn. (2007)
  • R. Gens et al., Deep symmetry networks, Advances in Neural Information Processing Systems (NIPS) (2014)
  • Y. Bengio et al., Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • V. Papyan et al., Convolutional neural networks analyzed via convolutional sparse coding, J. Mach. Learn. Res. (2017)


1 Current affiliation: X (Alphabet Inc.).
