Symmetry-adapted representation learning
Introduction
Symmetry is ubiquitous, from subatomic particles to natural patterns, man-made design, art and mathematics. Invariance to symmetries is a long-standing problem in pattern recognition and computational neuroscience [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. More recently, in the context of machine learning, data symmetries have been used to derive data representations with the properties of equivariance and invariance [13], [14], [15], [16], [17] to unknown, symmetry-generating transformations, for example geometric transformations. These properties are reflected as structure in the representation atoms and can be exploited to reduce the complexity of downstream supervised learning. This is achieved, for example, by constructing representations that are invariant to transformations irrelevant for the learning task, i.e., transformations that preserve the data distribution and the prediction function [18], [19], [20]. In image classification, for instance, changes in object position or scale are such irrelevant symmetries.
Representations that reflect symmetries inherent in the data distribution define a quotient representation space in which points are equivalent up to transformations [21]. In this space, the sample complexity of learning (the size of the labeled training set) [16], [22], [23] can be reduced by pooling over the representation coefficients. Indeed, the pooling operation plays a crucial role in Convolutional Neural Networks (CNNs), enforcing stability to small, local perturbations [24], [25]. Moreover, learning symmetry-adapted representations is a step toward (a) generalizing CNNs to arbitrary weight-sharing schemes and invariances by learning the symmetry group from the data, and (b) learning, rather than hand-designing, network architectures and feature map properties such as locality, connectivity patterns and weight-sharing topologies.
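To make the quotient picture concrete, the following minimal numpy sketch (ours, not from the paper; it uses the cyclic group of circular shifts as the symmetry and `pooled` as an illustrative name) pools projections onto a template orbit, so that the whole orbit of a signal collapses to a single point of the quotient space:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
t = rng.standard_normal(d)  # a template vector
x = rng.standard_normal(d)  # an input signal

# Projections of the input onto the cyclic-shift orbit of the template,
# followed by max-pooling over the orbit coefficients.
orbit = np.stack([np.roll(t, k) for k in range(d)])
pooled = lambda v: np.max(orbit @ v)

# Every shifted version of x maps to the same pooled value: the orbit
# of x is a single equivalence class in the quotient representation.
vals = np.array([pooled(np.roll(x, k)) for k in range(d)])
print(np.allclose(vals, vals[0]))  # True
```

Because all orbit elements share one representation, a downstream classifier never has to learn the shift variability, which is the sample-complexity gain discussed above.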
CNNs and Convolutional Sparse Coding schemes [26] have an explicit parameterization for equivariance and robustness to shifts in the input (translations), through convolutions and pooling respectively (see also [27]). However, data symmetries extend to more general transformations depending on the data domain, for example geometric changes such as scaling, rotation or affine maps in the case of images, which are in general unknown or complex to model. A symmetry-blind data representation has to compensate for transformation variability with more parameters, resulting in an increased demand for labeled examples, or in data augmentation and adaptation that assume simple, known transformation models [28], [29]. Extensions of CNNs to known transformations beyond translations were explored with scale-space pooling [30], convolutional maxout networks [31], pooling over neighboring values and similar filters [32], tiled CNNs [33], cyclic weight sharing and pooling [34], and wavelet scattering networks [35], [36] (but see also [37]). In particular, symmetry networks [23] and group-equivariant networks [17], [38] highlighted the complexity gains of incorporating other symmetries in the representations at each layer of CNNs. In [39] the parameters of a neural network are tied to achieve equivariance with respect to a known group. Weight sharing over transformations capturing perceptual changes, such as speaker characteristics, was used for speech representations through group-CNNs [40].
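The translation case can be checked directly in a small numpy sketch (ours; circular convolution on a cyclic domain stands in for the convolutional layer): convolution is shift-equivariant, and global pooling of its output is shift-invariant.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)  # input signal
h = rng.standard_normal(16)  # filter

def circ_conv(a, b):
    # Circular convolution computed via the DFT.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Equivariance: convolving a shifted input equals shifting the output.
lhs = circ_conv(np.roll(x, 4), h)
rhs = np.roll(circ_conv(x, h), 4)
print(np.allclose(lhs, rhs))  # True

# Invariance after global pooling: the max over positions ignores the shift.
print(np.isclose(circ_conv(x, h).max(), lhs.max()))  # True
```

This is exactly the structure that a symmetry-blind representation lacks, and that the methods cited above generalize to other known groups.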
The more general problem of learning symmetries has previously been approached as estimating the infinitesimal generators of the Lie groups generating the data transformations [41], [42], [43], [44]. Symmetries in learning have also been used to categorize symmetry groups (mirror, roto-translation) in random patterns [45]. The relation between standard regularization schemes, e.g., ℓ1, ℓ2, ℓ∞, and group-based ones, for known groups, has been explored in [46].
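The generator-estimation idea can be sketched in a toy setting (our own numpy sketch, not the procedures of [41], [42], [43], [44]): given pairs related by an infinitesimal rotation, the Lie generator is recovered by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
A_true = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of 2-D rotations
eps = 1e-3

# Observed pairs (x, y) related by an infinitesimal transformation,
# y ≈ (I + eps * A) x, collected as columns of X and Y.
X = rng.standard_normal((2, 50))
Y = (np.eye(2) + eps * A_true) @ X

# Least-squares estimate of the generator from the observed pairs.
A_hat = (Y - X) @ np.linalg.pinv(X) / eps
print(np.allclose(A_hat, A_true, atol=1e-6))  # True
```

With enough pairs spanning the input space, the estimate is exact up to the first-order approximation of the group exponential.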
The contributions of this work are: (1) outlining principles for learning symmetries in data, and learning equivariant representations, without explicit knowledge of the symmetry group; unlike many existing methods that assume known symmetries, we propose learning the symmetries themselves; (2) reducing the analysis of generic group symmetries to permutation symmetries; (3) deriving an analytic expression for a regularization term, acting on the representation matrix, that promotes a group structure, and specifically the group structure of an unlabeled observation set.
The rest of the paper is organized as follows: In Section 2, we briefly recall the setting of representations with equivariance or invariance to transformations captured by group symmetries [14], [16], [17], [21], [47]. In Section 3 we formulate the problem of symmetry-adapted representation learning and provide a general principle (Section 4) for designing the regularization term. The main theoretical contributions are stated in Section 5 along with a computable, analytic form for a regularization term. Section 6 provides proof of concept results on learning exact, analytic, group-transformations.
Equivariant and invariant representations
We briefly recall the setting of constructing data representations with equivariance or invariance to transformations using group symmetries [14], [16], [17], [21], [47]. Intuitively, a representation is equivariant with respect to a transformation on the input space, if it can be equivalently expressed as some transformation on the representation space. If this is the identity, the representation is invariant to the transformation.
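A minimal numpy sketch (ours; cyclic shifts play the role of the input transformation, and `phi`, `W` are illustrative names) of the equivariance and invariance properties just described:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
t = rng.standard_normal(d)  # template vector
x = rng.standard_normal(d)  # input

# Representation: projections of the input onto the cyclic-shift
# orbit of the template.
W = np.stack([np.roll(t, j) for j in range(d)])
phi = lambda v: W @ v

# Equivariance: a shift on the input acts as a shift (a permutation)
# on the representation coefficients.
print(np.allclose(phi(np.roll(x, 2)), np.roll(phi(x), 2)))  # True

# Invariance: any permutation-invariant pooling of the coefficients
# (here the sum) is the identity transformation on representation space.
print(np.isclose(phi(x).sum(), phi(np.roll(x, 2)).sum()))  # True
```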
Let the input space be a vector space endowed with a dot product
Problem formulation
In the above sense, deriving equivariant and invariant representations is conditioned on having access to an orbit set W, or learning a set W that reflects the same generating symmetries as the target group. In this paper, we focus on learning such a symmetry set, without supervision, from an observation set for which we make the following simplifying assumption:
Assumption 1. The observation set is a finite collection of Q orbits with respect to a finite group
The Gram matrix of orbits
The simple observation for designing the regularization comes from the inspection of the matrix of all inner products of the vectors of an orbit (the so-called Gram matrix). If the columns of W correspond to an orbit of a vector t, i.e. W = [g_1 t, ..., g_|G| t], then the associated Gram matrix has entries of the form [W†W]_{i,j} = ⟨g_i t, g_j t⟩ = ⟨t, g_i† g_j t⟩ = f(g_i† g_j), where g_i† is the conjugate transpose of g_i and f is an injective function that depends on the vector t. Assuming a unitary group, i.e. a
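This observation can be verified numerically (our own sketch, specialized to the cyclic shift group): the Gram matrix of an orbit has entries that depend only on the relative group element between columns, so it is a circulant, i.e. a group matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
t = rng.standard_normal(d)

# Columns of W form the orbit of t under the cyclic group of shifts.
W = np.stack([np.roll(t, k) for k in range(d)], axis=1)
G = W.T @ W  # Gram matrix: G[i, j] = <g_i t, g_j t>

# For this unitary group, G[i, j] depends only on the relative element
# g_i^{-1} g_j, i.e. the shift difference (j - i) mod d, so G is circulant.
for i in range(d):
    for j in range(d):
        assert np.isclose(G[i, j], G[0, (j - i) % d])
print("Gram matrix is a group (circulant) matrix")
```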
Analytic expressions for regularization
In this section we provide closed-form expressions for two conditions: (a) the permuted-matrix condition from Proposition 1, enforcing a group-orbit structure on the representation matrix W, and (b) the same-symmetry condition, penalizing groups different from the generating group of an observation set SN.
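As an illustration of the first condition in the cyclic case (our own sketch; `cyclic_penalty` is a hypothetical stand-in for the paper's closed-form term), a penalty that measures deviation of the Gram matrix from invariance under simultaneous row and column permutations vanishes exactly on group orbits:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 5

def cyclic_penalty(W):
    # Penalize deviation of the Gram matrix from invariance under
    # simultaneous row/column permutation by every cyclic shift.
    G = W.T @ W
    pen = 0.0
    for k in range(1, d):
        P = np.roll(np.eye(d), k, axis=0)  # permutation matrix for shift k
        pen += np.linalg.norm(G - P @ G @ P.T) ** 2
    return pen

t = rng.standard_normal(d)
orbit_W = np.stack([np.roll(t, k) for k in range(d)], axis=1)
random_W = rng.standard_normal((d, d))

print(np.isclose(cyclic_penalty(orbit_W), 0.0))  # True: orbit structure
print(cyclic_penalty(random_W) > cyclic_penalty(orbit_W))  # True (generically)
```

A matrix whose columns do not form an orbit generically incurs a strictly positive penalty, which is what makes the condition usable as a regularizer.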
Results on unsupervised orbit learning
As a proof of concept, we pose the following unsupervised learning problem: given an unlabeled observation set as in (9), namely a union of orbits of the same, finite, unknown group, learn a single orbit W, of an arbitrary vector, with respect to the (latent) group that generated the data.
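A toy version of this problem (our own numpy sketch with a numerical gradient; not the paper's algorithm or its closed-form regularizer) descends a group-structure penalty starting from a noisy perturbation of a cyclic orbit:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4

def penalty(W):
    # Deviation of the Gram matrix from invariance under all cyclic shifts.
    G = W.T @ W
    return sum(np.linalg.norm(G - np.roll(np.roll(G, k, 0), k, 1)) ** 2
               for k in range(1, d))

def num_grad(f, W, h=1e-5):
    # Central-difference numerical gradient (toy scale only).
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        g[idx] = (f(W + E) - f(W - E)) / (2 * h)
    return g

# Start from a noisy perturbation of a true cyclic orbit.
t = rng.standard_normal(d)
W = np.stack([np.roll(t, k) for k in range(d)], axis=1)
W = W + 0.3 * rng.standard_normal((d, d))

p0 = penalty(W)
for _ in range(300):
    W -= 1e-3 * num_grad(penalty, W)
print(penalty(W) < p0)  # the group-structure penalty decreases
```

The sketch only shows that the penalty is a usable descent objective; recovering the latent group from a union of orbits is the harder problem treated in the paper.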
Conclusion
We studied the problem of learning data symmetries, in particular group symmetries, as a prior on structure due to transformations in the data. Our motivation was to derive representations that are adapted to symmetries and reduce the sample complexity of downstream supervised learning. In particular, we explored mathematical conditions that can drive, in an unsupervised way, a learned representation to reflect the symmetries in an unlabeled training set. The approach is particularly relevant for
Acknowledgments
This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216 and the Italian Institute of Technology. The authors gratefully acknowledge Alan Yuille, Maximilian Nickel, Silvia Villa and Carlo Cilliberto, for the insightful discussions and feedback on early versions of this work, and Saverio Salzo for pointing to the proof of Lemma 3.
Anselmi Fabio is a postdoctoral fellow in the Istituto Italiano di Tecnologia and the Laboratory for Computational and Statistical Learning at MIT and part of the Center for Brains, Minds, and Machines.
References (61)
- On the minimum number of templates required for shift, rotation and size invariant pattern recognition, Pattern Recognit. (1988)
- Group invariant pattern recognition, Pattern Recognit. (1990)
- A comparison between Fourier-Mellin descriptors and moment based features for invariant object recognition using neural networks, Pattern Recognit. Lett. (1991)
- Pattern recognition by affine moment invariants, Pattern Recognit. (1993)
- A Lie group approach to steerable filters, Pattern Recognit. Lett. (1995)
- Invariant pattern recognition: a review, Pattern Recognit. (1996)
- Learning invariant object recognition from temporal correlation in a hierarchical network, Neural Netw. (2014)
- Integral invariants for space motion trajectory matching and recognition, Pattern Recognit. (2015)
- Unsupervised learning of invariant representations, Theor. Comput. Sci. (2016)
- Learning from hints in neural networks, J. Complex. (1990)
- Deep learning in neural networks: an overview, Neural Netw.
- Constructing deep sparse coding network for image classification, Pattern Recognit.
- Invariant 2d object recognition using the wavelet modulus maxima, Pattern Recognit. Lett.
- Learning symmetry groups with hidden units: beyond the perceptron, Physica D
- Least-squares inner product shaping, Linear Algebra Appl.
- A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys.
- Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biol. Cybern.
- Learning invariance from transformation sequences, Neural Comput.
- A multiscale dynamic routing circuit for forming size- and position-invariant object representations, J. Comput. Neurosci.
- Vision and Lie’s approach to invariance, Image Vis. Comput.
- Hierarchical models of object recognition in cortex, Nat. Neurosci.
- Understanding image representations by measuring their equivariance and equivalence, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
- On invariance and selectivity in representation learning, Inf. Inference
- Group equivariant convolutional networks, International Conference on Machine Learning (ICML)
- Emergence of invariance and disentangling in deep representations, Proceedings of the ICML Workshop on Principled Approaches to Deep Learning
- Invariant kernel functions for pattern analysis and machine learning, Mach. Learn.
- Deep symmetry networks, Advances in Neural Information Processing Systems (NIPS)
- Representation learning: a review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell.
- Convolutional neural networks analyzed via convolutional sparse coding, J. Mach. Learn. Res.
1. Current affiliation: X (Alphabet Inc.).