ABSTRACT
Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to a dearth of class-balanced and diverse training data. Moreover, obtaining enough training data for many applications is financially prohibitive or infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images is an inexpensive way to address this scarcity of labeled data. Toward this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) to synthesize semantically rich images conditioned simultaneously on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can differ from the synthesized image only in the channel dimension and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image, which controls the contributions of various characteristic attributes germane to the target distribution. During training, a is learned directly from the ground truth; during generation, it is sampled from U[0, 1]. To interpret a, a linear binary classifier is trained in the latent space, enabling synthetic images to be generated systematically by varying a chosen binary macroscopic feature. Experiments on a GPS-trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network.
The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation in computer-vision-based modeling of problems in geospatial analysis and remote sensing.
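The dual conditioning described above can be made concrete with a minimal NumPy sketch. All dimensions, the channel-wise concatenation of the PLC, and the way the attribute vector a and noise enter the generator input are illustrative assumptions for exposition, not the paper's actual implementation; only the sampling of a from U[0, 1] at generation time follows the abstract directly.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 64    # spatial size of the synthesized image (assumed)
C_PLC = 1     # e.g. a single-channel road-network raster (assumed)
DIM_A = 8     # length of the attribute vector a (assumed)
DIM_Z = 16    # length of the noise vector (assumed)

def sample_flc(dim_a, rng):
    """At generation time, the FLC attribute vector a is drawn from U[0, 1];
    during training it would instead be learned from the ground truth."""
    return rng.uniform(0.0, 1.0, size=dim_a)

def condition_generator_input(plc, a, z):
    """Illustrative conditioning: the PLC shares the spatial dimensions of
    the output and differs only in channels, so it is stacked along the
    channel axis; a and the noise z enter as spatially broadcast maps."""
    assert plc.shape[1:] == (H, W)
    latent = np.concatenate([z, a])                       # feature-level condition
    latent_map = np.tile(latent[:, None, None], (1, H, W))
    return np.concatenate([plc, latent_map], axis=0)      # pixel-level condition

plc = rng.uniform(size=(C_PLC, H, W))   # stand-in road-network raster
a = sample_flc(DIM_A, rng)
z = rng.standard_normal(DIM_Z)
x_in = condition_generator_input(plc, a, z)
print(x_in.shape)  # (25, 64, 64): C_PLC + DIM_Z + DIM_A channels
```

A generator network would map `x_in` to the synthesized image; the point of the sketch is only that the PLC constrains the output pixel-wise while a steers macroscopic attributes through the latent space.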
Index Terms
- VAE-Info-cGAN: generating synthetic images by combining pixel-level and feature-level geospatial conditional inputs