ABSTRACT
Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to a dearth of class-balanced and diverse training data. Moreover, obtaining enough training data for many applications is financially prohibitive or infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images is an inexpensive way to address this scarcity of labeled data. Toward this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) to synthesize semantically rich images conditioned simultaneously on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can differ from the synthesized image only in the channel dimension and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image, which controls the contributions of various characteristic attributes germane to the target distribution. During training, a is learned directly from the ground truth; during generation, it is sampled from U[0, 1]. To interpret a, a linear binary classifier is trained in the latent space, enabling synthetic images to be generated systematically by varying a chosen binary macroscopic feature. Experiments on a GPS-trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network.
The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation in computer-vision-based modeling of problems in geospatial analysis and remote sensing.
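The dual conditioning described above can be made concrete with a minimal NumPy sketch. All dimensions, the channel-wise concatenation of the PLC, and the way the attribute vector a and noise enter the generator input are illustrative assumptions for exposition, not the paper's actual implementation; only the sampling of a from U[0, 1] at generation time follows the abstract directly.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 64    # spatial size of the synthesized image (assumed)
C_PLC = 1     # e.g. a single-channel road-network raster (assumed)
DIM_A = 8     # length of the attribute vector a (assumed)
DIM_Z = 16    # length of the noise vector (assumed)

def sample_flc(dim_a, rng):
    """At generation time, the FLC attribute vector a is drawn from U[0, 1];
    during training it would instead be learned from the ground truth."""
    return rng.uniform(0.0, 1.0, size=dim_a)

def condition_generator_input(plc, a, z):
    """Illustrative conditioning: the PLC shares the spatial dimensions of
    the output and differs only in channels, so it is stacked along the
    channel axis; a and the noise z enter as spatially broadcast maps."""
    assert plc.shape[1:] == (H, W)
    latent = np.concatenate([z, a])                       # feature-level condition
    latent_map = np.tile(latent[:, None, None], (1, H, W))
    return np.concatenate([plc, latent_map], axis=0)      # pixel-level condition

plc = rng.uniform(size=(C_PLC, H, W))   # stand-in road-network raster
a = sample_flc(DIM_A, rng)
z = rng.standard_normal(DIM_Z)
x_in = condition_generator_input(plc, a, z)
print(x_in.shape)  # (25, 64, 64): C_PLC + DIM_Z + DIM_A channels
```

A generator network would map `x_in` to the synthesized image; the point of the sketch is only that the PLC constrains the output pixel-wise while a steers macroscopic attributes through the latent space.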
Index Terms
- VAE-Info-cGAN: generating synthetic images by combining pixel-level and feature-level geospatial conditional inputs