Research article · DOI: 10.1145/3423457.3429361

VAE-Info-cGAN: generating synthetic images by combining pixel-level and feature-level geospatial conditional inputs

Published: 03 November 2020

ABSTRACT

Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to a dearth of class-balanced and diverse training data. Moreover, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive solution to address the scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can differ from the synthesized image only in the channel dimension and is meant to be a task-specific input. The FLC is modeled as an attribute vector, a, in the latent space of the generated image, which controls the contributions of various characteristic attributes germane to the target distribution. During training, a is learned directly from the ground truth, while during generation it is sampled from U[0, 1]. An interpretation of a that allows synthetic images to be generated systematically by varying a chosen binary macroscopic feature is explored by training a linear binary classifier in the latent space. Experiments on a GPS trajectory dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation in computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.
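
As a concrete illustration of the conditioning scheme, the sketch below shows one plausible way a generator could consume the two inputs described above: a pixel-level condition (PLC) raster with the same spatial dimensions as the output image, and a feature-level condition (FLC) attribute vector a drawn from U[0, 1] at generation time. This is not the authors' implementation; the layer sizes, module names, and the fusion of the two conditions by spatially broadcasting a are assumptions made purely for illustration.

```python
# Hypothetical PyTorch sketch of PLC/FLC conditioning in the spirit of the
# VAE-Info-cGAN generator. All architectural details below are assumptions
# for illustration only and do not reproduce the paper's network.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, plc_channels=1, flc_dim=8, out_channels=3, base=32):
        super().__init__()
        # Pixel-level condition (PLC): e.g. a rasterized road network that
        # matches the output image spatially and may differ only in channels.
        self.plc_encoder = nn.Sequential(
            nn.Conv2d(plc_channels, base, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(base, base, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Feature-level condition (FLC): attribute vector a in [0, 1]^flc_dim,
        # broadcast over the spatial grid and fused with the PLC features.
        self.decoder = nn.Sequential(
            nn.Conv2d(base + flc_dim, base, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(base, out_channels, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, plc, a):
        h = self.plc_encoder(plc)                                 # (B, base, H, W)
        a_map = a[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        return self.decoder(torch.cat([h, a_map], dim=1))         # (B, C_out, H, W)

if __name__ == "__main__":
    gen = ConditionalGenerator()
    plc = torch.rand(4, 1, 64, 64)  # rasterized road-network tiles (task-specific input)
    a = torch.rand(4, 8)            # FLC sampled from U[0, 1], as done at generation time
    print(gen(plc, a).shape)        # torch.Size([4, 3, 64, 64])
```

The same attribute space also lends itself to the linear-probe interpretation mentioned in the abstract: one common reading is that a linear binary classifier trained on a (or on the latent codes) for a chosen binary macroscopic feature defines a direction along which that feature can be varied systematically in the synthesized images.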

Published in

IWCTS '20: Proceedings of the 13th ACM SIGSPATIAL International Workshop on Computational Transportation Science
November 2020, 75 pages
ISBN: 9781450381666
DOI: 10.1145/3423457
Copyright © 2020 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

IWCTS '20 paper acceptance rate: 9 of 11 submissions (82%). Overall acceptance rate: 42 of 57 submissions (74%).
