Photo-realistic Neural Domain Randomization

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13685)

Abstract

Synthetic data is a scalable alternative to manual supervision, but it requires overcoming the sim-to-real domain gap. This discrepancy between virtual and real worlds is addressed by two seemingly opposed approaches: improving the realism of simulation or foregoing realism entirely via domain randomization. In this paper, we show that the recent progress in neural rendering enables a new unified approach we call Photo-realistic Neural Domain Randomization (PNDR). We propose to learn a composition of neural networks that acts as a physics-based ray tracer generating high-quality renderings from scene geometry alone. Our approach is modular, composed of different neural networks for materials, lighting, and rendering, thus enabling randomization of different key image generation components in a differentiable pipeline. Once trained, our method can be combined with other methods and used to generate photo-realistic image augmentations online and significantly more efficiently than via traditional ray-tracing. We demonstrate the usefulness of PNDR through two downstream tasks: 6D object detection and monocular depth estimation. Our experiments show that training with PNDR enables generalization to novel scenes and significantly outperforms the state of the art in terms of real-world transfer.
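The modular pipeline the abstract describes — separate learned components for materials and lighting whose outputs are composed by a neural renderer, with randomization applied to the inputs of each — can be illustrated with a toy numpy sketch. All names, feature sizes, and randomization codes below are illustrative assumptions standing in for the paper's learned networks, not the actual PNDR architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, hidden=16):
    """Tiny random-weight two-layer network standing in for a trained sub-network."""
    w1 = rng.normal(size=(in_dim, hidden)) * 0.1
    w2 = rng.normal(size=(hidden, out_dim)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2  # ReLU MLP

# Three modular components, mirroring the materials / lighting / rendering split.
# Input sizes are assumptions: normals (3) + depth (1) + a 4-d material code, etc.
material_net = mlp(3 + 1 + 4, 8)   # geometry + randomized material code -> features
lighting_net = mlp(3 + 6, 8)       # normals + randomized light parameters -> features
render_net = mlp(8 + 8, 3)         # composed features -> RGB

def render(normals, depth):
    """Render per-pixel RGB from scene geometry alone, with randomized
    material/lighting codes playing the role of domain randomization."""
    n = normals.shape[0]
    mat_code = np.tile(rng.uniform(size=4), (n, 1))    # one draw per image
    light_code = np.tile(rng.uniform(size=6), (n, 1))
    m = material_net(np.concatenate([normals, depth, mat_code], axis=1))
    l = lighting_net(np.concatenate([normals, light_code], axis=1))
    rgb = render_net(np.concatenate([m, l], axis=1))
    return 1.0 / (1.0 + np.exp(-rgb))                  # squash to [0, 1]

# A fake G-buffer: unit normals and depths for 64 pixels.
n_pix = 64
normals = rng.normal(size=(n_pix, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
depth = rng.uniform(1.0, 5.0, size=(n_pix, 1))
image = render(normals, depth)     # shape (64, 3), values in [0, 1]
```

Because every step is a differentiable function of its inputs, gradients could flow from a downstream task loss back through the renderer to the geometry and randomization codes — the property that lets such a pipeline generate augmentations online during training rather than via offline ray-tracing.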

Author information

Correspondence to Sergey Zakharov.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3044 KB)

Supplementary material 2 (mp4 13287 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zakharov, S., Ambruș, R., Guizilini, V., Kehl, W., Gaidon, A. (2022). Photo-realistic Neural Domain Randomization. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13685. Springer, Cham. https://doi.org/10.1007/978-3-031-19806-9_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19806-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19805-2

  • Online ISBN: 978-3-031-19806-9

  • eBook Packages: Computer Science (R0)
