3D scene reconstruction using a texture probabilistic grammar

Multimedia Tools and Applications

Abstract

This paper defines a texture probabilistic grammar for the first time. We develop an algorithm that recovers the 3D information in a 2D scene by training the texture probabilistic grammar on a prebuilt model library. The trained grammar can also be applied to 3D reconstruction. The process consists of four steps: (1) dividing the 2D scene into texture fragments; (2) assigning the most suitable 3D object label to each 2D texture fragment; (3) using the texture probabilistic grammar to predict the 3D information of the texture fragments in the 2D scene image; and (4) constructing the 3D model of the original 2D scene image. Experiments show that the algorithm achieves better results when reconstructing indoor scenes and building structures, and that it outperforms the traditional reconstruction method based on point clouds. Tests on different datasets and reconstructed objects verify the robustness of the algorithm. As a result, the algorithm can handle large numbers of scenes with similar semantics, and it is fast enough for online 3D reconstruction.
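The abstract does not give the grammar's form, so the following is only a toy sketch of how steps (2) to (4) might fit together, assuming the scene has already been divided into texture fragments (step 1). Every name here (`Fragment`, `LABEL_GIVEN_TEXTURE`, `SHAPE_GIVEN_LABEL`, `reconstruct`) and every probability value is hypothetical and chosen for illustration; none of it is taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A 2D texture fragment; `texture` is a hypothetical descriptor."""
    name: str
    texture: str


# Hypothetical tables, standing in for rules learned from a model library.
# P(label | texture): chance that a texture belongs to a 3D object class.
LABEL_GIVEN_TEXTURE = {
    "wood": {"table": 0.7, "floor": 0.3},
    "paint": {"wall": 0.9, "table": 0.1},
}
# P(shape | label): chance that an object class takes a given 3D primitive.
SHAPE_GIVEN_LABEL = {
    "table": {"box": 0.8, "plane": 0.2},
    "floor": {"plane": 1.0},
    "wall": {"plane": 0.95, "box": 0.05},
}


def most_likely(dist):
    """Return the key with the highest probability in a distribution."""
    return max(dist, key=dist.get)


def reconstruct(fragments):
    """Label each fragment, predict its 3D primitive, and collect the scene."""
    scene = []
    for frag in fragments:
        label = most_likely(LABEL_GIVEN_TEXTURE[frag.texture])  # step 2
        shape = most_likely(SHAPE_GIVEN_LABEL[label])           # step 3
        scene.append((frag.name, label, shape))                 # step 4
    return scene
```

Calling `reconstruct([Fragment("f1", "wood"), Fragment("f2", "paint")])` yields one `(fragment, label, primitive)` triple per fragment. A real system would replace the lookup tables with learned rules and the primitives with full 3D geometry.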

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61502185) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ071).

The authors would like to thank Shuang Liu for helping us collect experimental material.

Author information

Corresponding author

Correspondence to Yuke Sun.

About this article

Cite this article

Li, D., Hu, D., Sun, Y. et al. 3D scene reconstruction using a texture probabilistic grammar. Multimed Tools Appl 77, 28417–28440 (2018). https://doi.org/10.1007/s11042-018-6052-z

  • DOI: https://doi.org/10.1007/s11042-018-6052-z
