Abstract
This paper defines the texture probabilistic grammar for the first time. We present an algorithm that recovers the 3D information in a 2D scene by training a texture probabilistic grammar on a prebuilt model library; the trained grammar can then be applied to 3D reconstruction. The process has four steps: dividing the 2D scene into texture fragments; assigning the most suitable 3D object label to each 2D texture fragment; using the texture probabilistic grammar to predict the 3D information of the texture fragments in the 2D scene image; and constructing the 3D model of the original 2D scene image. Experiments show that the algorithm achieves better reconstruction results for indoor scenes and building structures and that it outperforms the traditional reconstruction method based on point clouds. Tests on different datasets and reconstructed objects support the robustness of the algorithm. As a result, the algorithm can handle large numbers of scenes with similar semantics and is fast enough for online 3D reconstruction.
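The grammar step of this process can be illustrated with a toy sketch. The abstract does not specify the grammar itself, so everything here is an assumption made for illustration: the rule form (a conditional probability of a 3D shape given a fragment label), the function names `train_grammar`, `predict_shape` and `reconstruct`, and the label and shape vocabulary are all hypothetical. Segmentation and labelling are assumed to have been done already.

```python
from collections import Counter, defaultdict

def train_grammar(library):
    """Estimate P(shape | label) by counting (label, shape) pairs.

    `library` stands in for a prebuilt model library; the real training
    procedure is not described in the abstract (assumption).
    """
    counts = defaultdict(Counter)
    for label, shape in library:
        counts[label][shape] += 1
    return {
        label: {shape: n / sum(c.values()) for shape, n in c.items()}
        for label, c in counts.items()
    }

def predict_shape(grammar, label):
    """Return the most probable 3D shape for a label, with its probability."""
    rules = grammar[label]
    shape = max(rules, key=rules.get)
    return shape, rules[shape]

def reconstruct(fragments, grammar):
    """Attach a predicted 3D shape to each already-labelled 2D fragment."""
    return [
        (frag_id, label) + predict_shape(grammar, label)
        for frag_id, label in fragments
    ]

# Hypothetical model library: (object label, 3D shape) observations.
library = [
    ("wall", "vertical_plane"),
    ("wall", "vertical_plane"),
    ("wall", "box"),
    ("floor", "horizontal_plane"),
]
grammar = train_grammar(library)

# Hypothetical output of the segmentation and labelling steps.
fragments = [(0, "wall"), (1, "floor")]
scene = reconstruct(fragments, grammar)
for entry in scene:
    print(entry)
```

In this sketch the learned rules are plain frequency estimates, which keeps prediction to a dictionary lookup per fragment; a real grammar would likely also encode relations between fragments.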
Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 61502185) and the Fundamental Research Funds for the Central Universities (No. 2017KFYXJJ071).
The authors would like to thank Shuang Liu for helping us to collect experimental material.
Cite this article
Li, D., Hu, D., Sun, Y. et al. 3D scene reconstruction using a texture probabilistic grammar. Multimed Tools Appl 77, 28417–28440 (2018). https://doi.org/10.1007/s11042-018-6052-z
DOI: https://doi.org/10.1007/s11042-018-6052-z