research-article

Open Access

RAID: a relation-augmented image descriptor

Authors: Paul Guerrero (University College London), Niloy J. Mitra (University College London), Peter Wonka (KAUST)

ACM Transactions on Graphics, Volume 35, Issue 4, Article No. 46, pp. 1–12. https://doi.org/10.1145/2897824.2925939

Published: 11 July 2016

Abstract

As humans, we regularly interpret scenes based on how objects are related, rather than based on the objects themselves. For example, we see a person riding an object X or a plank bridging two objects. Current methods provide limited support for searching for content based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to encode region-to-region relations as the spatial distribution of point-to-region relationships between two image regions. RAID allows sketch-based retrieval and requires minimal training data, making it suitable even for querying uncommon relations. We evaluate the proposed descriptor by querying large image databases and successfully extract non-trivial images demonstrating complex inter-region relations, which are easily missed or erroneously classified by existing methods. We assess the robustness of RAID on multiple datasets, including cases where the region segmentation is computed automatically or is very noisy.
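
The key idea stated above, describing a region pair by how point-to-region relationships are distributed over space, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the point-to-region feature or the spatial aggregation, so the direction histogram, the grid pooling over the first region's bounding box, and the names `point_to_region_hist`, `relation_descriptor`, `grid`, and `n_bins` are all assumptions made for illustration. Regions are assumed to be non-empty boolean NumPy masks of equal shape.

```python
import numpy as np

def point_to_region_hist(point, region_pixels, n_bins=8):
    """Assumed point-to-region feature: a normalized histogram of the
    directions from `point` to every pixel of the other region."""
    offsets = region_pixels - point                  # (N, 2) as (row, col)
    offsets = offsets[np.any(offsets != 0, axis=1)]  # skip the point itself
    if len(offsets) == 0:
        return np.zeros(n_bins)
    angles = np.arctan2(offsets[:, 0], offsets[:, 1])          # [-pi, pi]
    bins = np.floor((angles + np.pi) / (2 * np.pi) * n_bins).astype(int)
    bins %= n_bins                                   # wrap angle == pi to bin 0
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / hist.sum()

def relation_descriptor(mask_a, mask_b, grid=2, n_bins=8):
    """Assumed region-to-region descriptor: point features of region A
    relative to region B, averaged inside a grid x grid partition of A's
    bounding box. Returns an array of shape (grid, grid, n_bins)."""
    pix_a = np.argwhere(mask_a)
    pix_b = np.argwhere(mask_b)
    lo = pix_a.min(axis=0)
    extent = pix_a.max(axis=0) - lo + 1
    desc = np.zeros((grid, grid, n_bins))
    counts = np.zeros((grid, grid))
    for p in pix_a:
        cell = ((p - lo) * grid) // extent           # each index in [0, grid-1]
        desc[cell[0], cell[1]] += point_to_region_hist(p, pix_b, n_bins)
        counts[cell[0], cell[1]] += 1
    nonempty = counts > 0
    desc[nonempty] /= counts[nonempty][:, None]      # mean histogram per cell
    return desc

def descriptor_distance(d1, d2):
    """L1 distance between two descriptors (an arbitrary choice here)."""
    return float(np.abs(d1 - d2).sum())
```

Under these assumptions, two region pairs showing the same relation at different image positions yield the same descriptor, while "A above B" and "A left of B" yield clearly different ones, so `descriptor_distance` could rank database region pairs against a sketched query. The per-pixel loop is quadratic in region size and is meant only to make the idea concrete.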

Supplemental Material

a46.mp4 (mp4, 255.1 MB)
a46-guerrero-supp.zip (zip, 83.4 MB): supplemental files.

Index Terms

Computing methodologies > Computer graphics > Shape modeling > Shape analysis

Published in

ACM Transactions on Graphics, Volume 35, Issue 4 (July 2016), 1396 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2897824

Copyright © 2016 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
image descriptors
image retrieval
relation-based query
sketch-based query
spatial relationships