research-article
Open Access

RAID: a relation-augmented image descriptor

Published: 11 July 2016

Abstract

As humans, we regularly interpret scenes by how objects relate to one another rather than by the objects themselves. For example, we see a person riding an object X or a plank bridging two objects. Current methods provide limited support for searching content based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to encode region-to-region relations as the spatial distribution of point-to-region relationships between two image regions. RAID allows sketch-based retrieval and requires minimal training data, making it suitable even for querying uncommon relations. We evaluate the proposed descriptor by querying large image databases and successfully extract non-trivial images demonstrating complex inter-region relations, which are easily missed or erroneously classified by existing methods. We assess the robustness of RAID on multiple datasets, even when the region segmentation is computed automatically or is very noisy.
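
To make the key idea concrete, the following is a minimal illustrative sketch, not the descriptor defined in the paper. It assumes two binary region masks, describes each pixel of region A by a histogram of directions toward the pixels of region B (a stand-in for a point-to-region relationship), and then averages those histograms over a coarse grid laid on A's bounding box (a stand-in for the spatial distribution). The function names, the direction histogram, and the `grid` and `n_bins` parameters are all assumptions made for illustration.

```python
import numpy as np


def point_to_region_histogram(point, region_pixels, n_bins=8):
    """Normalized histogram of directions from one point to a region's pixels.

    Hypothetical point-to-region relation, chosen only for illustration.
    """
    deltas = region_pixels - point                    # (N, 2) offsets as (dy, dx)
    deltas = deltas[np.any(deltas != 0, axis=1)]      # ignore the point itself
    if len(deltas) == 0:
        return np.zeros(n_bins)
    angles = np.arctan2(deltas[:, 0], deltas[:, 1])   # values in [-pi, pi]
    bins = np.floor((angles + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / hist.sum()


def relation_descriptor(mask_a, mask_b, grid=2, n_bins=8):
    """Toy region-to-region descriptor (an assumption, not the paper's formula).

    Each pixel of region A is described relative to region B, and the
    per-pixel histograms are averaged inside a grid x grid partition of
    A's bounding box.
    """
    pts_a = np.argwhere(mask_a)
    pts_b = np.argwhere(mask_b)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("both regions must contain at least one pixel")

    lo = pts_a.min(axis=0)
    extent = pts_a.max(axis=0) - lo + 1               # bounding-box height, width
    desc = np.zeros((grid, grid, n_bins))
    counts = np.zeros((grid, grid))

    for p in pts_a:
        cy, cx = ((p - lo) * grid) // extent          # grid cell containing p
        desc[cy, cx] += point_to_region_histogram(p, pts_b, n_bins)
        counts[cy, cx] += 1

    nonempty = counts > 0
    desc[nonempty] /= counts[nonempty][:, None]       # average within each cell
    return desc.ravel()


# Usage: "A above B" and "A left of B" yield different descriptors.
above_a = np.zeros((20, 20), dtype=bool)
above_a[2:6, 8:12] = True
above_b = np.zeros((20, 20), dtype=bool)
above_b[14:18, 8:12] = True

left_a = np.zeros((20, 20), dtype=bool)
left_a[8:12, 2:6] = True
left_b = np.zeros((20, 20), dtype=bool)
left_b[8:12, 14:18] = True

d_above = relation_descriptor(above_a, above_b)
d_left = relation_descriptor(left_a, left_b)
```

Because this toy version depends only on relative offsets and on positions inside A's bounding box, shifting both regions together leaves the vector unchanged, and two configurations could be compared with any vector distance. Whether the actual descriptor shares these properties is not stated in the abstract.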

Supplemental Material

a46.mp4 (mp4, 255.1 MB)
