Abstract
As humans, we regularly interpret scenes based on how objects are related, rather than on the objects themselves. For example, we see a person riding an object X, or a plank bridging two objects. Current methods provide limited support for searching content based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to encode region-to-region relations as the spatial distribution of point-to-region relationships between two image regions. RAID allows sketch-based retrieval and requires minimal training data, which makes it suitable even for querying uncommon relations. We evaluate the proposed descriptor by querying large image databases, and we successfully extract non-trivial images demonstrating complex inter-region relations that are easily missed or erroneously classified by existing methods. We assess the robustness of RAID on multiple datasets, including cases where the region segmentation is computed automatically or is very noisy.
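To make the key idea concrete, the following is a toy sketch and not the descriptor defined in the paper. It assumes that a point-to-region relationship can be approximated by a histogram of directions from one pixel of region A to all pixels of region B, and that the "spatial distribution" can be approximated by averaging those histograms over a coarse grid laid on the joint bounding box. The names `rect`, `point_to_region_histogram`, `relation_descriptor`, and `descriptor_distance`, as well as the parameters `grid` and `n_bins`, are hypothetical.

```python
import math

def rect(x0, y0, x1, y1):
    """Pixels of an axis-aligned rectangle with inclusive bounds (toy region)."""
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]

def point_to_region_histogram(point, region, n_bins=4):
    """Assumed point-to-region relation: normalized histogram of the
    directions from `point` to every pixel of `region`."""
    hist = [0.0] * n_bins
    px, py = point
    for qx, qy in region:
        if (qx, qy) == (px, py):
            continue  # direction is undefined for a coincident pixel
        angle = math.atan2(qy - py, qx - px)
        # Shift by half a bin so that bin 0 is centered on the +x direction.
        shifted = (angle + math.pi / n_bins) % (2 * math.pi)
        hist[int(shifted / (2 * math.pi) * n_bins) % n_bins] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def relation_descriptor(region_a, region_b, grid=2, n_bins=4):
    """Assumed region-to-region relation: per-cell average of the
    point-to-region histograms of A's pixels, over a grid x grid
    partition of the joint bounding box of A and B."""
    pts = list(region_a) + list(region_b)
    x0 = min(x for x, _ in pts)
    y0 = min(y for _, y in pts)
    width = max(x for x, _ in pts) - x0 + 1
    height = max(y for _, y in pts) - y0 + 1
    sums = [[0.0] * n_bins for _ in range(grid * grid)]
    counts = [0] * (grid * grid)
    for x, y in region_a:
        cx = min(grid - 1, (x - x0) * grid // width)
        cy = min(grid - 1, (y - y0) * grid // height)
        cell = cy * grid + cx
        counts[cell] += 1
        for i, h in enumerate(point_to_region_histogram((x, y), region_b, n_bins)):
            sums[cell][i] += h
    descriptor = []
    for cell_sum, n in zip(sums, counts):
        descriptor.extend(h / n if n else 0.0 for h in cell_sum)
    return descriptor

def descriptor_distance(d1, d2):
    """L1 distance between two descriptors of equal length."""
    return sum(abs(a - b) for a, b in zip(d1, d2))

# Toy query: a "rider" region on top of a "horse" region versus beside it.
horse = rect(0, 3, 5, 4)
query = relation_descriptor(rect(2, 0, 3, 1), horse)
candidate = relation_descriptor(rect(8, 3, 9, 4), horse)
print(descriptor_distance(query, candidate))
```

Because the grid is placed on the joint bounding box and only directions are binned, this toy descriptor is unchanged when both regions are translated together. Whether the actual descriptor has the same invariances is not stated in the abstract.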
Index Terms
- RAID: a relation-augmented image descriptor