ABSTRACT
We propose ABIR, an end-to-end Attention-Block network for image retrieval that greatly improves retrieval accuracy without human annotations such as bounding boxes. The network uses coarse-scale feature fusion, which generates attentive local features by combining information from different intermediate layers, and applies two attention blocks to extract detailed feature information. Extensive experiments on four public image retrieval datasets show that our method outperforms the state of the art by a significant margin.
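The abstract does not specify how the fusion or the attention blocks are computed, so the following is only an illustrative sketch of the general idea, not the ABIR architecture. It assumes that fusion means upsampling a coarser intermediate feature map and concatenating it with a finer one along the channel axis, and that an attention block scores each spatial location with a weight vector and pools with softmax weights. The function names, tensor shapes, and random weights are all hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse(fine, coarse):
    """Assumed fusion: concatenate the fine map with the upsampled coarse map."""
    factor = fine.shape[1] // coarse.shape[1]
    return np.concatenate([fine, upsample_nearest(coarse, factor)], axis=0)


def attention_pool(features, w):
    """Assumed attention block: softmax over locations, then a weighted sum."""
    c, h, wd = features.shape
    flat = features.reshape(c, h * wd)      # (C, N) local features
    scores = w @ flat                       # (N,) one score per location
    scores = scores - scores.max()          # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return flat @ weights, weights.reshape(h, wd)


# Stand-ins for two intermediate layers of a backbone (random, not learned).
rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 4, 4))       # earlier layer, higher resolution
coarse = rng.standard_normal((16, 2, 2))    # later layer, lower resolution

fused = fuse(fine, coarse)                  # shape (24, 4, 4)
w = rng.standard_normal(fused.shape[0])     # would be learned in practice
descriptor, attn = attention_pool(fused, w) # shapes (24,) and (4, 4)
```

In a trained network the weight vector would be learned end to end from retrieval labels alone, which is one way such a model could localize discriminative regions without bounding-box annotations.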
Index Terms
- Weakly Supervised Image Retrieval via Coarse-scale Feature Fusion and Multi-level Attention Blocks