DOI: 10.1145/3571600.3571625
research-article

Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification✱

Published: 12 May 2023

ABSTRACT

In this paper, we study the problem of identifying logos of business brands in natural scenes in an open-set, one-shot setting. This setup is significantly more challenging than the traditionally studied ‘closed-set’ and ‘large-scale training samples per category’ logo recognition settings. We propose a novel multi-view textual-visual encoding framework that encodes both the text appearing in logos and their graphical design to learn robust contrastive representations. These representations are learned jointly for multiple views of logos over a batch, and they therefore generalize well to unseen logos. We evaluate the proposed framework on three tasks: cropped logo verification, cropped logo identification, and end-to-end logo identification in natural scenes, and compare it against state-of-the-art methods. Further, the literature lacks a ‘very-large-scale’ collection of reference logo images that can facilitate the study of one-hundred-thousand-scale logo identification. To fill this gap, we introduce the Wikidata Reference Logo Dataset (WiRLD), containing logos for 100K business brands harvested from Wikidata. Our framework achieves an area under the ROC curve of 91.3% on the QMUL-OpenLogo dataset for the verification task, and outperforms state-of-the-art methods by 9.1% and 2.6% on the one-shot logo identification task on the Toplogos-10 and FlickrLogos32 datasets, respectively. Further, we show that our method is more stable than other baselines even when the number of candidate logos is on a 100K scale.
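The abstract does not spell out the loss or the matching rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a generic InfoNCE-style contrastive loss over a batch of paired embeddings (two views of the same logo form a positive pair, all other batch items are negatives) and one-shot identification by nearest cosine neighbour against a single reference embedding per brand. The function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE-style loss over a batch (an assumed stand-in loss).

    view_a[i] and view_b[i] are embeddings of two views of the same logo;
    every other item in view_b serves as a negative for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def identify(query, references):
    """One-shot identification: pick the brand whose single reference
    embedding is most similar to the query embedding."""
    return max(references, key=lambda brand: cosine(query, references[brand]))


# Toy usage with hypothetical 2-D embeddings.
references = {"brand_a": [1.0, 0.0], "brand_b": [0.0, 1.0]}
predicted = identify([0.9, 0.1], references)

# Correctly paired views give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch, an open-set system would only need to add a new brand's reference embedding to `references` to recognise it, without retraining, which is why the quality of the learned embedding matters more as the number of candidate logos grows.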

    • Published in

      ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing
      December 2022
      506 pages
      ISBN: 9781450398220
      DOI: 10.1145/3571600

      Copyright © 2022 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 95 of 286 submissions, 33%