DOI: 10.1145/3240508.3240616
research-article

WildFish: A Large Benchmark for Fish Recognition in the Wild

Published: 15 October 2018

ABSTRACT

Fish recognition is an important task for understanding the marine ecosystem and its biodiversity. Identifying fish species in the wild is often challenging for the following reasons. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and unknown categories may still exist on our planet; traditional classifiers often fail in this open-set scenario. Third, certain fish species are easily confused, and their subtle differences are hard to discern from unconstrained images alone. Motivated by these facts, we introduce WildFish, a large-scale benchmark for fish recognition in the wild. Specifically, we make three contributions in this paper. First, WildFish is, to the best of our knowledge, the largest image data set for wild fish recognition. It consists of 1,000 fish categories with 54,459 unconstrained images, allowing high-capacity models to be trained for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios, and investigate an open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task guided by pairwise textual descriptions. By leveraging the comparative knowledge in these sentences, we design a multi-modal fish net that effectively distinguishes the two confused categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish) to benefit further research in multimedia and beyond.
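
The open-set task described above can be made concrete with a minimal rejection baseline: train a closed-set classifier over the 1,000 known WildFish categories and label an image as "unknown" whenever its top softmax confidence falls below a threshold. The sketch below is only an illustrative baseline, not the open-set framework proposed in the paper; the ResNet-50 backbone and the 0.5 threshold are assumptions.

```python
# Minimal open-set sketch (illustrative baseline, not the paper's method):
# reject a test image as "unknown" when the closed-set classifier's top
# softmax probability falls below a confidence threshold.
import torch
import torch.nn.functional as F
from torchvision import models

NUM_KNOWN_CLASSES = 1000   # WildFish defines 1,000 known categories
REJECT_THRESHOLD = 0.5     # hypothetical confidence cutoff

# Generic CNN backbone with a 1,000-way classification head (assumed choice).
backbone = models.resnet50(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, NUM_KNOWN_CLASSES)
backbone.eval()

@torch.no_grad()
def open_set_predict(images: torch.Tensor) -> torch.Tensor:
    """Return a class index in [0, NUM_KNOWN_CLASSES) per image, or -1 for 'unknown'."""
    logits = backbone(images)                       # (N, NUM_KNOWN_CLASSES)
    probs = F.softmax(logits, dim=1)
    confidence, predicted = probs.max(dim=1)        # top-1 score and class index
    predicted[confidence < REJECT_THRESHOLD] = -1   # reject low-confidence images
    return predicted

# Usage: a dummy batch of three 224x224 RGB images.
print(open_set_predict(torch.randn(3, 3, 224, 224)))
```

The pairwise fine-grained task can likewise be pictured as a two-branch model that looks at an image together with a textual description comparing two confused species and predicts which of the two the image shows. The sketch below is an assumed design (averaged word embeddings, concatenation fusion, hypothetical dimensions), not the paper's multi-modal fish net.

```python
# Pairwise multi-modal sketch (illustrative design, not the paper's architecture):
# fuse an image feature with a pairwise textual-description feature and decide
# which of the two confused categories the image belongs to.
import torch
import torch.nn as nn
from torchvision import models

class PairwiseMultiModalNet(nn.Module):
    def __init__(self, vocab_size: int = 10000, text_dim: int = 256):
        super().__init__()
        # Image branch: a CNN backbone truncated before its classifier head.
        resnet = models.resnet18(weights=None)
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])  # -> (N, 512, 1, 1)
        # Text branch: embed the description tokens and average the word vectors.
        self.word_embedding = nn.Embedding(vocab_size, text_dim)
        # Fusion head: 2-way choice between the two categories in the pair.
        self.classifier = nn.Linear(512 + text_dim, 2)

    def forward(self, images: torch.Tensor, description_tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(images).flatten(1)              # (N, 512)
        txt_feat = self.word_embedding(description_tokens).mean(1)    # (N, text_dim)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=1))  # (N, 2) logits

# Usage: one image and a 20-token description comparing the two candidate species.
net = PairwiseMultiModalNet()
print(net(torch.randn(1, 3, 224, 224), torch.randint(0, 10000, (1, 20))).shape)  # torch.Size([1, 2])
```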

Published in

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions (28%). Overall acceptance rate: 995 of 4,171 submissions (24%).
