DOI: 10.1145/3240508.3240616
research-article

WildFish: A Large Benchmark for Fish Recognition in the Wild

Published: 15 October 2018

ABSTRACT

Fish recognition is an important task for understanding the marine ecosystem and its biodiversity. Identifying fish species in the wild is often challenging for the following reasons. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and unknown categories may still exist on our planet; traditional classifiers often fail in this open-set scenario. Third, certain fish species are easily confused, and their subtle differences are hard to discern from unconstrained images alone. Motivated by these facts, we introduce WildFish, a large-scale benchmark for fish recognition in the wild. Specifically, we make three contributions in this paper. First, WildFish is, to the best of our knowledge, the largest image data set for wild fish recognition. It consists of 1,000 fish categories with 54,459 unconstrained images, allowing high-capacity models to be trained for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios, and investigate an open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task guided by pairwise textual descriptions. By leveraging the comparative knowledge in these sentences, we design a multi-modal fish net that effectively distinguishes the two confused categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish) to benefit further research in multimedia and beyond.
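
The open-set task described above can be made concrete with a minimal rejection baseline: train a closed-set classifier over the 1,000 known WildFish categories and label an image as "unknown" whenever its top softmax confidence falls below a threshold. The sketch below is only an illustrative baseline, not the open-set framework proposed in the paper; the ResNet-50 backbone and the 0.5 threshold are assumptions.

```python
# Minimal open-set sketch (illustrative baseline, not the paper's method):
# reject a test image as "unknown" when the closed-set classifier's top
# softmax probability falls below a confidence threshold.
import torch
import torch.nn.functional as F
from torchvision import models

NUM_KNOWN_CLASSES = 1000   # WildFish defines 1,000 known categories
REJECT_THRESHOLD = 0.5     # hypothetical confidence cutoff

# Generic CNN backbone with a 1,000-way classification head (assumed choice).
backbone = models.resnet50(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, NUM_KNOWN_CLASSES)
backbone.eval()

@torch.no_grad()
def open_set_predict(images: torch.Tensor) -> torch.Tensor:
    """Return a class index in [0, NUM_KNOWN_CLASSES) per image, or -1 for 'unknown'."""
    logits = backbone(images)                       # (N, NUM_KNOWN_CLASSES)
    probs = F.softmax(logits, dim=1)
    confidence, predicted = probs.max(dim=1)        # top-1 score and class index
    predicted[confidence < REJECT_THRESHOLD] = -1   # reject low-confidence images
    return predicted

# Usage: a dummy batch of three 224x224 RGB images.
print(open_set_predict(torch.randn(3, 3, 224, 224)))
```

The pairwise fine-grained task can likewise be pictured as a two-branch model that looks at an image together with a textual description comparing two confused species and predicts which of the two the image shows. The sketch below is an assumed design (averaged word embeddings, concatenation fusion, hypothetical dimensions), not the paper's multi-modal fish net.

```python
# Pairwise multi-modal sketch (illustrative design, not the paper's architecture):
# fuse an image feature with a pairwise textual-description feature and decide
# which of the two confused categories the image belongs to.
import torch
import torch.nn as nn
from torchvision import models

class PairwiseMultiModalNet(nn.Module):
    def __init__(self, vocab_size: int = 10000, text_dim: int = 256):
        super().__init__()
        # Image branch: a CNN backbone truncated before its classifier head.
        resnet = models.resnet18(weights=None)
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])  # -> (N, 512, 1, 1)
        # Text branch: embed the description tokens and average the word vectors.
        self.word_embedding = nn.Embedding(vocab_size, text_dim)
        # Fusion head: 2-way choice between the two categories in the pair.
        self.classifier = nn.Linear(512 + text_dim, 2)

    def forward(self, images: torch.Tensor, description_tokens: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_encoder(images).flatten(1)              # (N, 512)
        txt_feat = self.word_embedding(description_tokens).mean(1)    # (N, text_dim)
        return self.classifier(torch.cat([img_feat, txt_feat], dim=1))  # (N, 2) logits

# Usage: one image and a 20-token description comparing the two candidate species.
net = PairwiseMultiModalNet()
print(net(torch.randn(1, 3, 224, 224), torch.randint(0, 10000, (1, 20))).shape)  # torch.Size([1, 2])
```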

Published in

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions (28%). Overall acceptance rate: 995 of 4,171 submissions (24%).
