ABSTRACT
Fish recognition is an important task for understanding marine ecosystems and biodiversity. Identifying fish species in the wild is often challenging, for three reasons. First, most fish benchmarks are small-scale, which may limit the representation power of machine learning models. Second, the number of fish species is huge, and unknown categories may still exist on our planet; traditional classifiers often fail to deal with this open-set scenario. Third, certain fish species are easily confused with one another, and it is often hard to identify their subtle differences from unconstrained images alone. Motivated by these facts, we introduce WildFish, a large-scale benchmark for fish recognition in the wild. Specifically, we make three contributions in this paper. First, WildFish is, to the best of our knowledge, the largest image data set for wild fish recognition. It consists of 1,000 fish categories with 54,459 unconstrained images, which allows training high-capacity models for automatic fish classification. Second, we propose a novel open-set fish classification task for realistic scenarios, and investigate an open-set deep learning framework with a number of practical designs. Third, we propose a novel fine-grained recognition task guided by pairwise textual descriptions. By leveraging the comparison knowledge in each sentence, we design a multi-modal fish net to effectively distinguish the two confused categories in a pair. Finally, we release WildFish (https://github.com/PeiqinZhuang/WildFish) to benefit further research in multimedia and beyond.
Index Terms
- WildFish: A Large Benchmark for Fish Recognition in the Wild