Dual-view Attention Networks for Single Image Super-Resolution

ABSTRACT
One non-negligible flaw of convolutional neural network (CNN) based single image super-resolution (SISR) models is that most of them cannot restore high-resolution (HR) images containing sufficient high-frequency information. Worse still, as the depth of a CNN increases, training easily suffers from vanishing gradients. These problems limit the effectiveness of CNNs in SISR. In this paper, we propose Dual-view Attention Networks to alleviate these problems. Specifically, we propose local aware (LA) and global aware (GA) attentions that treat low-resolution (LR) features unequally, highlighting high-frequency components and discriminating each feature of the LR image in the local and global views, respectively. Furthermore, we propose the local attentive residual-dense (LARD) block, which combines LA attention with multiple residual and dense connections, to build a deeper yet easy-to-train architecture. Experimental results verify the effectiveness of our model compared with other state-of-the-art methods.
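The abstract does not give the exact formulation of the LA/GA attentions or the LARD block, so the following is only a minimal NumPy sketch of the general ideas it describes: an element-wise (local) gate and a channel-wise (global, squeeze-and-excitation style) gate that re-weight features unequally, and a block that wires dense connections, attention, and a residual skip together. All function names, shapes, and the choice of sigmoid gating are assumptions for illustration, not the paper's method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def local_attention(features, transform):
    """Hypothetical local-aware (LA) attention: a per-position gate computed
    from the features themselves re-weights every element, so responses
    carrying high-frequency detail can be emphasized over flat regions.
    `transform` stands in for a learned mapping (e.g. a convolution)."""
    gate = sigmoid(transform(features))      # gate values in (0, 1)
    return features * gate                   # unequal, element-wise re-weighting

def global_attention(features):
    """Hypothetical global-aware (GA) attention: pool each channel of a
    (channels, height, width) map to one statistic, turn the statistics
    into channel weights, and rescale the channels."""
    pooled = features.mean(axis=(1, 2))      # one descriptor per channel
    weights = sigmoid(pooled)                # channel-wise gates
    return features * weights[:, None, None]

def lard_block(x, layers, attn_transform=lambda t: t):
    """Hypothetical LARD-style block: each layer sees the concatenation of
    all earlier feature maps (dense connections), the final output passes
    through local attention, and a residual skip adds the input back so
    gradients keep a short path. Each `layer` is assumed to map the grown
    concatenation back to x's channel count."""
    feats = [x]
    for layer in layers:
        feats.append(layer(np.concatenate(feats, axis=0)))
    return x + local_attention(feats[-1], attn_transform)
```

Under this sketch, stacking many such blocks stays trainable because every block's output is its input plus a gated correction, which is the usual motivation for combining residual and dense connections in deep SISR networks.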