IR-Capsule: Two-Stream Network for Face Forgery Detection

Lin, Kaihan; Han, Weihong; Li, Shudong; Gu, Zhaoquan; Zhao, Huimin; Ren, Jinchang; Zhu, Li; Lv, Jujian

doi:10.1007/s12559-022-10008-4

Published: 02 June 2022

Volume 15, pages 13–22 (2023)
Cognitive Computation

Kaihan Lin¹, Weihong Han¹,², Shudong Li¹, Zhaoquan Gu¹, Huimin Zhao³, Jinchang Ren³,⁴, Li Zhu⁵ & Jujian Lv³

Abstract

With the emergence of deep learning, generating forged images or videos has become much easier in recent years. Face forgery detection is therefore an important topic in digital media forensics. Although previous works have made remarkable progress, the spatial relationships between the parts of the face, which carry significant forgery clues, are seldom explored. To overcome this shortcoming, this paper proposes IR-Capsule, a two-stream face forgery detection network that fuses an Inception ResNet stream with a capsule network stream and can learn both conventional facial features and the hierarchical pose relationships and angle features between different parts of the face. Furthermore, part of the Inception ResNet V1 model pre-trained on the VGGFace2 dataset is used as an initial feature extractor to reduce overfitting and training time, and a modified capsule loss is proposed for the IR-Capsule network. Experimental results on the challenging FaceForensics++ benchmark show that the proposed IR-Capsule improves accuracy by more than 3% compared with several state-of-the-art methods.
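
The abstract refers to a capsule network stream and a modified capsule loss, but it does not spell out the modification. As background only, the following minimal sketch shows the squash nonlinearity and the original margin loss from Sabour, Frosst and Hinton (2017), on which capsule losses are usually based. It is not the IR-Capsule loss itself. The function names (squash, length, margin_loss) and the default values m_plus = 0.9, m_minus = 0.1 and lam = 0.5 follow that 2017 paper and are assumptions with respect to this article.

```python
import math


def squash(vector):
    """Squash nonlinearity: keeps the direction, maps the length into [0, 1)."""
    sq_norm = sum(x * x for x in vector)
    if sq_norm == 0.0:
        return [0.0 for _ in vector]
    # scale = |s|^2 / (1 + |s|^2) / |s|
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length of a capsule vector, read as a class probability."""
    return math.sqrt(sum(x * x for x in vector))


def margin_loss(capsules, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Original capsule margin loss for one sample, summed over classes.

    capsules: one squashed output vector per class (e.g. real, fake)
    target:   index of the true class
    NOTE: this is the 2017 baseline loss, not the modified IR-Capsule loss.
    """
    total = 0.0
    for k, v in enumerate(capsules):
        norm = length(v)
        if k == target:
            # Penalise a true-class capsule that is shorter than m_plus.
            total += max(0.0, m_plus - norm) ** 2
        else:
            # Penalise a wrong-class capsule that is longer than m_minus.
            total += lam * max(0.0, norm - m_minus) ** 2
    return total


if __name__ == "__main__":
    caps = [squash([3.0, 4.0]), squash([0.1, 0.0])]
    print(margin_loss(caps, target=0))  # long capsule matches the label
    print(margin_loss(caps, target=1))  # long capsule contradicts the label
```

In a two-class setting such as real versus forged, the loss is zero when the capsule of the true class is long and the other is short, and it grows when the lengths contradict the label.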


Funding

This work was partly supported by the National Natural Science Foundation of China (No. 61972106, U1803263, 61902082), the Major Key Project of PCL (No. PCL2021A02), the National Key Research and Development Plan (No. 2019QY1406), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136003), the Dongguan Innovative Research Team Program (No. 2018607201008), the Guangdong Higher Education Innovation Group (No. 2020KCXTD007), the Guangzhou Higher Education Innovation Group (No. 202032854), the Guangzhou Key Research and Development Plan (No. 202206030001), the Key Laboratory of the Education Department of Guangdong Province (No. 2019KSYS009), and the Scientific and Technological Planning Projects of Guangdong Province (No. 2021A0505030074).

Author information

Authors and Affiliations

Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China
Kaihan Lin, Weihong Han, Shudong Li & Zhaoquan Gu
Peng Cheng Laboratory, Shenzhen, China
Weihong Han
School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China
Huimin Zhao, Jinchang Ren & Jujian Lv
National Subsea Centre, Robert Gordon University, Aberdeen, UK
Jinchang Ren
Industrial Training Center, Guangdong Polytechnic Normal University, Guangzhou, China
Li Zhu

Corresponding authors

Correspondence to Weihong Han or Shudong Li.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, K., Han, W., Li, S. et al. IR-Capsule: Two-Stream Network for Face Forgery Detection. Cogn Comput 15, 13–22 (2023). https://doi.org/10.1007/s12559-022-10008-4

Received: 27 August 2021
Accepted: 06 March 2022
Published: 02 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s12559-022-10008-4
