IR-Capsule: Two-Stream Network for Face Forgery Detection

Cognitive Computation

Abstract

With the emergence of deep learning, generating forged images and videos has become much easier in recent years. Face forgery detection is therefore an important topic in digital media forensics. Although previous work has made remarkable progress, the spatial relationships between the parts of the face that carry significant forgery clues are seldom explored. To overcome this shortcoming, this paper proposes IR-Capsule, a two-stream face forgery detection network that fuses an Inception ResNet stream with a capsule network stream and can therefore learn both conventional facial features and the hierarchical pose relationships and angle features between different parts of the face. Furthermore, part of the Inception ResNet V1 model pre-trained on the VGGFace2 dataset is used as an initial feature extractor to reduce overfitting and training time, and a modified capsule loss is proposed for the IR-Capsule network. Experimental results on the challenging FaceForensics++ benchmark show that the proposed IR-Capsule improves accuracy by more than 3% compared with several state-of-the-art methods.
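The abstract does not spell out the modified capsule loss, so the following is only a minimal sketch of the standard building blocks that capsule networks start from: the squash nonlinearity and the margin loss from Sabour, Frosst and Hinton's "Dynamic routing between capsules". It is not the IR-Capsule loss. The two-class layout (index 0 = real, index 1 = fake), the example vectors, and all function names are assumptions made for illustration.

```python
import math


def squash(s, eps=1e-9):
    """Shrink a capsule vector so its length lies in [0, 1), keeping its direction."""
    sq_norm = sum(x * x for x in s)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in s]


def capsule_length(v):
    """Euclidean length of a capsule vector, read as a class presence score."""
    return math.sqrt(sum(x * x for x in v))


def margin_loss(capsules, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard margin loss (Sabour et al.), summed over class capsules.

    capsules: one squashed output vector per class.
    target:   index of the true class.
    """
    loss = 0.0
    for k, v in enumerate(capsules):
        length = capsule_length(v)
        if k == target:
            # Penalise a true-class capsule that is shorter than m_pos.
            loss += max(0.0, m_pos - length) ** 2
        else:
            # Penalise a wrong-class capsule that is longer than m_neg.
            loss += lam * max(0.0, length - m_neg) ** 2
    return loss


# Hypothetical raw capsule outputs: index 0 = real, index 1 = fake (assumed layout).
raw = [[0.1, 0.0, 0.2], [3.0, 4.0, 0.0]]
caps = [squash(s) for s in raw]
lengths = [capsule_length(v) for v in caps]
predicted = max(range(len(lengths)), key=lengths.__getitem__)

loss_if_fake = margin_loss(caps, target=1)
loss_if_real = margin_loss(caps, target=0)
```

In this toy case the "fake" capsule is long and the "real" capsule is short, so the loss is low when the label is fake and high when the label is real. A modified capsule loss such as the one the paper proposes would replace `margin_loss`; its exact form is given in the full article, not here.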

Funding

This work was partly supported by the National Natural Science Foundation of China (No. 61972106, U1803263, 61902082), the Major Key Project of PCL (No. PCL2021A02), the National Key Research and Development Plan (No. 2019QY1406), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136003), the Dongguan Innovative Research Team Program (No. 2018607201008), the Guangdong Higher Education Innovation Group (No. 2020KCXTD007), the Guangzhou Higher Education Innovation Group (No. 202032854), the Guangzhou Key Research and Development Plan (No. 202206030001), the Key Laboratory of the Education Department of Guangdong Province (No. 2019KSYS009), and the Scientific and Technological Planning Projects of Guangdong Province (No. 2021A0505030074).

Author information

Corresponding authors

Correspondence to Weihong Han or Shudong Li.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, K., Han, W., Li, S. et al. IR-Capsule: Two-Stream Network for Face Forgery Detection. Cogn Comput 15, 13–22 (2023). https://doi.org/10.1007/s12559-022-10008-4

  • DOI: https://doi.org/10.1007/s12559-022-10008-4
