Abstract
The spread of face forgery videos is a serious threat to information credibility and calls for effective detection algorithms. Most existing methods assume a shared or centralized training set. In practice, however, data may be distributed across the devices of different enterprises and cannot be centralized or shared because of security and privacy restrictions. In this article, we propose a federated learning framework for face forgery detection that trains a global model collaboratively while keeping data on local devices. To make the detection model more robust, we propose a novel Inconsistency-Capture Module (ICM) that captures the dynamic inconsistencies between adjacent frames of face forgery videos. The ICM contains two parallel branches. The first branch takes the whole face in adjacent frames as input and computes a global inconsistency representation. The second branch focuses only on the inter-frame variation of critical regions to capture local inconsistency. To the best of our knowledge, this is the first work to apply federated learning to face forgery video detection using decentralized training data. Extensive experiments show that the proposed framework achieves performance competitive with existing methods trained on centralized data, while offering stronger security and privacy guarantees.
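The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how client models are aggregated, so the example assumes plain sample-weighted parameter averaging, and it replaces the learned ICM branches with hand-crafted mean absolute frame differences. All names (`federated_average`, `inconsistency_scores`, the `box` parameter) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Weights = Dict[str, List[float]]   # parameter name -> flat list of values
Frame = List[List[float]]          # grayscale frame as rows of intensities
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Assumed aggregation rule: average parameters weighted by sample count.

    Only parameters leave each client; the raw videos stay local.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def frame_difference(prev: Frame, curr: Frame) -> Frame:
    """Pixel-wise difference between two adjacent frames of equal size."""
    return [[c - p for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(prev, curr)]


def crop(frame: Frame, box: Box) -> Frame:
    """Cut out a critical region (hypothetical, e.g. a mouth or eye box)."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def mean_abs(frame: Frame) -> float:
    """Mean absolute value over all pixels of a non-empty frame."""
    values = [abs(v) for row in frame for v in row]
    return sum(values) / len(values)


def inconsistency_scores(prev: Frame, curr: Frame, box: Box) -> Tuple[float, float]:
    """Stand-in for the two branches: whole-face change and critical-region change."""
    global_score = mean_abs(frame_difference(prev, curr))
    local_score = mean_abs(frame_difference(crop(prev, box), crop(curr, box)))
    return global_score, local_score


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(federated_average(clients, [1, 3]))

    prev = [[0.0, 0.0], [0.0, 0.0]]
    curr = [[0.0, 0.0], [0.0, 4.0]]
    print(inconsistency_scores(prev, curr, (1, 1, 2, 2)))
```

In the toy frames, the change is confined to one pixel inside the cropped region, so the local score is larger than the global one. That is the intuition behind giving critical regions their own branch rather than relying on the whole-face signal alone.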
Index Terms
Dynamic-Aware Federated Learning for Face Forgery Video Detection