
Deepfake Detection Using Spatiotemporal Transformer

Online AM: 23 January 2024

Abstract

Recent advances in generative models and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier. The number of new hyper-realistic deepfake videos used for malicious purposes is increasing dramatically, creating the need for effective deepfake detection methods. Although many existing deepfake detection approaches, particularly CNN-based methods, show promising results, they suffer from several drawbacks. In general, they generalize poorly to unseen or new deepfake generation methods. The crucial reason for this weakness is that CNN-based methods focus on local spatial artifacts, which are unique to each manipulation method. It is therefore hard to learn the general forgery traces of different manipulation methods without considering dependencies that extend beyond the local receptive field. To address this problem, this paper proposes a framework that combines a convolutional neural network (CNN) with a Vision Transformer (ViT) to improve detection accuracy and enhance generalizability. Our method, named HCiT, exploits the advantages of CNNs to extract meaningful local features, as well as the ViT's self-attention mechanism to explicitly learn discriminative global contextual dependencies at the frame level. In this hybrid architecture, the high-level feature maps extracted by the CNN are fed into the ViT model, which determines whether a given video is fake or real. Experiments were performed on the FaceForensics++, DeepFake Detection Challenge preview, and Celeb-DF datasets, and the results show that the proposed method significantly outperforms state-of-the-art methods. In addition, the HCiT method shows a great capacity for generalization across datasets covering various deepfake generation techniques. The source code is available at: https://github.com/KADDAR-Bachir/HCiT
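The abstract describes the hybrid design only at a high level: a CNN backbone produces high-level feature maps for each face frame, the spatial positions of those maps are treated as tokens, and a ViT-style encoder's class token drives the real/fake decision. The PyTorch sketch below illustrates that general pattern only; the backbone choice (ResNet-50), token dimension, encoder depth, head count, and 224×224 input size are illustrative assumptions and not the authors' configuration, which is available in the linked repository.

```python
# Minimal sketch (not the authors' code) of a hybrid CNN + Vision Transformer
# frame classifier in the spirit described in the abstract: a CNN extracts a
# high-level feature map, its spatial positions become tokens for a Transformer
# encoder, and the [CLS] token output is classified as real vs. fake.
# All architectural choices here are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HybridCNNViT(nn.Module):
    def __init__(self, embed_dim=768, depth=6, heads=12, num_classes=2):
        super().__init__()
        # CNN feature extractor: ResNet-50 up to (but not including) avgpool/fc.
        backbone = resnet50(weights=None)  # random init; no pretrained download
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 2048, 7, 7) for 224x224 input
        self.proj = nn.Conv2d(2048, embed_dim, kernel_size=1)      # map channels to token dimension

        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 7 * 7 + 1, embed_dim))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads, dim_feedforward=4 * embed_dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                         # x: (B, 3, 224, 224) face crops
        feat = self.proj(self.cnn(x))             # (B, D, 7, 7)
        tokens = feat.flatten(2).transpose(1, 2)  # (B, 49, D) spatial positions as tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        out = self.encoder(tokens)
        return self.head(out[:, 0])               # real/fake logits from the [CLS] token


if __name__ == "__main__":
    model = HybridCNNViT()
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 2])
```

In practice, frames are typically face-cropped by a detector before being passed to such a model, and frame-level predictions are aggregated into a video-level decision.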




Published in

          ACM Transactions on Multimedia Computing, Communications, and Applications Just Accepted
ISSN: 1551-6857
EISSN: 1551-6865

          Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Online AM: 23 January 2024
          • Accepted: 8 January 2024
          • Revised: 20 November 2023
          • Received: 20 April 2023
Published in TOMM Just Accepted


          Qualifiers

          • research-article
