Abstract
Recent advances in generative models and the availability of large-scale benchmarks have made deepfake video generation and manipulation easier. The number of new hyper-realistic deepfake videos used for malicious purposes is increasing dramatically, which creates a need for effective deepfake detection methods. Although many existing deepfake detection approaches, particularly CNN-based methods, show promising results, they suffer from several drawbacks; in particular, they generalize poorly to unseen deepfake generation methods. A key reason for this shortcoming is that CNN-based methods focus on local spatial artifacts, which are specific to each manipulation method. It is therefore hard to learn forgery traces common to different manipulation methods without considering dependencies that extend beyond the local receptive field. To address this problem, this paper proposes a framework that combines a Convolutional Neural Network (CNN) with a Vision Transformer (ViT) to improve detection accuracy and generalizability. Our method, named HCiT, exploits the ability of CNNs to extract meaningful local features and the ViT's self-attention mechanism to explicitly learn discriminative global contextual dependencies within a frame-level image. In this hybrid architecture, the high-level feature maps extracted by the CNN are fed into the ViT model, which determines whether a given video is fake or real. Experiments were performed on the FaceForensics++, DeepFake Detection Challenge (DFDC) preview, and Celeb-DF datasets, and the results show that the proposed method significantly outperforms state-of-the-art methods. In addition, HCiT shows a strong capacity for generalization on datasets covering various deepfake generation techniques. The source code is available at: https://github.com/KADDAR-Bachir/HCiT
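The hybrid pipeline described in the abstract (CNN feature maps turned into tokens and passed to a ViT that outputs a real/fake decision) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the HCiT implementation: the CNN backbone is omitted (the sketch starts from a given C×H×W feature map), the ViT is reduced to a single randomly initialised, single-head attention layer, and the names `TinyHybridHead` and `video_fake_probability`, as well as averaging frame scores to obtain a video-level decision, are hypothetical. The authors' repository is the reference for the actual model.

```python
import math
import random


def feature_map_to_tokens(fmap):
    """Flatten a CNN feature map of shape C x H x W (nested lists) into
    H*W tokens, each a C-dimensional vector (one token per spatial cell)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(channels)]
            for i in range(height) for j in range(width)]


def matmul(a, b):
    """Multiply an n x d matrix by a d x m matrix (both given as lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention: every token attends to
    all tokens, so each output mixes information from the whole feature map."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    return matmul(weights, v), weights


class TinyHybridHead:
    """Hypothetical ViT-style head on top of CNN features (random, untrained):
    [CLS] token + positional embeddings -> one attention layer -> linear classifier."""

    def __init__(self, dim, num_tokens, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.cls = rand(1, dim)[0]
        self.pos = rand(num_tokens + 1, dim)  # +1 row for the [CLS] token
        self.wq, self.wk, self.wv = rand(dim, dim), rand(dim, dim), rand(dim, dim)
        self.w_out = rand(1, dim)[0]

    def frame_fake_probability(self, fmap):
        tokens = [self.cls] + feature_map_to_tokens(fmap)
        if len(tokens) != len(self.pos):
            raise ValueError("feature map size does not match num_tokens")
        # Add learned (here: random) positional embeddings to every token.
        x = [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(tokens, self.pos)]
        attended, _ = self_attention(x, self.wq, self.wk, self.wv)
        cls_out = [a + b for a, b in zip(x[0], attended[0])]  # residual connection
        logit = sum(w * h for w, h in zip(self.w_out, cls_out))
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> P(fake) for this frame


def video_fake_probability(head, frame_fmaps):
    """Assumed aggregation rule: average the per-frame fake probabilities."""
    probs = [head.frame_fake_probability(f) for f in frame_fmaps]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    rng = random.Random(42)
    C, H, W = 8, 4, 4  # toy sizes; real backbones produce many more channels
    frames = [[[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
              for _ in range(3)]
    head = TinyHybridHead(dim=C, num_tokens=H * W)
    print(video_fake_probability(head, frames))
```

Because each spatial cell of the feature map becomes one token, the attention weights link every face region to every other region, which is the kind of global context the abstract contrasts with a CNN's local receptive field. A practical model would stack several multi-head attention layers with MLP blocks and layer normalisation, and would learn all weights from labelled real and fake frames.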