DOI: 10.1145/3369412.3395070

Exploiting Prediction Error Inconsistencies through LSTM-based Classifiers to Detect Deepfake Videos

Published: 23 June 2020

ABSTRACT

The ability of artificial intelligence techniques to synthesize entirely new videos or to alter the facial expressions in existing ones has been convincingly demonstrated in the literature. Identifying this new class of threats, generally known as Deepfakes but comprising several distinct techniques, is a fundamental task in multimedia forensics. Such manipulated content can easily distort and undermine public opinion about a specific person or event. In this paper, a new technique for distinguishing synthetically generated portrait videos from natural ones is introduced, exploiting inconsistencies in the prediction error that arise during the re-encoding phase. In particular, features based on the inter-frame prediction error are investigated jointly with a Long Short-Term Memory (LSTM) network that learns the temporal correlation among consecutive frames. Preliminary results show that this sequence-based approach achieves promising performance in distinguishing original from manipulated videos.
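To make the described pipeline concrete, the following sketch (not the authors' code) approximates the inter-frame prediction error with a simple previous-frame residual and feeds per-frame statistics of that residual into a small LSTM classifier. The sequence length, the choice of residual statistics as features, and all function and variable names are illustrative assumptions; the paper's actual features are derived from the codec's prediction error during re-encoding.

```python
# Minimal sketch, assuming a frame-difference proxy for the inter-frame
# prediction error and a small LSTM binary classifier (natural vs. synthetic).
import cv2
import numpy as np
import tensorflow as tf

SEQ_LEN = 20            # consecutive frames per sample (assumed)
FEATURES_PER_FRAME = 2  # mean and std of the residual (assumed feature set)

def prediction_error_features(video_path, seq_len=SEQ_LEN):
    """Return a (seq_len, FEATURES_PER_FRAME) array of residual statistics."""
    cap = cv2.VideoCapture(video_path)
    prev, feats = None, []
    while len(feats) < seq_len:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            # crude stand-in for the codec's prediction error
            residual = np.abs(gray - prev)
            feats.append([residual.mean(), residual.std()])
        prev = gray
    cap.release()
    feats = np.array(feats, dtype=np.float32).reshape(-1, FEATURES_PER_FRAME)
    # pad short videos so every sample has the same sequence length
    if len(feats) < seq_len:
        pad = np.zeros((seq_len - len(feats), FEATURES_PER_FRAME), np.float32)
        feats = np.vstack([feats, pad])
    return feats

# LSTM over the per-frame feature sequence, sigmoid output for the binary label.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, FEATURES_PER_FRAME)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, ...)  # X_train: (num_videos, SEQ_LEN, FEATURES_PER_FRAME)
```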

• Published in

  IH&MMSec '20: Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security
  June 2020, 177 pages
  ISBN: 9781450370509
  DOI: 10.1145/3369412
  Copyright © 2020 ACM

  Publisher: Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 23 June 2020


      Qualifiers

      • short-paper

      Acceptance Rates

Overall Acceptance Rate: 128 of 318 submissions, 40%
