DOI: 10.1145/3604078.3604170
research-article

A Cycle Architecture Based on Policy Gradient for Unsupervised Video Summarization

Published: 26 October 2023

ABSTRACT

This paper proposes a cycle architecture based on policy gradient for unsupervised video summarization. Specifically, the modified DSNet and the DSN-attention net form a cycle architecture and promote each other during training, achieving higher performance than unsupervised methods that formulate video summarization as a sequential decision-making process. In the training stage, the DSN-attention net is trained by policy gradient combined with an additional MSE loss between the outputs of the modified DSNet and the DSN-attention net. The output of the DSN-attention net is then used to generate labels for training the modified DSNet, which closes the cycle for unsupervised video summarization. At the test stage, the final video summary is produced by average fusion of the outputs of the modified DSNet and the DSN-attention net. Extensive experiments and analysis on two benchmark datasets demonstrate the effectiveness of our method and its superior performance compared with state-of-the-art unsupervised methods.
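The sketch below illustrates, in PyTorch, the training cycle described in the abstract: a policy-gradient update for the attention-based scorer with an added MSE consistency loss against the other network's output, pseudo-label training of the second network from the first network's output, and average fusion of both networks' scores at test time. The module definitions (DSNAttentionNet, ModifiedDSNet), the simplified diversity-representativeness reward, and all hyperparameters are hypothetical placeholders, not the authors' implementation.

# Minimal sketch of the cycle training scheme, assuming hypothetical networks.
import torch
import torch.nn as nn

class DSNAttentionNet(nn.Module):
    """Placeholder attention-based scorer: frame features -> keyframe probabilities."""
    def __init__(self, dim=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, x):                          # x: (1, T, dim)
        h, _ = self.attn(x, x, x)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # (1, T)

class ModifiedDSNet(nn.Module):
    """Placeholder detect-to-summarize scorer trained from generated pseudo-labels."""
    def __init__(self, dim=1024):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        h, _ = self.rnn(x)
        return torch.sigmoid(self.head(h)).squeeze(-1)

def diversity_representativeness_reward(feats, picks):
    """Simplified stand-in for a diversity-representativeness reward."""
    sel = feats[0, picks]                          # (k, dim) selected frame features
    if sel.shape[0] < 2:
        return torch.tensor(0.0)
    n = nn.functional.normalize(sel, dim=1)
    sim = torch.mm(n, n.t())
    div = 1.0 - (sim.sum() - sim.diag().sum()) / (sel.shape[0] * (sel.shape[0] - 1))
    rep = -torch.cdist(feats[0], sel).min(dim=1).values.mean()
    return div + rep

dsn_att, dsnet = DSNAttentionNet(), ModifiedDSNet()
opt_att = torch.optim.Adam(dsn_att.parameters(), lr=1e-4)
opt_dsn = torch.optim.Adam(dsnet.parameters(), lr=1e-4)
mse, bce = nn.MSELoss(), nn.BCELoss()

feats = torch.randn(1, 120, 1024)                  # dummy video: 120 frame features

for epoch in range(5):
    # Step 1: train the DSN-attention net with policy gradient + MSE consistency loss.
    probs = dsn_att(feats)                          # (1, T) keyframe probabilities
    dist = torch.distributions.Bernoulli(probs)
    actions = dist.sample()                         # sampled keyframe selections
    picks = actions[0].nonzero(as_tuple=True)[0]
    reward = diversity_representativeness_reward(feats, picks)
    pg_loss = -(dist.log_prob(actions).mean() * reward)
    consistency = mse(probs, dsnet(feats).detach()) # MSE between the two networks' outputs
    loss_att = pg_loss + consistency
    opt_att.zero_grad(); loss_att.backward(); opt_att.step()

    # Step 2: use DSN-attention outputs as pseudo-labels to train the modified DSNet.
    pseudo = (dsn_att(feats).detach() > 0.5).float()
    loss_dsn = bce(dsnet(feats), pseudo)
    opt_dsn.zero_grad(); loss_dsn.backward(); opt_dsn.step()

# Test stage: average fusion of the two networks' frame scores.
with torch.no_grad():
    final_scores = 0.5 * (dsn_att(feats) + dsnet(feats))

In this sketch the two updates simply alternate within one loop; the actual network architectures, reward definition, and training schedule are specified in the full paper.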


Published in

      ICDIP '23: Proceedings of the 15th International Conference on Digital Image Processing
      May 2023
      711 pages
ISBN: 9798400708237
DOI: 10.1145/3604078

      Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 October 2023


      Qualifiers

      • research-article
      • Research
      • Refereed limited
