ABSTRACT
This paper proposes a cycle architecture based on policy gradients for unsupervised video summarization. Specifically, a modified DSNet and a DSN-attention net form a cycle and promote each other during training, achieving higher performance than unsupervised methods that formulate video summarization as a sequential decision-making process. In the training stage, the DSN-attention net is trained by policy gradient combined with an additional MSE loss between the outputs of the modified DSNet and the DSN-attention net. The output of the DSN-attention net is then used to generate pseudo-labels for training the modified DSNet, closing the cycle. At test time, the final video summary is produced by averaging the outputs of the modified DSNet and the DSN-attention net. Extensive experiments and analysis on two benchmark datasets demonstrate the effectiveness of our method and its superior performance compared with state-of-the-art unsupervised methods.
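The interaction between the two branches described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the two score functions, the weight vectors, and the 0.5 pseudo-label threshold are all hypothetical stand-ins for the actual deep networks and labeling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins for the two branches (hypothetical linear models; the
# paper's branches are deep networks trained with policy gradient).
def dsnet_scores(features, w):
    # Frame-level importance scores in (0, 1).
    return sigmoid(features @ w)

def attention_scores(features, w):
    return sigmoid(features @ w)

features = rng.normal(size=(10, 4))   # 10 frames, 4-dim features
w_dsnet = rng.normal(size=4)
w_attn = rng.normal(size=4)

s_dsnet = dsnet_scores(features, w_dsnet)
s_attn = attention_scores(features, w_attn)

# Additional MSE loss between the two branches' outputs, added to the
# policy-gradient objective when training the DSN-attention net.
mse_loss = np.mean((s_dsnet - s_attn) ** 2)

# Pseudo-labels for training the modified DSNet, generated from the
# DSN-attention net's output (thresholding is an assumed labeling rule).
pseudo_labels = (s_attn > 0.5).astype(float)

# Test stage: average fusion of the two branches' scores.
fused_scores = 0.5 * (s_dsnet + s_attn)
```

The fused scores would then feed the usual knapsack-style shot selection to produce the final summary.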
Index Terms
- A Cycle Architecture Based on Policy Gradient for Unsupervised Video Summarization