D$^3$: Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos

He, Rui; Fu, Zehua; Liu, Qingjie; Wang, Yunhong; Chen, Xunxun

doi:10.1007/978-3-031-26293-7_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13847)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Tracking multiple athletes in sports videos is a very challenging Multi-Object Tracking (MOT) task: athletes often share the same appearance and closely occlude one another, which turns the common occlusion problem into a troublesome duplicate detection problem. In this paper, duplicate detection is newly and precisely defined as occlusion misreporting on the same athlete by multiple detection boxes in one frame. To address this problem, we design a novel transformer-based Duplicate Detection Decontaminator (D$^3$) for training and a dedicated matching algorithm, Rally-Hungarian (RH). Once duplicate detection occurs, D$^3$ immediately modifies the training procedure by generating enhanced box losses. RH, motivated by the substitution rules of team sports, is well suited to sports videos. Moreover, to supplement tracking datasets without shot changes, we release RallyTrack, a new dataset based on sports videos. Extensive experiments on RallyTrack show that combining D$^3$ and RH improves tracking performance by 9.2 in MOTA and 4.5 in HOTA. Meanwhile, experiments on the MOT series and DanceTrack show that D$^3$ accelerates convergence during training, saving up to 80% of the original training time on MOT17. Finally, our model, trained only on volleyball videos, can be applied directly to basketball and soccer videos, which demonstrates the superiority of our method. Our dataset is available at https://github.com/heruihr/rallytrack.
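
The definition of duplicate detection given in the abstract (several detection boxes reported on the same athlete in one frame) can be illustrated with a minimal Python sketch. This is not the D$^3$ module or the Rally-Hungarian algorithm, neither of which is specified in the abstract. The function names, the use of reference athlete boxes, and the IoU threshold of 0.5 are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_duplicates(
    athletes: List[Box], detections: List[Box], iou_thresh: float = 0.5
) -> Dict[int, List[int]]:
    """Return {athlete index: [detection indices]} for athletes that are
    covered by more than one detection box in a single frame.

    Each detection is attributed to the athlete it overlaps most, provided
    the overlap reaches iou_thresh (an assumed value, not from the paper).
    """
    claimed: Dict[int, List[int]] = defaultdict(list)
    for d_idx, det in enumerate(detections):
        best_iou, best_athlete = max(
            ((iou(det, gt), a_idx) for a_idx, gt in enumerate(athletes)),
            default=(0.0, -1),
        )
        if best_iou >= iou_thresh:
            claimed[best_athlete].append(d_idx)
    # Keep only athletes reported by two or more boxes.
    return {a: ds for a, ds in claimed.items() if len(ds) > 1}


if __name__ == "__main__":
    athletes = [(0, 0, 10, 20), (30, 0, 40, 20)]
    detections = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 1, 40, 21)]
    print(find_duplicates(athletes, detections))  # {0: [0, 1]}
```

In this toy frame, the first athlete is reported by two boxes and is therefore flagged, while the second athlete has a single box and is not.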

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant U20B2069 and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Laboratory of Intelligent Recognition and Image Processing (IRIP Lab), Beihang University (BUAA), Xueyuan Road No. 37, Haidian District, Beijing, China
Rui He, Zehua Fu, Qingjie Liu & Yunhong Wang
Hangzhou Innovation Institute, Beihang University, Hangzhou, China
Zehua Fu & Qingjie Liu
National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT or CNCERT/CC), Beijing, China
Xunxun Chen

Corresponding author

Correspondence to Qingjie Liu.

Editor information

Editors and Affiliations

University of Wollongong, Wollongong, NSW, Australia
Lei Wang
University of Bonn, Bonn, Germany
Juergen Gall
University of Adelaide, Adelaide, SA, Australia
Tat-Jun Chin
National Institute of Informatics, Tokyo, Japan
Imari Sato
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 32684 KB)

Supplementary material 3 (mp4 32452 KB)

Supplementary material 1 (pdf 556 KB)

About this paper

Cite this paper

He, R., Fu, Z., Liu, Q., Wang, Y., Chen, X. (2023). D$^3$: Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13847. Springer, Cham. https://doi.org/10.1007/978-3-031-26293-7_28

DOI: https://doi.org/10.1007/978-3-031-26293-7_28
Published: 11 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26292-0
Online ISBN: 978-3-031-26293-7
eBook Packages: Computer Science, Computer Science (R0)
