Abstract
Tracking multiple athletes in sports videos is a very challenging Multi-Object Tracking (MOT) task. Athletes often share the same appearance and frequently occlude one another, which turns a common occlusion problem into a harmful duplicate detection problem. In this paper, duplicate detection is newly and precisely defined as the misreporting of a single occluded athlete by multiple detection boxes in one frame. To address this problem, we design a novel transformer-based Duplicate Detection Decontaminator (D\(^3\)) for training and a dedicated matching algorithm, Rally-Hungarian (RH). Whenever duplicate detection occurs, D\(^3\) immediately modifies the training procedure by generating enhanced box losses. RH is triggered by the substitution rules of team sports, which makes it particularly well suited to sports videos. Moreover, existing tracking datasets lack shot changes, so we release a new sports-video dataset named RallyTrack to complement them. Extensive experiments on RallyTrack show that combining D\(^3\) and RH substantially improves tracking performance, by 9.2 MOTA and 4.5 HOTA. Meanwhile, experiments on the MOT series and DanceTrack show that D\(^3\) accelerates convergence during training, saving up to 80\(\%\) of the original training time on MOT17. Finally, our model, trained only on volleyball videos, can be applied directly to basketball and soccer videos, which demonstrates the superiority of our method. Our dataset is available at https://github.com/heruihr/rallytrack.
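The abstract defines duplicate detection only in words: one athlete reported by several boxes in the same frame. The snippet below is a minimal sketch of how such cases could be counted in a single frame when ground truth is available. It assumes axis-aligned `(x1, y1, x2, y2)` boxes and treats a predicted box as "reporting" an athlete when its IoU with that athlete's ground-truth box is at least 0.5. Both the box format and the threshold are our assumptions, not values from the paper, and the helper names `iou` and `find_duplicate_detections` are hypothetical. This is an evaluation-side check; it is not the D\(^3\) module itself.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_duplicate_detections(
    gt_boxes: List[Box], pred_boxes: List[Box], iou_thresh: float = 0.5
) -> List[Tuple[int, List[int]]]:
    """Return (gt_index, [pred_indices]) for every ground-truth athlete
    that is covered by more than one predicted box in the same frame.

    Hypothetical helper: the 0.5 IoU threshold is an assumption.
    """
    duplicates = []
    for g, gt in enumerate(gt_boxes):
        # Every predicted box that overlaps this athlete strongly enough.
        hits = [p for p, pred in enumerate(pred_boxes) if iou(gt, pred) >= iou_thresh]
        if len(hits) > 1:
            duplicates.append((g, hits))
    return duplicates


if __name__ == "__main__":
    gt = [(0, 0, 10, 20), (30, 0, 40, 20)]
    # Boxes 0 and 1 both land on athlete 0; box 2 covers athlete 1.
    pred = [(0, 0, 10, 20), (1, 0, 11, 20), (30, 0, 40, 20)]
    print(find_duplicate_detections(gt, pred))  # [(0, [0, 1])]
```

Under the one-to-one matching used by CLEAR MOT evaluation, only one box in each such group can be matched to the athlete. Every surplus box therefore counts as a false positive and lowers MOTA \(= 1 - (\mathrm{FN} + \mathrm{FP} + \mathrm{IDSW}) / \mathrm{GT}\).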
References
Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. In: International Conference on Computer Vision, pp. 941–951 (2019)
Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 246309 (2008)
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
Bewley, A., Ge, Z., Ott, L., Ramos, F.T., Upcroft, B.: Simple online and realtime tracking. In: International Conference on Image Processing, pp. 3464–3468 (2016)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Chavdarova, T., et al.: WILDTRACK: a multi-camera HD dataset for dense unscripted pedestrian detection. In: Computer Vision and Pattern Recognition, pp. 5030–5039 (2018)
Chu, P., Wang, J., You, Q., Ling, H., Liu, Z.: TransMOT: spatial-temporal graph transformer for multiple object tracking. arXiv:2104.00194 (2021)
Dave, A., Khurana, T., Tokmakov, P., Schmid, C., Ramanan, D.: TAO: a large-scale benchmark for tracking any object. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 436–454. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_26
Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 (2020)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Ellis, A., Ferryman, J.M.: PETS2010 and PETS2009 evaluation of results using individual ground truthed single views. In: International Conference on Advanced Video and Signal-Based Surveillance, pp. 135–142 (2010)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Giancola, S., Amine, M., Dghaily, T., Ghanem, B.: SoccerNet: a scalable dataset for action spotting in soccer videos. In: Computer Vision and Pattern Recognition Workshops, pp. 1711–1721 (2018)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics. JMLR Proceedings, vol. 9, pp. 249–256 (2010)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Ho, K., Kardoost, A., Pfreundt, F.-J., Keuper, J., Keuper, M.: A two-stage minimum cost multicut approach to self-supervised multiple person tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) ACCV 2020. LNCS, vol. 12623, pp. 539–557. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_33
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456 (2015)
Luiten, J., Hoffhues, A.: TrackEval (2020). https://github.com/JonathonLuiten/TrackEval
Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82D, 35–45 (1960)
Kong, L., Huang, D., Wang, Y.: Long-term action dependence-based hierarchical deep association for multi-athlete tracking in sports videos. IEEE Trans. Image Process. 29, 7957–7969 (2020)
Kong, L., Zhu, M., Ran, N., Liu, Q., He, R.: Online multiple athlete tracking with pose-based long-term temporal dependencies. Sensors 21(1), 197 (2021)
Leal-Taixé, L., Milan, A., Reid, I.D., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv:1504.01942 (2015)
Lee, H., Kim, I., Kim, D.: VAN: versatile affinity network for end-to-end online multi-object tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) ACCV 2020. LNCS, vol. 12623, pp. 576–593. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_35
Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: International Conference on Computer Vision, pp. 2999–3007 (2017)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019)
Lu, Z., Rathod, V., Votel, R., Huang, J.: RetinaTrack: online single stage joint detection and tracking. In: Computer Vision and Pattern Recognition, pp. 14656–14666 (2020)
Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 129, 548–578 (2020)
Meinhardt, T., Kirillov, A., Leal-Taixé, L., Feichtenhofer, C.: TrackFormer: multi-object tracking with transformers. In: Computer Vision and Pattern Recognition (2022)
Milan, A., Leal-Taixé, L., Reid, I.D., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv:1603.00831 (2016)
Niu, Z., Gao, X., Tian, Q.: Tactic analysis based on real-world ball trajectory in soccer video. Pattern Recogn. 45(5), 1937–1947 (2012)
Pang, J., et al.: Quasi-dense similarity learning for multiple object tracking. In: Computer Vision and Pattern Recognition, pp. 164–173 (2021)
Peng, J., et al.: Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 145–161. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_9
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Conference and Workshop on Neural Information Processing Systems, pp. 91–99 (2015)
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I.D., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Computer Vision and Pattern Recognition, pp. 658–666 (2019)
Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Computer Vision and Pattern Recognition, pp. 2443–2451 (2020)
Sun, P., et al.: DanceTrack: multi-object tracking in uniform appearance and diverse motion. In: Computer Vision and Pattern Recognition (2022)
Sun, P., et al.: TransTrack: multiple-object tracking with transformer. arXiv:2012.15460 (2020)
Tang, P., Wang, C., Wang, X., Liu, W., Zeng, W., Wang, J.: Object detection in videos by high quality object linking. IEEE Trans. Pattern Anal. Mach. Intell. 42(5), 1272–1278 (2020)
Vaswani, A., et al.: Attention is all you need. In: Conference and Workshop on Neural Information Processing Systems, pp. 5998–6008 (2017)
Voigtlaender, P., et al.: MOTS: multi-object tracking and segmentation. In: Computer Vision and Pattern Recognition, pp. 7942–7951 (2019)
Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7
Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: International Conference on Image Processing, pp. 3645–3649 (2017)
Xu, J., Cao, Y., Zhang, Z., Hu, H.: Spatial-temporal relation networks for multi-object tracking. In: International Conference on Computer Vision, pp. 3987–3997 (2019)
Xu, N., et al.: YouTube-VOS: sequence-to-sequence video object segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 603–619. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_36
Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)
Yu, F., Li, W., Li, Q., Liu, Yu., Shi, X., Yan, J.: POI: multiple object tracking with high performance detection and appearance feature. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 36–42. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_3
Yu, F., et al.: BDD100K: a diverse driving dataset for heterogeneous multitask learning. In: Computer Vision and Pattern Recognition, pp. 2633–2642 (2020)
Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: end-to-end multiple-object tracking with transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13687. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_38
Zhang, Y., et al.: ByteTrack: multi-object tracking by associating every detection box (2022)
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129(11), 3069–3087 (2021)
Zhang, Z., Cheng, D., Zhu, X., Lin, S., Dai, J.: Integrated object detection and tracking with tracklet-conditioned detection. arXiv:1811.11167 (2018)
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv:1904.07850 (2019)
Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (2021)
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant U20B2069 and the Fundamental Research Funds for the Central Universities.
Electronic supplementary material
Supplementary material 2 (mp4 32684 KB)
Supplementary material 3 (mp4 32452 KB)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
He, R., Fu, Z., Liu, Q., Wang, Y., Chen, X. (2023). D\(^3\): Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13847. Springer, Cham. https://doi.org/10.1007/978-3-031-26293-7_28
DOI: https://doi.org/10.1007/978-3-031-26293-7_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26292-0
Online ISBN: 978-3-031-26293-7
eBook Packages: Computer Science, Computer Science (R0)