Skip to main content

D\(^{{3}}\): Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos

  • Conference paper
  • First Online:
Computer Vision – ACCV 2022 (ACCV 2022)

Abstract

Tracking multiple athletes in sports videos is a very challenging Multi-Object Tracking (MOT) task, since athletes often have the same appearance and are intimately covered with each other, making a common occlusion problem becomes an abhorrent duplicate detection. In this paper, the duplicate detection is newly and precisely defined as occlusion misreporting on the same athlete by multiple detection boxes in one frame. To address this problem, we meticulously design a novel transformer-based Duplicate Detection Decontaminator (D\(^3\)) for training, and a specific algorithm Rally-Hungarian (RH) for matching. Once duplicate detection occurs, D\(^3\) immediately modifies the procedure by generating enhanced box losses. RH, triggered by the team sports substitution rules, is exceedingly suitable for sports videos. Moreover, to complement the tracking dataset that without shot changes, we release a new dataset based on sports video named RallyTrack. Extensive experiments on RallyTrack show that combining D\(^3\) and RH can dramatically improve the tracking performance with 9.2 in MOTA and 4.5 in HOTA. Meanwhile, experiments on MOT-series and DanceTrack discover that D\(^3\) can accelerate convergence during training, especially saving up to 80\(\%\) of the original training time on MOT17. Finally, our model, which is trained only with volleyball videos, can be applied directly to basketball and soccer videos, which shows the priority of our method. Our dataset is available at https://github.com/heruihr/rallytrack.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. In: International Conference on Computer Vision, pp. 941–951 (2019)

    Google Scholar 

  2. Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 246309 (2008)

    Article  Google Scholar 

  3. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56

    Chapter  Google Scholar 

  4. Bewley, A., Ge, Z., Ott, L., Ramos, F.T., Upcroft, B.: Simple online and realtime tracking. In: International Conference on Image Processing, pp. 3464–3468 (2016)

    Google Scholar 

  5. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13

    Chapter  Google Scholar 

  6. Chavdarova, T., et al.: WILDTRACK: a multi-camera HD dataset for dense unscripted pedestrian detection. In: Computer Vision and Pattern Recognition, pp. 5030–5039 (2018)

    Google Scholar 

  7. Chu, P., Wang, J., You, Q., Ling, H., Liu, Z.: TransMOT: spatial-temporal graph transformer for multiple object tracking. arxiv abs/2104.00194 (2021)

    Google Scholar 

  8. Dave, A., Khurana, T., Tokmakov, P., Schmid, C., Ramanan, D.: TAO: a large-scale benchmark for tracking any object. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 436–454. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_26

    Chapter  Google Scholar 

  9. Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arxiv abs/2003.09003 (2020)

    Google Scholar 

  10. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Computer Vision and Pattern Recognition, pp. 248–255 (2009)

    Google Scholar 

  11. Ellis, A., Ferryman, J.M.: PETS2010 and PETS2009 evaluation of results using individual ground truthed single views. In: International Conference on Advanced Video and Signal-Based Surveillance, pp. 135–142 (2010)

    Google Scholar 

  12. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)

    Google Scholar 

  13. Giancola, S., Amine, M., Dghaily, T., Ghanem, B.: SoccerNet: a scalable dataset for action spotting in soccer videos. In: Computer Vision and Pattern Recognition Workshops, pp. 1711–1721 (2018)

    Google Scholar 

  14. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics. JMLR Proceedings, vol. 9, pp. 249–256 (2010)

    Google Scholar 

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)

    Google Scholar 

  16. Ho, K., Kardoost, A., Pfreundt, F.-J., Keuper, J., Keuper, M.: A two-stage minimum cost multicut approach to self-supervised multiple person tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) ACCV 2020. LNCS, vol. 12623, pp. 539–557. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_33

    Chapter  Google Scholar 

  17. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol. 37, pp. 448–456 (2015)

    Google Scholar 

  18. Jonathon Luiten, A.H.: Trackeval (2020). https://github.com/JonathonLuiten/TrackEval

  19. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82D, 35–45 (1960)

    Article  MathSciNet  Google Scholar 

  20. Kong, L., Huang, D., Wang, Y.: Long-term action dependence-based hierarchical deep association for multi-athlete tracking in sports videos. IEEE Trans. Image Process. 29, 7957–7969 (2020)

    Article  MATH  Google Scholar 

  21. Kong, L., Zhu, M., Ran, N., Liu, Q., He, R.: Online multiple athlete tracking with pose-based long-term temporal dependencies. Sensors 21(1), 197 (2021)

    Article  Google Scholar 

  22. Leal-Taixé, L., Milan, A., Reid, I.D., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arxiv abs/1504.01942 (2015)

    Google Scholar 

  23. Lee, H., Kim, I., Kim, D.: VAN: versatile affinity network for end-to-end online multi-object tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) ACCV 2020. LNCS, vol. 12623, pp. 576–593. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_35

    Chapter  Google Scholar 

  24. Lin, T., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. In: International Conference on Computer Vision, pp. 2999–3007 (2017)

    Google Scholar 

  25. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019)

    Google Scholar 

  26. Lu, Z., Rathod, V., Votel, R., Huang, J.: RetinaTrack: online single stage joint detection and tracking. In: Computer Vision and Pattern Recognition, pp. 14656–14666 (2020)

    Google Scholar 

  27. Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vision 129, 548–578 (2020)

    Article  Google Scholar 

  28. Meinhardt, T., Kirillov, A., Leal-Taixe, L., Feichtenhofer, C.: TrackFormer: multi-object tracking with transformers. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022

    Google Scholar 

  29. Milan, A., Leal-Taixé, L., Reid, I.D., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arxiv abs/1603.00831 (2016)

    Google Scholar 

  30. Niu, Z., Gao, X., Tian, Q.: Tactic analysis based on real-world ball trajectory in soccer video. Pattern Recogn. 45(5), 1937–1947 (2012)

    Article  Google Scholar 

  31. Pang, J., et al.: Quasi-dense similarity learning for multiple object tracking. In: Computer Vision and Pattern Recognition, pp. 164–173 (2021)

    Google Scholar 

  32. Peng, J., et al.: Chained-tracker: chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 145–161. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_9

    Chapter  Google Scholar 

  33. Redmon, J., Farhadi, A.: YoloV3: an incremental improvement. arxiv abs/1804.02767 (2018)

    Google Scholar 

  34. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Conference and Workshop on Neural Information Processing Systems, pp. 91–99 (2015)

    Google Scholar 

  35. Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I.D., Savarese, S.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Computer Vision and Pattern Recognition, pp. 658–666 (2019)

    Google Scholar 

  36. Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset. In: Computer Vision and Pattern Recognition, pp. 2443–2451 (2020)

    Google Scholar 

  37. Sun, P., et al.: DanceTrack: multi-object tracking in uniform appearance and diverse motion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)

    Google Scholar 

  38. Sun, P., et al.: TransTrack: multiple-object tracking with transformer. arxiv abs/2012.15460 (2020)

    Google Scholar 

  39. Tang, P., Wang, C., Wang, X., Liu, W., Zeng, W., Wang, J.: Object detection in videos by high quality object linking. IEEE Trans. Pattern Anal. Mach. Intell. 42(5), 1272–1278 (2020)

    Article  Google Scholar 

  40. Vaswani, A., et al.: Attention is all you need. In: Conference and Workshop on Neural Information Processing Systems, pp. 5998–6008 (2017)

    Google Scholar 

  41. Voigtlaender, P., et al.: MOTS: multi-object tracking and segmentation. In: Computer Vision and Pattern Recognition, pp. 7942–7951 (2019)

    Google Scholar 

  42. Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7

    Chapter  Google Scholar 

  43. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: International Conference on Image Processing, pp. 3645–3649 (2017)

    Google Scholar 

  44. Xu, J., Cao, Y., Zhang, Z., Hu, H.: Spatial-temporal relation networks for multi-object tracking. In: International Conference on Computer Vision, pp. 3987–3997 (2019)

    Google Scholar 

  45. Xu, N., et al.: YouTube-VOS: sequence-to-sequence video object segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 603–619. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_36

    Chapter  Google Scholar 

  46. Yaw, H.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2(1–2), 83–97 (1955)

    Google Scholar 

  47. Yu, F., Li, W., Li, Q., Liu, Yu., Shi, X., Yan, J.: POI: multiple object tracking with high performance detection and appearance feature. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 36–42. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_3

    Chapter  Google Scholar 

  48. Yu, F., et al.: BDD100K: a diverse driving dataset for heterogeneous multitask learning. In: Computer Vision and Pattern Recognition, pp. 2633–2642 (2020)

    Google Scholar 

  49. Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., Wei, Y.: MOTR: end-to-end multiple-object tracking with transformer. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) European Conference on Computer Vision, ECCV 2022. LNCS, vol. 13687. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19812-0_38

  50. Zhang, Y., et al.: ByteTrack: multi-object tracking by associating every detection box (2022)

    Google Scholar 

  51. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129(11), 3069–3087 (2021)

    Article  Google Scholar 

  52. Zhang, Z., Cheng, D., Zhu, X., Lin, S., Dai, J.: Integrated object detection and tracking with tracklet-conditioned detection. arxiv abs/1811.11167 (2018)

    Google Scholar 

  53. Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28

    Chapter  Google Scholar 

  54. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arxiv abs/1904.07850 (2019)

    Google Scholar 

  55. Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. In: International Conference on Learning Representations (2021)

    Google Scholar 

Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant U20B2069 and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Qingjie Liu .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 32684 KB)

Supplementary material 3 (mp4 32452 KB)

Supplementary material 1 (pdf 556 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

He, R., Fu, Z., Liu, Q., Wang, Y., Chen, X. (2023). D\(^{{3}}\): Duplicate Detection Decontaminator for Multi-Athlete Tracking in Sports Videos. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13847. Springer, Cham. https://doi.org/10.1007/978-3-031-26293-7_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-26293-7_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26292-0

  • Online ISBN: 978-3-031-26293-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics