Abstract
To improve the adaptability of detectors, most existing domain adaptation algorithms adopt adversarial learning to align feature distributions between source and target datasets. Unlike previous methods, this work explores the possibility of transferring detectors using only source domain data and style information from the target domain. Specifically, we propose three consistency regularizations to enhance the adaptation performance of the detector. First, since the source domain and the synthetic domain share the same image content, the supervision regularization fully exploits the source annotations, which narrows the domain gap and saves labeling costs. Second, prediction regularization improves the robustness of the detector's category semantics and location awareness across domains. Third, self-discovering feature regularization directs the detector's attention to object-related regions, which are more discriminative than background noise. In addition, our method can be combined with classic domain adaptation algorithms to further improve the generalization of the detector, which shows that both the content and style information of target domain images are crucial for the transfer process. Extensive experiments have been conducted on multiple detection benchmarks, including the Foggy Cityscapes, Sim10k, KITTI, Clipart, and Watercolor datasets. The favorable performance compared with existing state-of-the-art methods confirms the effectiveness of the proposed consistency regularizations.
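The three regularizations described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, loss forms (cross-entropy supervision on both domains, an L2 prediction-consistency term, and a masked feature-alignment term), and the toy tensor shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def supervision_loss(logits, labels):
    # Cross-entropy using the source annotations. Because the source image
    # and its style-transferred (synthetic) counterpart share the same
    # content, the same boxes/labels supervise both domains.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def prediction_consistency(logits_src, logits_syn):
    # L2 distance between the class distributions predicted for the same
    # regions in the source and synthetic domains.
    return np.mean((softmax(logits_src) - softmax(logits_syn)) ** 2)

def feature_consistency(feat_src, feat_syn, obj_mask):
    # Align features only inside object-related regions, weighted by a
    # self-discovered objectness mask in [0, 1]; background is ignored.
    diff = (feat_src - feat_syn) ** 2
    return np.sum(diff * obj_mask[..., None]) / (np.sum(obj_mask) + 1e-12)

# Toy example: 4 region proposals, 3 classes, a 2x2 feature map of dim 8.
rng = np.random.default_rng(0)
logits_src = rng.normal(size=(4, 3))
logits_syn = logits_src + 0.1 * rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 1])
feat_src = rng.normal(size=(2, 2, 8))
feat_syn = feat_src + 0.1 * rng.normal(size=(2, 2, 8))
obj_mask = np.array([[1.0, 0.0], [0.0, 1.0]])

total = (supervision_loss(logits_src, labels)      # source supervision
         + supervision_loss(logits_syn, labels)    # synthetic supervision
         + prediction_consistency(logits_src, logits_syn)
         + feature_consistency(feat_src, feat_syn, obj_mask))
print(total)
```

In a real detector these terms would be computed on RoI predictions and backbone feature maps rather than random arrays; the sketch only shows how the three losses could be summed into one training objective.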
Data Availability
The Cityscapes and Foggy Cityscapes datasets analyzed during the current study are available at https://www.cityscapes-dataset.com/. The KITTI dataset analyzed during the current study is available at http://www.cvlibs.net/datasets/kitti/index.php. The Sim10k dataset analyzed during the current study is available at https://fcav.engin.umich.edu/projects/driving-in-the-matrix. The PASCAL VOC datasets analyzed during the current study are available at http://host.robots.ox.ac.uk/pascal/VOC/. The Clipart and Watercolor datasets analyzed during the current study are available at http://www.hal.t.u-tokyo.ac.jp/~inoue/projects/cross_domain_detection/datasets/.
Funding
This research was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, and the National Natural Science Foundation of China under Grants 91646207, 62076242, 62071466, and 61976208.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Informed consent
All authors gave informed consent.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tian, K., Zhang, C., Wang, Y. et al. Multi-level consistency regularization for domain adaptive object detection. Neural Comput & Applic 35, 18003–18018 (2023). https://doi.org/10.1007/s00521-023-08677-9