Scene text recognition via dual character counting-aware visual and semantic modeling network

  • Letter
Science China Information Sciences

Conclusion

In this work, we study character counting in scene text recognition (STR) from a new viewpoint and give a principled framework showing that counting information is involved in both visual decoding and semantic decoding. Building on this framework, we propose a novel scene text recognizer with a dual character counting-aware visual and semantic modeling network, in which counting information is fused into both the vision and language branches. Experimental results demonstrate the effectiveness of our model.

Acknowledgements

This work was supported by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (Grant No. 202200049).

Author information

Corresponding author

Correspondence to Anna Zhu.

Additional information

Supporting information: Appendixes A–C. The supporting information is available online at info.scichina.com and link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.

About this article

Cite this article

Xiao, K., Zhu, A., Iwana, B.K. et al. Scene text recognition via dual character counting-aware visual and semantic modeling network. Sci. China Inf. Sci. 67, 139101 (2024). https://doi.org/10.1007/s11432-023-3935-8

  • DOI: https://doi.org/10.1007/s11432-023-3935-8
