CanFuUI: A Canvas-Centric Web User Interface for Iterative Image Generation with Diffusion Models and ControlNet

Hu, Qihan; Xu, Zhenghui; Du, Peng; Zeng, Hao; Ma, Tongqing; Zhao, Youbing; Xie, Hao; Zhang, Peng; Liu, Shuting; Zang, Tongnian; Wang, Xuemei

doi:10.1007/978-981-99-7587-7_11

Qihan Hu⁷,
Zhenghui Xu⁷,
Peng Du⁸,
Hao Zeng⁷,
Tongqing Ma⁷,
Youbing Zhao⁷,⁹ (ORCID: orcid.org/0000-0003-3677-6583),
Hao Xie⁷,
Peng Zhang¹⁰,
Shuting Liu¹⁰,
Tongnian Zang¹¹ &
Xuemei Wang⁷

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1946)

Included in the following conference series:

International Conference on AI-generated Content

Abstract

AI generation tools are emerging in rapid succession, but most existing tools are model-centric in design, which gives them steep learning curves and high usability thresholds. Moreover, current user interfaces lack built-in image editing capabilities, forcing users to rely on external software even for basic editing tasks. Because most image generation is an iterative process, this limitation significantly hampers both user experience and creative potential. This paper instead proposes CanFuUI, a novel canvas-centric design that integrates editing functionality directly into the UI and streamlines secondary image processing. In CanFuUI, users can crop, modify, and annotate specific regions of generated images within the same canvas. Furthermore, canvas content serves as preprocessed images that feed directly into the ControlNet preprocessing procedure, strengthening users' ability to customize AI-generated outputs.
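The idea of cropping a canvas region and passing it on as a ready-made control image can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `crop` and `to_control_map`, the grayscale nested-list canvas, and the threshold of 128 are all hypothetical, and the white-lines-on-black output is an assumption about the conditioning image convention of a scribble-style ControlNet model.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixels (hypothetical canvas model)


def crop(canvas: Gray, left: int, top: int, width: int, height: int) -> Gray:
    """Return the selected rectangle, clamped to the canvas bounds."""
    rows = len(canvas)
    cols = len(canvas[0]) if rows else 0
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(cols, left + width), min(rows, top + height)
    return [row[x0:x1] for row in canvas[y0:y1]]


def to_control_map(region: Gray, threshold: int = 128) -> Gray:
    """Turn dark strokes on a light canvas into white lines on black.

    Assumption: the downstream model expects white (255) lines on a
    black (0) background, so pixels darker than `threshold` become 255.
    """
    return [[255 if px < threshold else 0 for px in row] for row in region]


# A 4x4 white canvas with a dark diagonal stroke drawn by the user.
canvas = [[255] * 4 for _ in range(4)]
for i in range(4):
    canvas[i][i] = 0

region = crop(canvas, left=1, top=1, width=2, height=2)
control = to_control_map(region)
print(control)
```

Because the user's strokes are converted on the canvas itself, the result could be supplied as an already preprocessed control image, which is the kind of shortcut the abstract describes.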

Q. Hu and Z. Xu contributed equally to this work.

Many thanks to Mr. Yihui Shen for his generous funding support.

Acknowledgements

We would like to thank the Zhejiang Provincial Blended First Class Online and Offline Course “Three-dimensional Character Design” (No. Z202Y22513), the Ministry of Education’s Industry School Cooperation Collaborative Education Project “Research on PTA-Based Programming Training and Evaluation Model” (No. 202101151011), and the 17th batch of Educational Reform Projects of Communication University of Zhejiang, “Cultivation and Practice of Computational Thinking in the Age of AI”, for their generous funding support of the work described in this paper.

Author information

Authors and Affiliations

Communication University of Zhejiang, Hangzhou, 310018, China
Qihan Hu, Zhenghui Xu, Hao Zeng, Tongqing Ma, Youbing Zhao, Hao Xie & Xuemei Wang
Uber Technologies Inc., 1725 3rd Street, San Francisco, CA, 94158, USA
Peng Du
University of Bedfordshire, Luton, LU1 3JU, UK
Youbing Zhao
Jiangsu Dongyin Intelligent Engineering Technology Research Institute, Nanjing, 211111, China
Peng Zhang & Shuting Liu
Jiangsu CRRC Digital Technology Co. Ltd., Nanjing, 210000, China
Tongnian Zang

Corresponding authors

Correspondence to Hao Zeng or Tongqing Ma.

Editor information

Editors and Affiliations

University of Science and Technology of China, Hefei, China
Feng Zhao
Tongji University, Shanghai, China
Duoqian Miao

About this paper

Cite this paper

Hu, Q. et al. (2024). CanFuUI: A Canvas-Centric Web User Interface for Iterative Image Generation with Diffusion Models and ControlNet. In: Zhao, F., Miao, D. (eds) AI-generated Content. AIGC 2023. Communications in Computer and Information Science, vol 1946. Springer, Singapore. https://doi.org/10.1007/978-981-99-7587-7_11

DOI: https://doi.org/10.1007/978-981-99-7587-7_11
Published: 02 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7586-0
Online ISBN: 978-981-99-7587-7
eBook Packages: Computer Science, Computer Science (R0)
