Abstract
AI generation tools are emerging in rapid succession. Most existing tools are model-centric in design, which results in steep learning curves and high usability thresholds for users. Moreover, current user interfaces lack built-in image editing capabilities, forcing users to rely on external software even for basic editing tasks. Because most image generation is an iterative process, this limitation significantly hampers user experience and creative potential. This paper instead proposes CanFuUI, a novel canvas-centric user interface that integrates editing functionality directly into the UI and streamlines secondary image processing. In CanFuUI, users can crop, modify, and annotate specific regions of generated images within the same canvas. Furthermore, canvas content is used as a preprocessed image and fed directly into the ControlNet preprocessing procedure, which strengthens the customization capabilities of AI-generated outputs.
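The idea of using canvas content as an already-preprocessed control image can be illustrated with a minimal sketch. This is not CanFuUI's implementation: the grayscale list-of-rows image representation, the `Region` type, the function names, and the threshold value are all assumptions made for illustration, and no real ControlNet API is called. The sketch only shows the data flow the abstract describes: a user-selected canvas region is cropped, and the strokes drawn in it are turned into a binary map that could stand in for a preprocessor's output.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical image representation: rows of grayscale pixels in 0..255.
Image = List[List[int]]


@dataclass(frozen=True)
class Region:
    """A rectangular selection on the canvas (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def crop(canvas: Image, region: Region) -> Image:
    """Return the pixels of `canvas` that lie inside `region`."""
    canvas_height = len(canvas)
    canvas_width = len(canvas[0]) if canvas else 0
    inside = (
        region.left >= 0
        and region.top >= 0
        and region.width > 0
        and region.height > 0
        and region.left + region.width <= canvas_width
        and region.top + region.height <= canvas_height
    )
    if not inside:
        raise ValueError("region must lie fully inside the canvas")
    rows = canvas[region.top:region.top + region.height]
    return [row[region.left:region.left + region.width] for row in rows]


def canvas_to_control_map(canvas: Image, region: Region, threshold: int = 128) -> Image:
    """Binarize the selected region: 255 where a stroke was drawn, 0 elsewhere.

    The result plays the role of a preprocessed control image, so a separate
    preprocessing step would be skipped (an assumption of this sketch).
    """
    patch = crop(canvas, region)
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in patch]


if __name__ == "__main__":
    canvas = [
        [0, 0, 0, 0],
        [0, 200, 50, 0],
        [0, 130, 255, 0],
        [0, 0, 0, 0],
    ]
    selection = Region(left=1, top=1, width=2, height=2)
    print(canvas_to_control_map(canvas, selection))  # [[255, 0], [255, 255]]
```

In a real interface the canvas would be an RGB or RGBA bitmap, and the resulting map would be passed to the generation backend together with the prompt; those parts are deliberately left out here.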
Q. Hu and Z. Xu—Contributed equally to this work.
Many thanks to Mr. Yihui Shen for his generous funding support.
Acknowledgements
We would like to thank the Zhejiang Provincial Blended First Class Online and Offline Course “Three-dimensional Character Design” (No. Z202Y22513), the Ministry of Education’s Industry School Cooperation Collaborative Education Project “Research on PTA-Based Programming Training and Evaluation Model” (No. 202101151011), and the 17th batch of Educational Reform Projects of Communication University of Zhejiang, “Cultivation and Practice of Computational Thinking in the Age of AI”, for their generous funding support of the work described in this paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, Q. et al. (2024). CanFuUI: A Canvas-Centric Web User Interface for Iterative Image Generation with Diffusion Models and ControlNet. In: Zhao, F., Miao, D. (eds) AI-generated Content. AIGC 2023. Communications in Computer and Information Science, vol 1946. Springer, Singapore. https://doi.org/10.1007/978-981-99-7587-7_11
DOI: https://doi.org/10.1007/978-981-99-7587-7_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7586-0
Online ISBN: 978-981-99-7587-7
eBook Packages: Computer Science, Computer Science (R0)