Abstract
Text-to-image models face safety issues, including concerns about copyright and Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for erasing inappropriate concepts from diffusion models, they often exhibit incomplete erasure, consume substantial computing resources, and inadvertently damage the model's generation ability. In this work, we introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without requiring additional fine-tuning. Specifically, RECE leverages a closed-form solution to derive new target embeddings that can regenerate erased concepts within the unlearned model. To mitigate the inappropriate content these derived embeddings may still represent, RECE further aligns them with harmless concepts in the cross-attention layers. The derivation and erasure of new representation embeddings are performed iteratively to achieve a thorough erasure of inappropriate concepts. In addition, to preserve the model's generation ability, RECE introduces a regularization term during the derivation process, minimizing the impact on unrelated concepts during erasure. Because every step above is computed in closed form, the entire erasure takes only 3 seconds. Benchmarked against previous approaches, our method achieves more efficient and thorough erasure with minor damage to the original generation ability, and demonstrates enhanced robustness against red-teaming tools. Code is available at https://github.com/CharlesGong12/RECE.
WARNING: This paper contains model outputs that may be offensive.
C. Gong and K. Chen—Equal contributions.
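The two closed-form steps the abstract describes, erasing a concept from the cross-attention projections and deriving an embedding that regenerates it, both reduce to ridge-regression-style linear solves. Below is a minimal NumPy sketch of that pattern; the function names, the exact objectives, and the regularization weights are illustrative assumptions, not the released implementation (see the repository linked above for the latter).

```python
import numpy as np

def edit_projection(W, c_erase, c_anchor, lam=0.1):
    """Closed-form edit of one cross-attention projection W (d_out x d_in).
    Minimizes ||W' c_erase - W c_anchor||^2 + lam * ||W' - W||_F^2, i.e. the
    erased concept is remapped to a harmless anchor's original output while
    a ridge term keeps W' close to W on unrelated directions."""
    v_anchor = W @ c_anchor                       # harmless target output
    num = np.outer(v_anchor, c_erase) + lam * W
    den = np.outer(c_erase, c_erase) + lam * np.eye(len(c_erase))
    return num @ np.linalg.inv(den)

def derive_embedding(W_new, W_old, c_erase, reg=1e-3):
    """Closed-form embedding that best regenerates the erased concept under
    the edited weights: argmin_c ||W_new c - W_old c_erase||^2 + reg ||c||^2."""
    A = W_new.T @ W_new + reg * np.eye(W_new.shape[1])
    return np.linalg.solve(A, W_new.T @ (W_old @ c_erase))

# Toy data standing in for text-encoder embeddings and a projection matrix.
rng = np.random.default_rng(0)
W_old = rng.normal(size=(8, 6))
c_erase = rng.normal(size=6)    # embedding of the concept to erase
c_anchor = rng.normal(size=6)   # embedding of a harmless anchor concept

# Erase once, then iterate: derive a regenerating embedding, erase it too.
W = edit_projection(W_old, c_erase, c_anchor)
for _ in range(3):
    c_new = derive_embedding(W, W_old, c_erase)
    W = edit_projection(W, c_new, c_anchor)
```

Since each round amounts to a few small linear solves per projection matrix, iterating over a model's cross-attention layers plausibly fits within the seconds-scale runtime quoted above.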
Acknowledgements
This project was supported by the National Key R&D Program of China (No. 2021ZD0112804).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gong, C., Chen, K., Wei, Z., Chen, J., Jiang, Y.G. (2025). Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol. 15111. Springer, Cham. https://doi.org/10.1007/978-3-031-73668-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73667-4
Online ISBN: 978-3-031-73668-1