
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15111)


Abstract

Text-to-image models face safety issues, including concerns related to copyright and Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for erasing inappropriate concepts from diffusion models, they often exhibit incomplete erasure, consume substantial computing resources, and inadvertently damage the model's generation ability. In this work, we introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without requiring additional fine-tuning. Specifically, RECE leverages a closed-form solution to efficiently derive new target embeddings that are capable of regenerating erased concepts within the unlearned model. To mitigate the inappropriate content potentially represented by these derived embeddings, RECE further aligns them with harmless concepts in the cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts. In addition, to preserve the model's generation ability, RECE introduces a regularization term during the derivation process, minimizing the impact on unrelated concepts during erasure. All of the above steps are computed in closed form, guaranteeing extremely efficient erasure in only 3 seconds. Benchmarked against previous approaches, our method achieves more efficient and thorough erasure with minor damage to the original generation ability, and demonstrates enhanced robustness against red-teaming tools. Code is available at https://github.com/CharlesGong12/RECE.
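To make the pipeline above concrete, the following is a minimal NumPy sketch of the kind of computation the abstract describes: a closed-form edit of a cross-attention projection that maps the erased concept to a harmless anchor, a regularized closed-form derivation of an embedding that still regenerates the erased concept under the edited weights, and a loop alternating the two. It is a reading aid under stated assumptions (toy shapes, a single projection matrix, hypothetical names such as edit_projection and the weight lam), not the authors' implementation; see the linked repository for the real code.

```python
# Illustrative sketch of the closed-form recipe described in the abstract.
# All function names, shapes, and the regularization weight `lam` are
# assumptions for exposition, not RECE's actual code.
import numpy as np

def edit_projection(W_old, c_erase, c_anchor, c_preserve, lam=0.1):
    """Closed-form edit of one cross-attention projection matrix.

    Solves  min_W ||W c_e - W_old c_a||^2 + sum_p ||W c_p - W_old c_p||^2
                  + lam ||W - W_old||_F^2
    so the erased embedding c_e maps to the anchor's old output while
    preserved embeddings (and, via the lam term, everything else) stay put.
    """
    d_in = W_old.shape[1]
    v_target = W_old @ c_anchor                    # harmless target output
    A = np.outer(v_target, c_erase) + lam * W_old  # right-hand side
    B = np.outer(c_erase, c_erase) + lam * np.eye(d_in)
    for c_p in c_preserve:
        A += np.outer(W_old @ c_p, c_p)
        B += np.outer(c_p, c_p)
    return np.linalg.solve(B, A.T).T               # W_new = A @ B^{-1}

def derive_embedding(W_new, W_old, c_erase, lam=0.1):
    """Ridge-regression embedding that best regenerates the erased concept.

    Solves  min_c ||W_new c - W_old c_erase||^2 + lam ||c||^2.  The norm
    penalty plays the role of the regularizer the abstract credits with
    limiting damage to unrelated concepts when this embedding is erased
    in the next round.
    """
    d_in = W_new.shape[1]
    G = W_new.T @ W_new + lam * np.eye(d_in)
    return np.linalg.solve(G, W_new.T @ (W_old @ c_erase))

# Toy usage: alternate derivation and erasure for a few rounds.
rng = np.random.default_rng(0)
W = rng.standard_normal((320, 768)) * 0.02   # stand-in for one projection
W_orig = W.copy()
c_target = rng.standard_normal(768)          # concept to erase
c_anchor = rng.standard_normal(768)          # harmless anchor concept
c_keep = rng.standard_normal((8, 768))       # embeddings that must not change
c = c_target
for _ in range(3):
    W = edit_projection(W, c, c_anchor, c_keep)
    c = derive_embedding(W, W_orig, c_target)
# After erasure, the target concept's output is close to the anchor's.
print(np.linalg.norm(W @ c_target - W_orig @ c_anchor))
```

In the full method, an update of this closed-form flavor would be applied to every cross-attention key and value projection of the diffusion model, which is what keeps the whole procedure in the seconds range rather than requiring fine-tuning.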

WARNING: This paper contains model outputs that may be offensive.

C. Gong and K. Chen contributed equally.





Acknowledgements

This project was supported by the National Key R&D Program of China (No. 2021ZD0112804).

Author information


Corresponding author

Correspondence to Jingjing Chen.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 6146 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Gong, C., Chen, K., Wei, Z., Chen, J., Jiang, Y.G. (2025). Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15111. Springer, Cham. https://doi.org/10.1007/978-3-031-73668-1_5


  • DOI: https://doi.org/10.1007/978-3-031-73668-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73667-4

  • Online ISBN: 978-3-031-73668-1

  • eBook Packages: Computer Science, Computer Science (R0)
