CLIP-generative-adversarial

Projected Gradient Descent (PGD), inverted and amplified -> prompt & generate images with CLIP.

Who needs a diffusion U-Net or Transformer when you can just use the 'text encoder'? 😉

[Example images]

  • Uses Projected Gradient Descent (PGD), which is commonly used in adversarial robustness training
  • Inverts the objective and amplifies the perturbations so they are visible to human perception
  • Perturbs towards (!) the prompt by default, making CLIP a self-generative AI 🙃 (a minimal sketch follows this list)
  • Also plots the 'success' of the result (original vs. perturbed cosine similarity)
  • You can change the default behavior to "classic" adversarial example generation
  • See code comments for details and instructions
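
The core idea, as a minimal sketch (my own illustration, not the repo's exact code): standard PGD follows the sign of the gradient to push an image away from its label, staying inside a small epsilon-ball; here the objective is inverted, descending on 1 - cosine(image, text) so the image moves towards the prompt, and the epsilon projection is dropped so the perturbation grows until it dominates the image. The prompt, step count, step size, and noise initialization are assumptions:

```python
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
model.float()                      # fp32 avoids fp16 gradient underflow on GPU
model.requires_grad_(False)        # we optimize the image, not the model

text = clip.tokenize(["a photo of a cat"]).to(device)   # assumed prompt
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Start from noise; a real run could also start from an existing image
# preprocessed with CLIP's own normalization (the `preprocess` transform above).
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)

steps, alpha = 300, 0.01           # assumed settings
for _ in range(steps):
    img_feat = model.encode_image(image)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = 1 - (img_feat * text_feat).sum()  # inverted: move TOWARDS the prompt
    loss.backward()
    with torch.no_grad():
        # PGD-style signed-gradient step, but without the usual epsilon-ball
        # projection -- the perturbation is deliberately left 'amplified'
        image -= alpha * image.grad.sign()
        image.clamp_(0, 1)         # keep pixels in a valid range
    image.grad = None
```

Dropping the epsilon projection is what turns an adversarial attack into a crude generator: the perturbation is no longer constrained to be imperceptible, so it is allowed to become the picture itself.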

  • If you ever wondered why some strange word in a prompt for e.g. Stable Diffusion works - now you can find out! (A scoring sketch follows this list.)
  • For Stable Diffusion v1, the sole text encoder "guide" is CLIP ViT-L/14 (the model set by default in my code).
  • The difference between this and feature activation maximization: here, the whole model's output is used to guide towards a text prompt.
  • To visualize individual 'neurons' (features) in CLIP ViT, see my other repo: zer0int/CLIP-ViT-visualization
  • To get CLIP's opinion of what it 'thinks' of an image, see: zer0int/CLIP-XAI-GUI
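
To probe what CLIP reads into an odd prompt token, or to sanity-check a generated result, you can score a saved image against candidate prompts via cosine similarity. A hedged sketch; the file name and prompts are assumptions, not values from the repo:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

image = preprocess(Image.open("generated.png")).unsqueeze(0).to(device)  # assumed file
prompts = ["a photo of a cat", "random noise", "a psychedelic painting"]  # assumed
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(0)   # cosine similarity per prompt

for prompt, sim in zip(prompts, sims.tolist()):
    print(f"{sim:.3f}  {prompt}")
```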

Requires: torch, torchvision, numpy, PIL, matplotlib, OpenCV, (optionally) scikit-image, and OpenAI/CLIP.
