Adversarial AI

#NotHotDog

Slides

Literature

Blog posts:

Fast gradient sign method: https://jaketae.github.io/study/fgsm/
Towards a general theory of “adversarial examples,” the bizarre, hallucinatory motes in machine learning’s all-seeing eye https://boingboing.net/2019/03/08/hot-dog-or-not.html
Adversarial attacks: How to trick computer vision https://hackernoon.com/adversarial-attacks-how-to-trick-computer-vision-7484c4e85dc0

Scientific articles:

Explaining and harnessing adversarial examples https://arxiv.org/pdf/1412.6572.pdf
Intriguing properties of neural networks https://arxiv.org/pdf/1312.6199.pdf
Prompt Injection introduction: https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Exercise: Generating adversarial examples & LLM Prompt Injection

Use cleverhans (Python/ML): https://github.com/cleverhans-lab/cleverhans. Try out the MNIST tutorial with torch! Afterwards, experiment with other image classification datasets, other models, or attacks other than the fast gradient sign method (FGSM). How does the epsilon value in FGSM affect the resulting adversarial examples? Setup tip: if you run into setup problems, install torch, torchvision and cleverhans with pip.
Try out LLM prompt injection attacks at https://gandalf.lakera.ai/ (Web). This is less technical than the exercise above. How many levels can you solve? What kinds of prompts worked?
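
To build intuition for the epsilon question in the first exercise before running the full tutorial, here is a minimal sketch of a single FGSM step, x_adv = x + eps * sign(grad_x loss), on a tiny logistic-regression "classifier". This is not the cleverhans API: the weights `w`, bias `b`, input `x` and the helper names are hypothetical, chosen only for illustration, and the gradient is written out by hand instead of using torch autograd.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(grad_x loss).

    For binary cross-entropy on logistic regression the input
    gradient is (p - y) * w, so no autograd is needed here.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    x_adv = [xi + eps * sign(g) for xi, g in zip(x, grad)]
    # Clip to [0, 1] so the result is still a valid "image".
    return [min(1.0, max(0.0, xi)) for xi in x_adv]

# Hypothetical toy model and input (assumptions, not from the tutorial).
w = [2.0, -3.0, 1.5, -1.0]
b = 0.0
x = [0.9, 0.1, 0.8, 0.2]  # four "pixels" in [0, 1]
y = 1                     # true label

for eps in (0.0, 0.1, 0.3, 0.5):
    x_adv = fgsm(w, b, x, y, eps)
    print(f"eps={eps:.1f}  p(class 1)={predict(w, b, x_adv):.3f}")
```

With these toy values the model's confidence in the true class falls as eps grows and drops below 0.5 at eps=0.5, while every pixel moves by at most eps. The same trade-off (a larger eps gives a stronger attack but a more visible perturbation) is what to look for when varying epsilon in the MNIST tutorial.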

Target nets:

Dataset resources:

GAN resources

https://machinelearningmastery.com/resources-for-getting-started-with-generative-adversarial-networks/

More interesting links

Attacking LLMs tutorial

https://gandalf.lakera.ai/