
Various clarifications in evasion attacks
robvanderveer authored Nov 26, 2024
1 parent 61e3603 commit 6485906
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion in content/ai_exchange/content/docs/2_threats_through_use.md
@@ -78,7 +78,10 @@ Evasion: an attacker fools the model by crafting input to mislead it into perfor

Impact: Integrity of model behaviour is affected, leading to issues from unwanted model output (e.g. failing fraud detection, decisions leading to safety issues, reputation damage, liability).

A typical attacker goal with Evasion is to slightly change a certain input (say an image, or a text) to pass a certain test that normally would not be passed. Such small changes (perturbations) lead to a large (and false) modification of its outputs. The modified inputs are often called *adversarial examples*. Evasion attacks can also be categorized into physical (e.g. changing the real world to influence for example a camera image) and digital (e.g. changing a digital image).
A typical attacker goal with Evasion is to find out how to slightly change a certain input (say an image, or a text) to fool the model. The advantage of a slight change is that it is harder to detect, by humans or by automated detection of unusual input, and it is typically easier to perform (e.g. slightly changing an email message by adding a word so that it still conveys the same message, but fools the model into, for example, deciding it is not a phishing message).
Such small changes (called 'perturbations') lead to a large (and false) change in the model's output. The modified inputs are often called *adversarial examples*.

Evasion attacks can be categorized into physical (e.g. changing the real world to influence, for example, a camera image) and digital (e.g. changing a digital image). Furthermore, they can be categorized as either untargeted (any wrong output) or targeted (a specific wrong output). Note that Evasion of a binary classifier (i.e. yes/no) belongs to both categories.
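
To make the idea of a perturbation concrete, below is a minimal sketch (not part of the AI Exchange guidance) of the well-known Fast Gradient Sign Method (FGSM) for crafting a digital, untargeted adversarial example. It assumes a differentiable PyTorch image classifier; `model`, `image` and `true_label` are hypothetical placeholders.

```python
# Illustrative sketch only: crafting a digital, untargeted adversarial example
# with the Fast Gradient Sign Method (FGSM). `model`, `image` and `true_label`
# are hypothetical placeholders for a differentiable PyTorch classifier and one
# of its correctly classified inputs (pixel values in [0, 1]).
import torch.nn.functional as F

def fgsm_adversarial_example(model, image, true_label, epsilon=0.03):
    """Return a slightly perturbed copy of `image` intended to mislead `model`."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Step in the direction that increases the loss; epsilon keeps the
    # perturbation small so it is hard for humans or input filters to notice.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```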

Example 1: slightly changing traffic signs so that self-driving cars may be fooled.
![](/images/inputphysical.png)
@@ -96,6 +99,8 @@ See [MITRE ATLAS - Evade ML model](https://atlas.mitre.org/techniques/AML.T0015)

**Controls for evasion:**

An Evasion attack typically consists of first searching for inputs that mislead the model, and then applying them. That initial search can be very intensive, as it requires trying many variations of input. Therefore, limiting access to the model with, for example, rate limiting mitigates the risk, but still leaves the possibility of a so-called transfer attack (see [Closed box evasion](/goto/closedboxevasion/)) to search for the inputs in another, similar model.
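
As a rough illustration of the rate limiting mentioned above, the sketch below puts a per-client sliding-window limit in front of a model's prediction call. The constants, function names and the scikit-learn-style `model.predict` interface are assumptions for the sketch, not taken from the AI Exchange.

```python
# Illustrative sketch only: a per-client sliding-window rate limiter in front of
# a model's prediction call. MAX_QUERIES_PER_HOUR, rate_limited_predict and the
# scikit-learn-style `model.predict` interface are hypothetical.
import time
from collections import defaultdict, deque

MAX_QUERIES_PER_HOUR = 100
WINDOW_SECONDS = 3600
_recent_queries = defaultdict(deque)  # client_id -> timestamps of recent queries

def rate_limited_predict(client_id, model, x):
    now = time.time()
    window = _recent_queries[client_id]
    # Forget queries that have fallen outside the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_QUERIES_PER_HOUR:
        raise RuntimeError("Rate limit exceeded for this client")
    window.append(now)
    return model.predict(x)
```

Note that this only raises the cost of the search phase; as described above, a transfer attack using another, similar model can still sidestep the limit.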

- See [General controls](/goto/generalcontrols/), especially [Limiting the effect of unwanted behaviour](/goto/limitunwanted/)
- See [controls for threats through use](/goto/threatsuse/)
- The below control(s), each marked with a # and a short name in capitals
