2019-11-29

my definition of an adversarial attack

Based on conversations with Soledad Villar, Teresa Huang, Zach Martin, Greg Scanlon, and Eva Wang (all NYU), I worked today on establishing criteria for a successful adversarial attack against a regression in the natural sciences (like astronomy). The idea is that you add a small, irrelevant perturbation u to your data x and it changes the inferred labels y by an unexpectedly large amount. Or, to be more specific:

  • The squared L2 norm (u.u) of the vector u should be equal to a small number Q
  • The vector u should be orthogonal to v, your expectation of the gradient dy/dx
  • The change in the inferred labels at x+u relative to x should be much larger than you would get for the same-length move in the v direction!
The first criterion is that the change is small. The second is that it is irrelevant. The third is that it produces a big change in the regression's output. One issue is that you can only execute this when you have v, or an expectation for dy/dx independent of your regression model. That's true in some contexts (like spectroscopic parameter estimation) but not others.
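Here is a minimal numpy sketch of one way to construct and test such a u, assuming a scalar regression f, a trusted expectation v of dy/dx, and a squared-length budget Q. The function names and the toy model are my own illustrative choices, and the candidate direction is only a first-order heuristic: it follows the model's own (numerical) gradient with the v component projected out.

```python
import numpy as np

def adversarial_candidate(f, x, v, Q, eps=1e-5):
    """Candidate perturbation u with u.u = Q and u orthogonal to v,
    pointing along the model gradient df/dx with the v component removed."""
    # Numerical gradient of the regression output f at x.
    g = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                  for e in np.eye(len(x))])
    # Remove the component along the expected-gradient direction v (criterion 2).
    v_hat = v / np.linalg.norm(v)
    g_perp = g - (g @ v_hat) * v_hat
    # Scale to the budget so that u.u = Q (criterion 1).
    return np.sqrt(Q) * g_perp / np.linalg.norm(g_perp)

def attack_ratio(f, x, u, v):
    """Compare the label change from the 'irrelevant' move u to the change
    from an equal-length move along the expected-gradient direction v."""
    v_step = np.linalg.norm(u) * v / np.linalg.norm(v)
    dy_u = abs(f(x + u) - f(x))
    dy_v = abs(f(x + v_step) - f(x))
    return dy_u / dy_v   # criterion 3: a successful attack has this >> 1

if __name__ == "__main__":
    rng = np.random.default_rng(17)
    A = rng.normal(size=(8, 8))
    f = lambda x: np.sin(x @ A @ x)      # toy nonlinear regression, stands in for the model
    x = rng.normal(size=8)
    v = rng.normal(size=8)               # stand-in for the independent expectation of dy/dx
    u = adversarial_candidate(f, x, v, Q=1e-4)
    print("u.u =", u @ u, " u.v =", u @ v, " ratio =", attack_ratio(f, x, u, v))
```

Whether the ratio actually comes out much larger than unity depends on the model and the point x; the sketch only encodes the three criteria, it does not guarantee an attack exists.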
