Today Teresa Huang (JHU) restarted our conversations about adversarial attacks against popular machine-learning methods in astrophysics. We started this project (ages ago, now) thinking about test-time attacks: You have a trained model; how can an adversary make it fail at test time? But since then, we have learned a huge amount about training-time attacks: If you make a tiny change to your training data, can you make a huge change to your model? I think some machine-learning methods popular in astronomy are going to be very susceptible to both kinds of attacks! A minimal sketch of the test-time idea is below.
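To make the test-time idea concrete, here is a minimal sketch (not anything from our project; the model, names, and numbers are all illustrative assumptions) of a fast-gradient-sign-style perturbation on a toy linear regressor, written with JAX so the gradient comes from autodiff, just as it would for a real trained model:

```python
import jax
import jax.numpy as jnp

# toy "trained" model: a linear regressor with stand-in weights
def predict(params, x):
    w, b = params
    return jnp.dot(w, x) + b

w = jax.random.normal(jax.random.PRNGKey(0), (16,))
params = (w, 0.5)

# a "clean" test input
x = jax.random.normal(jax.random.PRNGKey(1), (16,))

# Within a per-coordinate budget epsilon (an L-infinity ball), the
# output-maximizing perturbation of a locally linear model is
# epsilon * sign(df/dx): the fast-gradient-sign step.
epsilon = 0.01
grad_x = jax.grad(predict, argnums=1)(params, x)
x_adv = x + epsilon * jnp.sign(grad_x)

print("clean prediction:   ", predict(params, x))
print("attacked prediction:", predict(params, x_adv))
# The prediction shifts by epsilon times the L1 norm of the weights,
# which can be large even though no input coordinate moved by more
# than 0.01 -- a change easily hidden inside a plausible noise draw.
```

The point of the toy is only that the worst-case direction is cheap to compute once you have gradients of the model; training-time (poisoning) attacks play the analogous game with gradients through the training procedure itself.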
When we discussed these ideas in the before times, one of the objections was that adversarial attacks are artificial and meaningless. I don't agree: If a model can be easily attacked, it is not robust. If you get a strange and interesting result in a scientific investigation when you are using such a model, how do you know you didn't just get accidentally pwned by your noise draw? Since—in the natural sciences—we are trying to learn how the world works, we can't be putting in model components or pipeline components that are capable of leading us very seriously astray.