```python
import torch
import torch.nn.functional as F


def make_adversarial_attack(X, target_y, model, max_iter=100, verbose=True):
    """Perturb X so that model classifies it as target_y (targeted attack)."""
    # Work on a detached copy so gradients flow only into the adversarial input.
    X_adv = X.clone().detach().requires_grad_(True)
    learning_rate = 1
    target = torch.tensor([target_y], dtype=torch.long, device=X.device)
    for i in range(max_iter):
        scores = model(X_adv)
        target_score = scores[:, target_y]
        max_score, pred = scores.max(dim=1)
        # Stop as soon as the model predicts the target class.
        if pred.item() == target_y:
            break
        # Cross-entropy toward the target class; minimizing it raises the target score.
        loss = F.cross_entropy(scores, target)
        if verbose:
            print('Iteration %d, loss %.3f, target score %.3f, max score %.3f'
                  % (i, loss.item(), target_score.item(), max_score.item()))
        loss.backward()
        with torch.no_grad():
            # Normalized gradient step on the input (gradient descent on the loss).
            dx = learning_rate * X_adv.grad / X_adv.grad.norm()
            X_adv -= dx
            X_adv.grad.zero_()
    return X_adv
```
In a simple implementation of adversarial attacks, my understanding is that during normal training we evaluate the loss function and apply gradient descent to minimize it, so that the input ends up correctly classified. In a targeted adversarial attack, a target class `target_y` is given, and the goal is to find a perturbed input `X_adv` that is (incorrectly) classified as `target_y`. My understanding was that, just as in normal training, we should use gradient descent to lower the loss with respect to the target class and thereby raise its score. However, the materials I've read and GPT both suggest using gradient ascent instead. I find this very confusing, especially since gradient descent seems to work for me while gradient ascent does not. Could anyone explain why this is the case? Thanks a lot! :)
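For concreteness, here is a minimal, self-contained sketch of the two update rules I'm comparing. The linear model, random input, and target index are just placeholders so the snippet runs on its own; they are not my real setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy setup only to make the two update rules concrete (placeholder model/input).
torch.manual_seed(0)
model = nn.Linear(4, 3)
X_adv = torch.randn(1, 4, requires_grad=True)
target_y = 2
target = torch.tensor([target_y])
learning_rate = 1

# Variant A: gradient DESCENT on the cross-entropy toward the target class
# (what my function above does). Minimizing this loss pushes the target logit up.
loss = F.cross_entropy(model(X_adv), target)
loss.backward()
with torch.no_grad():
    X_adv -= learning_rate * X_adv.grad / X_adv.grad.norm()
    X_adv.grad.zero_()

# Variant B: gradient ASCENT on the raw target-class score, which is what the
# materials describe. Note the += : we step in the direction that increases
# the target logit directly.
score = model(X_adv)[:, target_y].sum()
score.backward()
with torch.no_grad():
    X_adv += learning_rate * X_adv.grad / X_adv.grad.norm()
    X_adv.grad.zero_()
```

Variant A is what my function above implements; Variant B is what the materials call "gradient ascent on the target score".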