I am fine-tuning BERT with the bert-base-uncased model from Hugging Face. I set the learning rate for the classification head to 5e-3 and for all other parameters to 5e-5. However, my training loss looks very strange (see the figure below), and my validation loss has started to increase at the same time. Why is this happening? I have tried warm-up and various schedulers, but they don't seem to have much effect.

[training loss curve](https://i.sstatic.net/2FSu3ZM6.png)
Here is my optimizer and scheduler setup:

```python
import torch.optim as optim
from transformers import get_linear_schedule_with_warmup

# Separate parameter groups: no weight decay for biases/LayerNorm weights,
# and a higher learning rate (5e-3) for the classification head.
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_parameters = [
    {'params': [p for n, p in param_optimizer
                if not any(nd in n for nd in no_decay) and 'classifier' not in n],
     'lr': args.learning_rate, 'weight_decay': 1e-3},
    {'params': [p for n, p in param_optimizer
                if not any(nd in n for nd in no_decay) and 'classifier' in n],
     'lr': 5e-3, 'weight_decay': 1e-3},
    {'params': [p for n, p in param_optimizer
                if any(nd in n for nd in no_decay) and 'classifier' not in n],
     'lr': args.learning_rate, 'weight_decay': 0.0},
    {'params': [p for n, p in param_optimizer
                if any(nd in n for nd in no_decay) and 'classifier' in n],
     'lr': 5e-3, 'weight_decay': 0.0},
]

base_lr, max_lr = 3e-5, 5e-5  # not used below

# eps is Adam's epsilon; best base learning rate so far: 5e-5
optimizer = optim.AdamW(optimizer_parameters, lr=args.learning_rate, eps=1e-8)

total_steps = len(train_loader) * args.num_epochs
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(total_steps * 0.1),  # 10% warm-up
    num_training_steps=total_steps,
)
```
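For context, a simplified sketch of the training step around this optimizer and scheduler (loss computation and device handling are abbreviated; variable names follow the snippet above). `lr_scheduler.get_last_lr()` returns one learning rate per parameter group, so the classifier group (5e-3) and the encoder groups (5e-5) can be checked directly:

```python
# Simplified training step; batches are assumed to already contain
# input_ids, attention_mask, and labels on the right device.
for epoch in range(args.num_epochs):
    for step, batch in enumerate(train_loader):
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        if step % 200 == 0:
            # one learning rate per parameter group (encoder vs. classifier)
            print(epoch, step, loss.item(), lr_scheduler.get_last_lr())
```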