Lately I’ve been trying to fine-tune a T5-based model and compare the performance of Hugging Face’s Seq2SeqTrainer against a plain PyTorch training loop.
With the pure PyTorch approach, I first used only the Adam optimizer with lr = 2e-5 and reached 0.8 accuracy on a small test set. Then I looked at the parameters and examples from Hugging Face, switched to AdamW, and combined it with a linear learning-rate scheduler. I also apply gradient clipping with clip_grad_norm_:
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

k_lr = 5e-5
k_adam_eps = 1e-8
k_epochs = 30

total_steps = k_epochs * len(train_dataloader)
# warm up over the first 10% of all training steps
k_warmup_steps = int(0.1 * total_steps)

optimizer = AdamW(model.parameters(), lr=k_lr, eps=k_adam_eps)
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=k_warmup_steps,
                                            num_training_steps=total_steps)
...
input_ids = batch['input_ids'].to(device)
input_attention_mask = batch['input_attention_mask'].to(device)
labels = batch['labels'].to(device)
labels_attention_mask = batch['labels_attention_mask'].to(device)

optimizer.zero_grad()  # clear gradients left over from the previous step
# my model wrapper returns a (loss, logits) tuple
loss, logits = model(input_ids=input_ids,
                     input_attention_mask=input_attention_mask,
                     output_ids=labels,
                     output_attention_mask=labels_attention_mask)
train_loss.append(loss.item())
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip to max norm 1.0
optimizer.step()
scheduler.step()  # advance the linear warmup/decay schedule
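For comparison, the Trainer side of the experiment looks roughly like this (a minimal sketch; the argument values are assumptions chosen to mirror the manual loop above, not my literal script):

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-finetune",    # hypothetical output directory
    learning_rate=5e-5,          # same lr as the manual loop
    adam_epsilon=1e-8,
    num_train_epochs=30,
    warmup_ratio=0.1,            # linear warmup over the first 10% of steps
    lr_scheduler_type="linear",  # the Trainer's default schedule
    max_grad_norm=1.0,           # the Trainer clips gradients to this norm by default
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed already-tokenized datasets
    eval_dataset=eval_dataset,
)
trainer.train()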
With that, the performance increased to 0.891, but it is still lower than training with Seq2SeqTrainer, which reaches 0.91 (just one more correct sample). That gap could be noise, but when I watch the loss and the changing learning rate, the losses clearly differ: on the first batch my pure PyTorch loop reports a loss of 21.3, while the Trainer reports 42.
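To rule out a difference in loss reduction/normalization, one check is to recompute the mean per-token cross-entropy from the logits inside the loop (a sketch assuming padded label positions are set to -100, the usual Hugging Face convention):

import torch.nn.functional as F

# flatten to (batch * seq_len, vocab) vs (batch * seq_len,) and average over
# non-padding tokens only; if this matches loss.item(), both sides use the
# same mean-over-tokens reduction
manual_loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                              labels.view(-1),
                              ignore_index=-100)
print(manual_loss.item(), loss.item())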
One more thing: training with the Hugging Face Trainer is remarkably stable. Once it reaches a local optimum, the performance barely changes with further training (once it hits 0.91 on the test set, it keeps that score in later epochs). My pure PyTorch training is quite unstable and only reaches 0.89 two or three times across 30 epochs.
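Part of the run-to-run instability might just be seeding: as far as I can tell, the Trainer calls set_seed with its seed argument (42 by default) before training, while my loop never seeds anything. Doing the same in the manual loop is one line:

from transformers import set_seed

set_seed(42)  # seeds Python, NumPy and PyTorch, matching the Trainer's default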
That said, I’d prefer not to use the Hugging Face Trainer; it’s harder to read and harder to modify than PyTorch Lightning or a custom class in pure PyTorch.