I’m training a GPT2LMHeadModel in Python using Hugging Face’s transformers library. The task is next-token prediction. If I understand correctly, when this model is given a labels argument, it automatically computes the loss for every (input, next_token) pair in the sequence, i.e. for the input [1, 2, 3] it computes the loss for ([1], [2]) and ([1, 2], [3]).
My dataset includes a weight for each sample. Is there a way for me to include these weights in the training process? Note that these are not class weights, but weights for each individual record in my train set. In a more “vanilla” PyTorch training scenario, I would just set reduction='none' on my criterion and then multiply the result by the batch’s weights. I’m not sure how to implement this efficiently here, accounting for all the prediction tasks that go into a single sequence.
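For reference, this is the kind of thing I have in mind, applied to raw LM logits instead of the model’s built-in loss (the shapes and weights below are made up for illustration):

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 4, 6, 10
logits = torch.randn(batch, seq_len, vocab)          # e.g. outputs.logits
labels = torch.randint(0, vocab, (batch, seq_len))   # e.g. the input ids
sample_weights = torch.tensor([0.5, 1.0, 2.0, 1.0])  # one weight per record

# Same shift the model applies internally, but with reduction='none'
# so each (input, next_token) prediction keeps its own loss.
shift_logits = logits[:, :-1, :].reshape(-1, vocab)
shift_labels = labels[:, 1:].reshape(-1)
per_token = F.cross_entropy(shift_logits, shift_labels, reduction="none")
per_token = per_token.view(batch, seq_len - 1)

# Average over each sequence's prediction tasks, then weight per record.
per_sample = per_token.mean(dim=1)
loss = (per_sample * sample_weights).mean()
```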