I’ve been trying to use GPT2 to calculate the loss of a given sentence. In principle I could do this in two ways: one is to pass the labels to the model, the other is to compute the loss myself from the logits.
Taking CrossEntropyLoss as the loss function (https://github.com/huggingface/transformers/blob/391db836ab7ed2ca61c51a7cf1b135b6ab92be58/transformers/modeling_gpt2.py#L539), I can calculate the loss from the logits using the same labels.
However, when I do this I find an inconsistent result. For example:
<code>from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch.nn.functional as F

gmodel = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2", truncation_side='right', padding_side='right')
tokenizer.pad_token = tokenizer.eos_token

test1 = 'I am the first test string'
test2 = 'I am the second test string'

with torch.no_grad():
    tokens1 = tokenizer(test1, max_length=1024, padding=True, truncation=True, return_tensors="pt")
    tokens2 = tokenizer(test2, max_length=1024, padding=True, truncation=True, return_tensors="pt")
    t1 = gmodel(**tokens1, labels=tokens1["input_ids"])
    t2 = gmodel(**tokens2, labels=tokens2["input_ids"])

# Loss returned by the model when labels are passed
print(t1.loss, t2.loss)

# Loss computed directly from the logits with the same labels
print(F.cross_entropy(t1.logits.view(-1, tokenizer.vocab_size), tokens1['input_ids'].view(-1), ignore_index=tokenizer.eos_token_id),
      F.cross_entropy(t2.logits.view(-1, tokenizer.vocab_size), tokens2['input_ids'].view(-1), ignore_index=tokenizer.eos_token_id))
</code>
This results in the following output:
<code>tensor(6.9905) tensor(7.3378)
tensor(7.5244) tensor(7.3836)
</code>
Not only are the two losses inconsistent, which might simply be because the model uses a different loss function internally, but their order also flips: the first string has the smaller loss when the model computes it, yet the larger loss when I compute it from the logits.
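For reference, this is my reading of what the linked modeling_gpt2.py does when labels are passed: it shifts the logits and labels by one position so that the prediction at position i is scored against the token at position i + 1, and only then applies CrossEntropyLoss. The sketch below is just my interpretation of that line, the helper name shifted_loss is mine, and I haven't checked whether the ignore_index/reduction details match the version I have installed:
<code>import torch.nn.functional as F

def shifted_loss(logits, labels):
    # Drop the last logit and the first label so that the logit at
    # position i is compared against the token at position i + 1.
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(shift_logits.view(-1, shift_logits.size(-1)),
                           shift_labels.view(-1))

# Using the tensors from the snippet above, I would expect this to match
# t1.loss and t2.loss rather than my unshifted calculation:
# print(shifted_loss(t1.logits, tokens1["input_ids"]),
#       shifted_loss(t2.logits, tokens2["input_ids"]))
</code>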