I am trying to do machine translation from Hindi to Sanskrit using the NLLB model, but I get the warning below and the training does not progress:
UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector. warnings.warn('Was asked to gather along dimension 0, but all '
- The warning appears when training the pretrained NLLB model `facebook/nllb-200-1.3B` (loaded as sketched below).
- The input data is ~40k Hindi sentences.
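A rough sketch of how the model and tokenizer are loaded (not my exact code, but essentially this):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = 'facebook/nllb-200-1.3B'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)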
Detailed warning and log output:
/home//.conda/envs/dict/lib/python3.8/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
0%| | 0/4968 [00:10<?, ?it/s]
/home//.conda/envs/dict/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
0%| | 1/9936 [00:03<9:49:55, 3.56s/it]
As you can see above, training does not progress past 0% and just stays there. The terminal hangs and Ctrl-C does not work.
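The traceback path (torch/nn/parallel/_functions.py) suggests the Trainer is wrapping the model in torch.nn.DataParallel because more than one GPU is visible, and the warning is emitted when the per-GPU scalar losses are gathered. A quick sanity check, independent of the model:

import torch

# If this prints more than 1, the Trainer wraps the model in DataParallel
# by default, which is where the "gather along dimension 0" warning comes from.
print(torch.cuda.device_count())

# Pinning the run to a single GPU (e.g. CUDA_VISIBLE_DEVICES=0 in the shell
# before launching the script) would rule DataParallel out.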
The preprocessing code for the data:
def preprocess_function(examples):
    # Source side: sentence + </s> + source-language tag; target side: target-language tag + sentence + </s>
    inputs = [example + ' </s>' + f' <2{s_lang}>' for example in examples[source_lang]]
    targets = [f'<2{t_lang}> ' + example + ' </s>' for example in examples[target_lang]]
    model_inputs = tokenizer.batch_encode_plus(inputs, max_length=max_input_length,
                                               truncation=True, padding='max_length')
    # Tokenise the targets in target-tokenizer mode and store them as labels
    with tokenizer.as_target_tokenizer():
        labels = tokenizer.batch_encode_plus(targets, max_length=max_input_length,
                                             truncation=True, padding='max_length')
    model_inputs['labels'] = labels['input_ids']
    return model_inputs
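The function is applied with a batched map (raw_dataset here is just a placeholder name for the untokenised DatasetDict):

# Batched so the function receives lists of sentences; the text columns are
# dropped, leaving only input_ids, attention_mask and labels.
dataset = raw_dataset.map(preprocess_function, batched=True,
                          remove_columns=[source_lang, target_lang])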
Data after preprocessing and tokenisation:
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 39729
    })
    val: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2210
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2214
    })
})
The training arguments and training code:
training_args = Seq2SeqTrainingArguments(
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    auto_find_batch_size=True,
    output_dir="./output_dir",
    weight_decay=0.01,
    save_total_limit=1,
    num_train_epochs=4,
    predict_with_generate=True,
    fp16=False,
    push_to_hub=False,
    remove_unused_columns=False,
)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset['train'],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
print("nStarting trainingn")
# torch.cuda.empty_cache()
print(trainer.train())
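For completeness, compute_metrics follows the standard sacrebleu pattern from the Transformers translation examples (a rough sketch, not my exact code):

import numpy as np
import evaluate

metric = evaluate.load('sacrebleu')

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # -100 marks ignored label positions; replace before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = metric.compute(predictions=decoded_preds,
                            references=[[label] for label in decoded_labels])
    return {'bleu': result['score']}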
Any idea why this warning appears and why the training is not progressing?