I have been trying to reproduce the results of this repo:
https://github.com/sefcom/VarBERT/tree/main
I was able to train the BERT model with the MLM objective, but Constrained Masked Language Model (CMLM) training consistently fails with the following error:
```
/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/transformers/optimization.py:429: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
05/14/2024 06:58:30 - INFO - __main__ - ***** Running training *****
05/14/2024 06:58:30 - INFO - __main__ - Num examples = 4509495
05/14/2024 06:58:30 - INFO - __main__ - Num Epochs = 30
05/14/2024 06:58:30 - INFO - __main__ - Instantaneous batch size per GPU = 32
05/14/2024 06:58:30 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 32
05/14/2024 06:58:30 - INFO - __main__ - Gradient Accumulation steps = 1
05/14/2024 06:58:30 - INFO - __main__ - Total optimization steps = 4227660
05/14/2024 06:58:30 - INFO - __main__ - Starting fine-tuning.
Epoch: 0%| | 0/30 [00:00<?, ?it/s]
/opt/conda/conda-bld/pytorch_1711403408687/work/aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
Iteration: 0%| | 0/140922 [00:01<?, ?it/s]
Epoch: 0%| | 0/30 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 944, in <module>
main()
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 892, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 488, in train
outputs = model(inputs,labels=labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 139, in forward
masked_lm_loss = loss_fct(prediction_scores.view(-1, vocab_size), labels.view(-1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/loss.py", line 1179, in forward
return F.cross_entropy(input, target, weight=self.weight,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/functional.py", line 3059, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
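For what it's worth, the device-side assert near the top of the log (`t >= 0 && t < n_classes` in the NLL loss kernel) means at least one target id passed to `cross_entropy` is outside the valid class range. Because CUDA surfaces these asserts asynchronously, my understanding is that forcing synchronous launches (standard PyTorch env var, set before the first CUDA call) makes the traceback point at the exact failing op; alternatively, running the failing batch on CPU raises a plain `IndexError` that names the offending target:

```python
import os

# Must be set before the first CUDA call: kernel launches become synchronous,
# so the Python traceback points at the op that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```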
From what I found online, this error is usually caused by a discrepancy between the vocabulary size the model expects and the vocabulary size defined in the tokenizer's config. I made changes accordingly, but the error still persists. I need to resolve this issue to be able to train the model further.
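To double-check the mismatch hypothesis, this is the kind of sanity check I believe should settle it (a minimal sketch assuming the standard Hugging Face loading path; `MODEL_DIR`, `TOKENIZER_DIR`, and `check_labels` are placeholders of mine, not names from the repo):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholders -- substitute the actual checkpoint/tokenizer dirs from the CMLM run
MODEL_DIR = "path/to/mlm_checkpoint"
TOKENIZER_DIR = "path/to/tokenizer"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR)

# All three of these should agree; the nll_loss assert fires as soon as a
# label id is >= the number of rows in the output head.
print("tokenizer vocab size :", len(tokenizer))
print("model config vocab   :", model.config.vocab_size)
print("output head rows     :", model.get_output_embeddings().weight.shape[0])

def check_labels(labels: torch.Tensor, n_classes: int) -> None:
    """Raise if any label would trip the `t >= 0 && t < n_classes` assert."""
    # -100 is CrossEntropyLoss's default ignore_index and is always allowed.
    valid = labels[labels != -100]
    if valid.numel() and (valid.min() < 0 or valid.max() >= n_classes):
        raise ValueError(
            f"label out of range: min={valid.min().item()}, "
            f"max={valid.max().item()}, n_classes={n_classes}"
        )

# If the tokenizer grew (e.g., extra special tokens were added) but the
# checkpoint's embedding/output matrices did not, resizing re-syncs them:
if len(tokenizer) != model.config.vocab_size:
    model.resize_token_embeddings(len(tokenizer))
```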