I have been trying to reproduce the results of this repo:
https://github.com/sefcom/VarBERT/tree/main
I was able to train the BERT model with the MLM objective, but Constrained Masked Language Model (CMLM) training consistently fails with the following error:
```
/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/transformers/optimization.py:429: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
05/14/2024 06:58:30 - INFO - __main__ - ***** Running training *****
05/14/2024 06:58:30 - INFO - __main__ - Num examples = 4509495
05/14/2024 06:58:30 - INFO - __main__ - Num Epochs = 30
05/14/2024 06:58:30 - INFO - __main__ - Instantaneous batch size per GPU = 32
05/14/2024 06:58:30 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 32
05/14/2024 06:58:30 - INFO - __main__ - Gradient Accumulation steps = 1
05/14/2024 06:58:30 - INFO - __main__ - Total optimization steps = 4227660
05/14/2024 06:58:30 - INFO - __main__ - Starting fine-tuning.
Epoch: 0%| | 0/30 [00:00<?, ?it/s]
/opt/conda/conda-bld/pytorch_1711403408687/work/aten/src/ATen/native/cuda/Loss.cu:250: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
Iteration: 0%| | 0/140922 [00:01<?, ?it/s]
Epoch: 0%| | 0/30 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 944, in <module>
main()
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 892, in main
global_step, tr_loss = train(args, train_dataset, model, tokenizer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 488, in train
outputs = model(inputs,labels=labels)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/VarBERT/varbert/cmlm/training.py", line 139, in forward
masked_lm_loss = loss_fct(prediction_scores.view(-1, vocab_size), labels.view(-1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/modules/loss.py", line 1179, in forward
return F.cross_entropy(input, target, weight=self.weight,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hprakash/.conda/envs/HP/lib/python3.11/site-packages/torch/nn/functional.py", line 3059, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
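For what it's worth, the device-side assert near the top of the log (`t >= 0 && t < n_classes` in the NLL loss kernel) means at least one target id passed to `cross_entropy` is outside the valid class range. Because CUDA surfaces these asserts asynchronously, my understanding is that forcing synchronous launches (standard PyTorch env var, set before the first CUDA call) makes the traceback point at the exact failing op; alternatively, running the failing batch on CPU raises a plain `IndexError` that names the offending target:

```python
import os

# Must be set before the first CUDA call: kernel launches become synchronous,
# so the Python traceback points at the op that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```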
From what I found online, this error is usually caused by a discrepancy between the vocabulary size the model expects and the vocabulary size defined in the tokenizer's config. I made changes accordingly, but the error still persists. I need to resolve this issue to be able to train the model further.
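To double-check the mismatch hypothesis, this is the kind of sanity check I believe should settle it (a minimal sketch assuming the standard Hugging Face loading path; `MODEL_DIR`, `TOKENIZER_DIR`, and `check_labels` are placeholders of mine, not names from the repo):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholders -- substitute the actual checkpoint/tokenizer dirs from the CMLM run
MODEL_DIR = "path/to/mlm_checkpoint"
TOKENIZER_DIR = "path/to/tokenizer"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_DIR)
model = AutoModelForMaskedLM.from_pretrained(MODEL_DIR)

# All three of these should agree; the nll_loss assert fires as soon as a
# label id is >= the number of rows in the output head.
print("tokenizer vocab size :", len(tokenizer))
print("model config vocab   :", model.config.vocab_size)
print("output head rows     :", model.get_output_embeddings().weight.shape[0])

def check_labels(labels: torch.Tensor, n_classes: int) -> None:
    """Raise if any label would trip the `t >= 0 && t < n_classes` assert."""
    # -100 is CrossEntropyLoss's default ignore_index and is always allowed.
    valid = labels[labels != -100]
    if valid.numel() and (valid.min() < 0 or valid.max() >= n_classes):
        raise ValueError(
            f"label out of range: min={valid.min().item()}, "
            f"max={valid.max().item()}, n_classes={n_classes}"
        )

# If the tokenizer grew (e.g., extra special tokens were added) but the
# checkpoint's embedding/output matrices did not, resizing re-syncs them:
if len(tokenizer) != model.config.vocab_size:
    model.resize_token_embeddings(len(tokenizer))
```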