I'm working with very large tensors in PyTorch, and as a result of a certain operation my tensor needs to hold very large values (which represent indices). However, overflow causes every number in the tensor to become -9223372036854775808.
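For context, here is a minimal sketch (my own illustration, not code from the repo) of how int64 overflow can produce exactly that value: arithmetic that exceeds 2**63 - 1 wraps around to the int64 minimum in practice.

```python
import torch

# Hypothetical illustration (not the repo's code): multiplying an index
# value past the int64 maximum wraps around to the int64 minimum.
idx = torch.tensor([2**62], dtype=torch.int64)
print(idx * 2)  # tensor([-9223372036854775808])
```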
This is the repo: https://github.com/SamuelMastrelli/neural-astar. When I try to launch train_maps.py, the output clearly indicates some overflow, since the variable holding the negative number I printed is supposed to be a location index (loc) into another tensor:
scripts/train_maps.py:21: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_path="config", config_name="train_maps")
torch.Size([1, 1, 300, 300]) torch.Size([1, 1, 300, 300]) torch.Size([1, 1, 300, 300]) torch.Size([1, 1, 300, 300])
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]])
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]])
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]])
tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]])
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: model/maps/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-----------------------------------------------
0 | planner | NeuralAstar | 391 K
1 | vanilla_astar | VanillaAstar | 9
-----------------------------------------------
391 K Trainable params
18 Non-trainable params
391 K Total params
1.566 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
tensor([-9223372036854775808], device='cuda:0')
torch.Size([1, 90000])
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Error executing job with overrides: []
Traceback (most recent call last):
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl
self._run(model, ckpt_path=self.ckpt_path)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run
results = self._run_stage()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage
self._run_train()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1190, in _run_train
self._run_sanity_check()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1262, in _run_sanity_check
val_loop.run()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
self.advance(*args, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 137, in advance
output = self._evaluation_step(**kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 234, in _evaluation_step
output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1480, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 390, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/utils/training.py", line 68, in validation_step
outputs = self.forward(map_designs, start_maps, goal_maps)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/utils/training.py", line 53, in forward
return self.planner(map_designs, start_maps, goal_maps)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/planner/astar.py", line 207, in forward
return self.perform_astar(
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/planner/astar.py", line 63, in perform_astar
astar_outputs = astar(
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/planner/differentiable_astar.py", line 260, in forward
path_maps = backtrack(start_maps, goal_maps, parents, t)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/neural_astar/planner/differentiable_astar.py", line 128, in backtrack
loc = parents[range(num_samples), loc]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "scripts/train_maps.py", line 63, in main
trainer.fit(module, train_loader, val_loader)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 603, in fit
call._call_and_handle_interrupt(
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 63, in _call_and_handle_interrupt
trainer._teardown()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1161, in _teardown
self.strategy.teardown()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 496, in teardown
self.lightning_module.cpu()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/lightning_lite/utilities/device_dtype_mixin.py", line 78, in cpu
return super().cpu()
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 738, in cpu
return self._apply(lambda t: t.cpu())
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/home/mastrelli/neural-astar/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 738, in <lambda>
return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
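In case it helps narrow things down, below is a hedged sketch of a check one could add right before the failing indexing line in differentiable_astar.py (backtrack), assuming parents has shape [num_samples, H*W] and loc is meant to be a valid flat index into it. The assertion and its message are my additions, not part of the repo.

```python
# Hypothetical debugging check (my addition, not part of the repo):
# confirm loc is a valid flat index into parents before the gather that fails.
assert loc.min() >= 0 and loc.max() < parents.size(1), (
    f"loc out of range: min={loc.min().item()}, max={loc.max().item()}, "
    f"parents has {parents.size(1)} columns"
)
loc = parents[range(num_samples), loc]
```

Running with CUDA_LAUNCH_BLOCKING=1 (or on CPU) should also make the failing line report synchronously instead of at a later API call.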