W1206 18:53:57.353000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18835 closing signal SIGTERM
W1206 18:53:57.354000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18836 closing signal SIGTERM
W1206 18:53:57.354000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18837 closing signal SIGTERM
W1206 18:53:57.354000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18839 closing signal SIGTERM
W1206 18:53:57.354000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18840 closing signal SIGTERM
W1206 18:53:57.354000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18841 closing signal SIGTERM
W1206 18:53:57.355000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 18842 closing signal SIGTERM
E1206 18:53:57.387000 18802 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 3 (pid: 18838) of binary: /home/chacha/anaconda3/envs/drive/bin/python
Traceback (most recent call last):
File "/home/chacha/anaconda3/envs/drive/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1159, in launch_command
multi_gpu_launcher(args)
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
distrib_run.run(args)
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/chacha/anaconda3/envs/drive/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/drive/DriveDreamer-main/DriveDreamer-main/dreamer-train/dreamer_train/distributed/run_task.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-12-06_18:53:57
host : DESKTOP-EJP3C2O.localdomain
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 18838)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "/home/drive/DriveDreamer-main/DriveDreamer-main/./dreamer-train/projects/launch.py", line 35, in <module>
main()
File "/home/drive/DriveDreamer-main/DriveDreamer-main/./dreamer-train/projects/launch.py", line 32, in main
launch_from_config(config_path, ','.join(opts.runners))
File "/home/drive/DriveDreamer-main/DriveDreamer-main/dreamer-train/dreamer_train/distributed/launch.py", line 175, in launch_from_config
launcher.launch('{} --config {} --runners {}'.format(file_path, config_path, runners))
File "/home/drive/DriveDreamer-main/DriveDreamer-main/dreamer-train/dreamer_train/distributed/launch.py", line 159, in launch
os.remove(self.hostfile_path)
FileNotFoundError: [Errno 2] No such file or directory: '_tmp/2024-12-06-185351_hostfile'
The code is from GitHub: https://github.com/JeffWang987/DriveDreamer
When the code runs, it creates a config file and a hostfile, both with timestamped names. The hostfile is not created correctly, so the later os.remove(self.hostfile_path) call fails with the FileNotFoundError shown above.
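
For reference, here is a minimal sketch of how the hostfile could be created and cleaned up so that a missing file no longer raises. It is not the DriveDreamer implementation: the helper names `write_hostfile` and `remove_hostfile` and the hostfile contents are hypothetical, and I am only assuming that the `_tmp` directory may not exist when the file is written. It would also only hide the FileNotFoundError; the log reports rank 3 exiting with code 1 as the first observed failure, and that would still need its own traceback.

```python
from pathlib import Path


def write_hostfile(hostfile_path: str, hosts: list[str]) -> Path:
    """Create the hostfile, making sure its parent directory exists first.

    Hypothetical helper: the real launcher may write different contents.
    """
    path = Path(hostfile_path)
    # Create '_tmp' (or any other parent directory) if it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(hosts) + "\n")
    return path


def remove_hostfile(hostfile_path: str) -> bool:
    """Delete the hostfile if present; return True when a file was removed."""
    path = Path(hostfile_path)
    existed = path.exists()
    # missing_ok=True (Python 3.8+) avoids FileNotFoundError if it is already gone.
    path.unlink(missing_ok=True)
    return existed
```

With helpers like these, the cleanup step could call remove_hostfile(self.hostfile_path) in place of os.remove(self.hostfile_path), so an absent hostfile would not mask the earlier worker failure.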