I have a time-consuming task, so I use GNU parallel to run my script in parallel.
My run.sh:
parallel --no-notice \
    --verbose \
    --progress \
    -j 4 \
    --ungroup './scripts/chain_base.sh {1} {2} {3}' \
    ::: 150 200 ::: 100 200 500 1000 ::: default v1 < /dev/null
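For reference, the three ::: input sources are combined into every possible argument tuple, so this command queues 2 x 4 x 2 = 16 jobs, and the first 8 of them all have {1} = 150. The sketch below reproduces that job list with plain nested loops instead of parallel. list_jobs is a hypothetical helper name used only for illustration, and it only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Print the commands that the three ::: input sources expand to.
# list_jobs is a hypothetical helper; nothing is executed, only echoed.
list_jobs() {
    local token_num num_iter init_method
    for token_num in 150 200; do                  # values for {1}
        for num_iter in 100 200 500 1000; do      # values for {2}
            for init_method in default v1; do     # values for {3}
                echo "./scripts/chain_base.sh $token_num $num_iter $init_method"
            done
        done
    done
}

list_jobs
```

Piping the output through `wc -l` gives the total number of jobs, and `grep -c ' 150 '` counts the ones that belong to the first value of {1}.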
My chain_base.sh looks like this:
TOKEN_NUM=$1
NUM_ITER=$2
INIT_METHOD=$3
# Reset the SECONDS variable
SECONDS=0
k="5"
for s in $(eval echo "{0..$((k-1))}"); do
    python src/abc.py \
        --num_iter "${NUM_ITER}" \
        --do_kmeans --k "$k" \
        --kmeans_split "$s" \
        --init_method "${INIT_METHOD}" \
        --num_tokens "${TOKEN_NUM}"
done
# Print the end time and execution time
echo "End time: `date`"
echo "Execution time: $SECONDS seconds."
I start the script with:
nohup bash scripts/run.sh > run.out &
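Because the job is started in the background with nohup, it can outlive the terminal it was launched from. A check like the one below, added near the top of run.sh, is one way to record in run.out whether a controlling terminal is still available when the script runs. describe_tty is a hypothetical helper. The only thing it relies on is that opening /dev/tty fails for a process that has no controlling terminal.

```shell
#!/usr/bin/env bash
# Report whether this process can open its controlling terminal.
# describe_tty is a hypothetical helper used only for diagnosis.
describe_tty() {
    # The open is attempted in a subshell so that a failure cannot abort
    # the caller, and the shell's own error message is discarded.
    if ( : < /dev/tty ) 2>/dev/null; then
        echo "controlling tty: yes"
    else
        echo "controlling tty: no"
    fi
}

describe_tty
```

This does not change how the jobs run. It only makes the state of the terminal visible in the log.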
Here is what I know so far:
- chain_base.sh runs successfully on its own and does not try to read anything from the terminal.
- The Python program takes a long time to run, approximately 40 minutes.
- The run always hits the error message “sh: 1: cannot open /dev/tty: No such device or address”.
- The following logs are printed:
parallel: SIGHUP received. No new jobs will be started.
parallel: Waiting for these 4 jobs to finish. Send SIGTERM to stop now.
parallel: ./scripts/chain_base.sh 150 200 default
parallel: ./scripts/chain_base.sh 150 500 default
parallel: ./scripts/chain_base.sh 150 500 v1
parallel: ./scripts/chain_base.sh 150 200 v1
- parallel runs up to 4 processes at a time, but some processes hit the error “sh: 1: cannot open /dev/tty: No such device or address” and stop, while the others continue until all 4 have hit it. By the time everything has stopped, 8 tasks have usually been executed, which is the point where parallel would switch to the next value of {1}.
- Add --no-notice to the parallel command. Result: no effect.
- Add < /dev/null to the parallel command. Result: no effect.
- Switch to tmux instead of running under nohup in the background. Result: this introduced new problems, mainly with the Python environment. The terminal started by tmux does not inherit the environment of the original terminal, so the active conda environment is base and the run fails.
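To check the environment problem seen with tmux, a few lines like these at the top of chain_base.sh would log which conda environment and Python interpreter a job actually sees. This is a diagnostic sketch only. print_env_info is a hypothetical helper, and CONDA_DEFAULT_ENV is the variable that conda sets to the name of the active environment.

```shell
#!/usr/bin/env bash
# Log the conda environment and Python interpreter visible to this shell.
# print_env_info is a hypothetical helper used only for diagnosis.
print_env_info() {
    # CONDA_DEFAULT_ENV is unset when no conda environment is active.
    echo "conda env: ${CONDA_DEFAULT_ENV:-<none>}"
    # command -v prints the path of the python that would be executed.
    echo "python: $(command -v python || echo '<not found>')"
}

print_env_info
```

Comparing these two lines between the nohup run and the tmux run shows whether the jobs are picking up the intended environment.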