The GNU Parallel tutorial (https://www.gnu.org/software/parallel/parallel_tutorial.html#number-of-simultaneous-jobs) gives these three commands:
- Option 1, as many jobs as cores: /usr/bin/time parallel -N0 sleep 1 :::: num128
- Option 2, 2 jobs per core: /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
- Option 3, as many jobs as possible: /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
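To reproduce these runs, the input file num128 has to exist. A minimal sketch, assuming (as in the tutorial's setup) that num128 is just the numbers 1 to 128, one per line; with -N0 parallel reads one line per job but inserts nothing into the command, so only the line count, which equals the number of jobs, matters:

```shell
#!/usr/bin/env bash
# Recreate the 128-line input file used by the three commands.
# Assumption: its contents are irrelevant under -N0; only the line count matters.
seq 128 > num128

# Number of lines = number of `sleep 1` jobs parallel will run
wc -l < num128
```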
I assume this refers to logical cores, not physical cores.
My machine has a 13th Gen Intel Core i7-13700HX (16 cores, 24 logical processors).
- Option 1 used 24 jobs and took 6.333 seconds
- Option 2 used 48 jobs and took 3.249 seconds
- Option 3 used 128 jobs and took 1.421 seconds
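These timings fit a simple round-based model: with S job slots and 128 jobs of 1 second each, the jobs need ceil(128 / S) back-to-back rounds. A minimal sketch of that arithmetic, assuming every job takes exactly 1 s and a freed slot is refilled immediately; rounds is a hypothetical helper:

```shell
#!/usr/bin/env bash
# rounds TOTAL SLOTS: ceil(TOTAL / SLOTS), the number of back-to-back waves
# needed when each slot is refilled as soon as its job exits.
# Assumption: all jobs have the same length (here: sleep 1).
rounds() {
  local total=$1 slots=$2
  echo $(( (total + slots - 1) / slots ))   # integer ceiling division
}

# Slot counts from the three runs above: 24, 48 and 128 jobs at a time
for slots in 24 48 128; do
  r=$(rounds 128 "$slots")
  echo "$slots slots: $r rounds, so at least $r s for 128 x sleep 1"
done
```

Under this model, the measured 6.333 s, 3.249 s and 1.421 s would be 6, 3 and 1 rounds of sleeping plus start-up overhead.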
Question
How does Option 2 put 2 jobs on 1 core and still complete both in 1 second? (The total input requires 3 iterations of job allocation and sleep.)
Shouldn’t the second job’s sleep wait for the first job’s sleep to complete, so that each of the 3 iterations takes 2 seconds, for a total of 6 seconds?
Is parallel running both sleeps in the background on 1 core concurrently, so that their timing overlaps?
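One way to test the overlap hypothesis without GNU parallel is to start two sleep 1 processes in the background from bash and time how long it takes until both have exited. A minimal sketch, assuming bash and GNU date (for %N); to force both processes onto a single logical CPU, each sleep could additionally be prefixed with taskset -c 0 from util-linux:

```shell
#!/usr/bin/env bash
# Start two `sleep 1` processes at once and measure the wall time until both exit.
# Assumes GNU date, whose %N format prints nanoseconds.
start_ns=$(date +%s%N)
sleep 1 &      # job 1, in the background
sleep 1 &      # job 2, in the background
wait           # block until both background jobs have exited
end_ns=$(date +%s%N)

elapsed_ms=$(( (end_ns - start_ns) / 1000000 ))
echo "two background sleeps took ${elapsed_ms} ms"
```

If the elapsed time comes out near 1000 ms rather than 2000 ms, the two sleeps overlapped: a sleeping process is parked by the kernel and does not occupy a CPU while it waits.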
What happens in general when you specify multiple jobs per core with parallel?
Does it use multithreading, or does it start all those jobs as background processes on the same logical core?
I wonder whether this example of multiple jobs per core only works because several sleeps can overlap when they are started in the background, and whether, if I switched to a command that cannot run concurrently, starting multiple jobs per core as in Options 2 and 3 would be pointless.
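As far as I understand, GNU parallel runs each job as a separate process rather than as a thread, and leaves it to the OS scheduler to place those processes on CPUs. The sketch below imitates that with plain bash background processes (it does not call parallel) to compare a CPU-bound job with sleep. burn and run_batch are hypothetical helpers, and the script assumes bash, GNU date and nproc from coreutils:

```shell
#!/usr/bin/env bash
# burn: hypothetical stand-in for a command that keeps a CPU busy (unlike sleep)
burn() {
  local i
  for ((i = 0; i < 200000; i++)); do :; done   # pure busy loop, never sleeps
}

# run_batch N: start N copies of burn at once, wait for all, print elapsed ms
run_batch() {
  local n=$1 k start_ns end_ns
  start_ns=$(date +%s%N)
  for ((k = 0; k < n; k++)); do
    burn &
  done
  wait
  end_ns=$(date +%s%N)
  echo $(( (end_ns - start_ns) / 1000000 ))
}

cores=$(nproc)                         # logical CPUs usable by this process
t_1x=$(run_batch "$cores")             # one busy job per logical CPU
t_2x=$(run_batch $(( 2 * cores )))     # two busy jobs per logical CPU
echo "$cores jobs: ${t_1x} ms; $(( 2 * cores )) jobs: ${t_2x} ms"
```

On a machine whose logical CPUs are otherwise idle, I would expect the 2 x cores batch to take roughly twice as long as the 1 x cores batch, while the same experiment with sleep 1 in place of burn would stay close to 1 s for both.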
In Option 3, what sets the upper limit on “as many jobs in parallel as possible”?
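A quick way to look at the per-process limits that plausibly cap --jobs 0 on a given machine. This is a sketch under the assumption, not confirmed by the tutorial, that open file descriptors and the per-user process count are the binding limits; the GNU parallel man page is the authority. With only 128 input lines, the number of inputs is an obvious cap as well:

```shell
#!/usr/bin/env bash
# Print resource limits that may cap how many jobs `--jobs 0` can start at once.
# Assumption: open file descriptors and the user process limit are the relevant ones.
logical_cpus=$(nproc)    # logical CPUs usable by this process
open_files=$(ulimit -n)  # soft limit on open file descriptors per process
user_procs=$(ulimit -u)  # soft limit on processes for this user, may be "unlimited"

echo "logical CPUs:               $logical_cpus"
echo "open files (ulimit -n):     $open_files"
echo "user processes (ulimit -u): $user_procs"
```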
Based on these 3 examples, I am tempted to always choose Option 3 as the fastest. When does this stop being true?
I tried pushing the limits by doubling the input size with /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: <(cat num128 num128)
and it still got everything done in a single iteration, in 1.83 seconds.
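To check whether all 256 jobs really ran in one wave, rather than inferring it from the wall time, one could add --joblog jobs.log to the command and compute the peak number of overlapping jobs from that log. A minimal sketch, assuming the joblog is tab-separated with a header line, Starttime in column 3 and JobRuntime in column 4 (both in seconds); max_concurrency and the file name jobs.log are hypothetical:

```shell
#!/usr/bin/env bash
# max_concurrency LOGFILE: print the peak number of jobs running at the same moment,
# reading a GNU parallel --joblog file (layout assumed as described above).
max_concurrency() {
  LC_ALL=C awk -F'\t' 'NR > 1 {
      printf "%.6f 1\n",  $3         # a job starts: +1
      printf "%.6f -1\n", $3 + $4    # the same job ends: -1
    }' "$1" |
  LC_ALL=C sort -k1,1g -k2,2n |      # by time; on ties, ends (-1) before starts (+1)
  awk '{ running += $2; if (running > peak) peak = running }
       END { print peak + 0 }'
}
```

Usage, after running /usr/bin/time parallel --joblog jobs.log -N0 --jobs 0 sleep 1 :::: <(cat num128 num128): max_concurrency jobs.log. A result equal to the number of jobs would mean they all ran in one wave.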
Summary
I don’t understand:
- the work allocation models of Options 2 and 3;
- why Option 2 is faster than I expected;
- why Option 3’s timing seems unaffected both by how far the number of jobs exceeds the number of logical cores and by a larger input;
- which of these lessons are specific to this sleep command and therefore may not generalize.
If you want to count iterations
Tweak the example commands above by wrapping sleep with echo, replacing sleep 1 with the following, and watch the output arrive in cycles:
'echo start job number {#} job slot {%};sleep 1;echo finish job number {#} job slot {%}'
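Note that by default parallel groups each job's output and prints it only when the job finishes, so the start lines appear together with the finish lines; --line-buffer prints lines as they are produced. To turn the output into a number, one could count how many jobs each slot ({%}) ran, since with equal-length jobs the busiest slot's count approximates the number of rounds. A minimal sketch; count_rounds is a hypothetical helper that reads the echo output above on stdin:

```shell
#!/usr/bin/env bash
# count_rounds: read lines like "finish job number N job slot S" on stdin and
# print how many jobs the busiest slot ran (approximately the number of rounds).
count_rounds() {
  awk '$1 == "finish" { used[$NF]++ }     # tally finished jobs per slot number
       END {
         max = 0
         for (slot in used) if (used[slot] > max) max = used[slot]
         print max
       }'
}
```

Usage: parallel -N0 --jobs 200% 'echo start job number {#} job slot {%};sleep 1;echo finish job number {#} job slot {%}' :::: num128 | count_rounds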