I need to ask a question that has been bugging me for some time now:
If I have a single core and one OS thread, this thread will get 100% of the CPU time and all is good.
If I have a single core and two or more OS threads, they will share the CPU time using time slices.
So, are the time slices always the same amount of time, no matter how many threads there are?
What I'm trying to get at is: is the amount of work the CPU can do the same when I have two threads as when I have 10000 threads?
I'm well aware that each individual thread will progress more slowly since they share a resource, but will the actual amount of work the CPU can do be the same?
e.g.
[T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ]
[T1 ] [T2 ] [T3 ] [T4 ] [T5 ] [T6 ] [T1 ] [T2 ] [T3 ] [T4 ]
----time-------------------------------------------------->
img. 1
In the illustration above, there are 2 vs 6 threads, but the total amount of work would be the same.
Is this true?
Or is there something else that affects this when there are more threads, causing each slice to be smaller or the context switches between threads to take longer?
e.g.
[T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ] [T1 ] [T2 ]
[T1 ] [T2 ] [T3 ] [T4 ] [T5 ] [T6 ] [T1 ]
----time-------------------------------------------------->
img. 2
I’m not asking if it is a good idea to use 1000 threads…
[Edit]
Trying to clarify what I’m trying to understand:
Given x amount of time, e.g. 1 minute.
And given that the code does not use locks or any other thread interrupting code.
If I have two threads, y% of the time will be spent on context switching.
If I have 1000 threads, will y be a greater number, or will it be the same as in the previous case?
If your quantum (the time each thread gets to run) is the same, then you'll have the same number of context switches per unit of time regardless of the number of threads (assuming no threads are blocked). The amount of time each thread gets will be affected, but the context-switching overhead will be constant. That's in an ideal world; obviously, in the real world, you'll have locks and resource contention, so many threads will be waiting on I/O or otherwise not ready to run, which means more context-switching overhead, and it's possible for the time spent switching to exceed the time spent actually running user code. The amount of extra time is entirely dependent on your workload, which is why it's not possible to predict the overhead in general.
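You can get a rough feel for this with a small experiment. The sketch below (in Python; its GIL makes CPU-bound threads behave much like threads sharing a single core) splits a fixed amount of work across different thread counts and times it. The function names and iteration counts are arbitrary placeholders; the point is to compare total wall time for the same total work as the thread count grows.

```python
import threading
import time

def burn(n):
    # CPU-bound busy work: n iterations of arithmetic.
    x = 0
    for i in range(n):
        x += i * i
    return x

def run_with_threads(total_iterations, num_threads):
    """Split a fixed amount of work across num_threads and return wall time."""
    per_thread = total_iterations // num_threads
    threads = [threading.Thread(target=burn, args=(per_thread,))
               for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    work = 2_000_000
    for n in (2, 50):
        print(f"{n:>3} threads: {run_with_threads(work, n):.3f}s")
```

On a mostly idle machine the two totals tend to come out close, which matches the "same total work" intuition; any gap between them is the scheduling and bookkeeping overhead the answer describes.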
Context switching itself imposes an overhead on the system, because the OS has to perform book-keeping on which threads exist, which are running, which are ready, which have already received their fair share, etc. In a well-designed OS this overhead is not great, and usually it is not dependent on the number of threads (up to some hard maximum), but it is definitely there.
Everything else is system-dependent. For instance, a scheduler may decrease the nominal length of each slice depending on how many candidates there are, or it may not. As always, a reasonable answer about what to do in your case can only come from running careful performance tests on your system.
Even with a single-threaded app running on a single-core machine you're going to get context switches, because the OS will want to run multiple applications; similarly, with two apps running on a dual-core machine you'll still get switches. It's just that having many threads running means you get more context switches.
On Windows the scheduled time slice (or quantum) is a fixed 10ms (IIRC). A thread can give up part of its slice by sleeping, which is why you occasionally see sleep(0) calls in code. It's a way of saying "I have nothing to do for a moment, so someone else can have a go before their turn comes up".
Another thing that can cause a context switch is locking. If you block on a waitable object, you're giving up the rest of your slice – there's no point in waiting for the object to be signalled by a different thread if that thread hasn't had a chance to run yet!
This is the main cause of excessive switching – too much locking.
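A small sketch of that effect, assuming nothing beyond the standard threading module (the names are made up for illustration): a thread that blocks on a lock someone else holds is taken off the CPU immediately, and the scheduler switches to another runnable thread rather than letting it spin out its slice.

```python
import threading
import time

lock = threading.Lock()
results = {}

def worker(idx):
    # This acquire blocks because the main thread holds the lock;
    # the worker gives up the CPU until the lock is released.
    with lock:
        results[idx] = time.perf_counter()

lock.acquire()                 # main thread takes the lock first
t = threading.Thread(target=worker, args=(0,))
t.start()
time.sleep(0.01)               # worker is now blocked, not spinning
lock.release()                 # worker wakes, acquires, records a timestamp
t.join()
```

Each such block/wake pair is a context switch; a design with heavy lock contention pays for thousands of them, which is the "excessive switching" the answer is talking about.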
When a context switch happens, the CPU has to store away the state of the registers and load the registers that were saved for the other thread, and possibly flush caches so the memory the other thread wants to work with can be brought in. This takes time, and although it's "CPU time", it's wasted housekeeping, i.e. it's not spent making your program do what it's supposed to do.
For example, I once worked on an enterprise program that spent more time switching than doing useful work. Once we fixed the locking bug, performance improved dramatically – suddenly the CPU could perform useful work rather than simply spin between threads.