I’m working through Iztik Ben-Gan book T-SQL Querying.
I have some trouble understanding why making a filter less restrictive will make the optimzer move away from a parallel plan to a serial one.
The first query goes for a parallel plan, but the second goes for a serial one.
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=30000
SELECT [orderid], [custid], [empid], [shipperid], [orderdate], [filler]
FROm dbo.Orders
WHERE orderid <=490000
Both queries use this index in for a Clustered index scans :
CREATE CLUSTERED INDEX [idx_cl_od] ON [dbo].[Orders]
(
[orderdate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
The table has 1000000 rows with evenly distributed orderid’s.
Anyone who knows the reason why?
Paralel plan with high rows numbers.
Sveinung Tyssedal is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
3
The serial execution plan will just be a clustered index scan with a filter predicate.
The parallel execution plan will be the clustered index scan with filter predicate running in parallel and a gather streams operator.
The operator cost of the gather streams operator depends on how many rows it is estimated to have to deal with.
The estimated cost of the clustered index scan in a serial vs parallel plan will have its CPU cost scaled down by the expected DOP and its IO cost left untouched.
The result of this is that there will be a tipping point where the serial plan is costed cheaper – i.e. the scale down in CPU costs for the plan operators running in parallel is outweighed by the additional cost of the extra parallelism operator(s).
For example if at DOP 1
the estimated I/O cost for the scan operator is 50
and the estimated CPU cost is 10
then the overall operator cost is 60
.
For a parallel plan at estimated available DOP of 4 the operator cost will be 52.5
(50 + (10/4)
)
So the parallel plan will be costed as cheaper as long as the cost of the gather streams operator is <= 7.5
in this example. And the more selective the predicate the lower the estimated cost of this operator will be.
You can use OPTION (MAXDOP 1)
to force a serial plan and OPTION(USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'))
to get a parallel one to see the execution plans and compare the costs at different estimated selectivities.