I want to create some aggregations of table. But when I look at execution plan I find out that there is massive row increase in sort (from 5 121 to 1 453 235 569). Why this is happend? What can cause this increase?
More info:
Original table has this schema:
CREATE TABLE event (
deviceid varchar NOT NULL,
"time" timestamptz NOT NULL,
signal_quality float8 NULL,
packet_loss float8 NULL,
ping float8 NULL,
jitter float8 NULL,
download float8 NULL,
upload float8 NULL,
latency float8 NULL,
profinet float8 NULL
);
Next table has data about start tests:
CREATE TABLE start_test (
deviceid varchar NOT NULL,
"time" timestamptz NOT NULL
);
I want to agregate data for each test. For one test there will be more rows, each row contains one type of aggregation – for example one row for minimum, one row for median,…
Here is subset of aggregation script:
WITH event_cycle AS (
SELECT
e.*,
sc.time as start_time,
sc.end_time as end_time
FROM (
SELECT
*
FROM event
) e
LEFT JOIN (
SELECT *, COALESCE(LEAD(time) OVER(PARTITION BY deviceid ORDER BY time), NOW()) as end_time
FROM start_test
) sc ON sc.time <= e.time AND e.time < sc.end_time AND e.deviceid = sc.deviceid
), bounds AS (
SELECT
start_time start_time_b,
min(end_time) end_time_b,
deviceid deviceid_b,
--low
PERCENTILE_DISC(0.005) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_low,
--high
PERCENTILE_DISC(0.995) WITHIN GROUP(ORDER BY signal_quality) as signal_quality_high
FROM event_cycle
GROUP BY start_time, deviceid
), data_with_boundaries AS (
SELECT *
FROM event_cycle ec
LEFT JOIN bounds b ON b.start_time_b <= ec.time AND ec.time < b.end_time_b AND ec.deviceid = b.deviceid_b
)
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'min' as agg_type,
min(signal_quality) as signal_quality
FROM event_cycle
GROUP BY start_time, deviceid
) min
UNION ALL
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'median' as agg_type,
PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY signal_quality) as signal_quality
FROM event_cycle
GROUP BY start_time, deviceid
) median
UNION ALL
SELECT *
FROM (
SELECT
min(time) as start_time,
max(time) as end_time,
deviceid,
'ipr_avg' as agg_type,
avg(CASE WHEN signal_quality
BETWEEN signal_quality_low
AND signal_quality_high
THEN signal_quality
ELSE NULL
END) as signal_quality
FROM data_with_boundaries
GROUP BY start_time, deviceid
) ipr_avg
This is whole query plan:
Append (cost=65349092.63..2218756400.84 rows=120000 width=88) (actual time=108568.862..305159.053 rows=15363 loops=1)
CTE event_cycle
-> Hash Left Join (cost=652.57..23608823.05 rows=161244967 width=99) (actual time=11.336..107377.365 rows=855037 loops=1)
Hash Cond: ((event.deviceid)::text = (sc.deviceid)::text)
Join Filter: ((sc."time" <= event."time") AND (event."time" < sc.end_time))
Rows Removed by Join Filter: 1451937372
-> Seq Scan on event (cost=0.00..17802.37 rows=855037 width=83) (actual time=0.095..338.366 rows=855037 loops=1)
-> Hash (cost=588.92..588.92 rows=5092 width=27) (actual time=11.190..11.193 rows=5118 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 359kB
-> Subquery Scan on sc (cost=423.43..588.92 rows=5092 width=27) (actual time=8.076..10.517 rows=5118 loops=1)
-> WindowAgg (cost=423.43..538.00 rows=5092 width=27) (actual time=8.075..10.152 rows=5118 loops=1)
-> Sort (cost=423.43..436.16 rows=5092 width=19) (actual time=8.060..8.260 rows=5118 loops=1)
Sort Key: start_test.deviceid, start_test."time"
Sort Method: quicksort Memory: 592kB
-> Seq Scan on start_test (cost=0.00..109.92 rows=5092 width=19) (actual time=0.017..0.557 rows=5118 loops=1)
-> Subquery Scan on min (cost=41740269.58..44159744.08 rows=40000 width=88) (actual time=108568.860..108914.711 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159344.08 rows=40000 width=96) (actual time=108568.860..108914.282 rows=5121 loops=1)
Group Key: event_cycle.start_time, event_cycle.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=108568.845..108751.590 rows=855037 loops=1)
Sort Key: event_cycle.start_time, event_cycle.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=11.338..108096.087 rows=855037 loops=1)
-> Subquery Scan on median (cost=41740269.58..44159844.08 rows=40000 width=88) (actual time=812.829..1215.500 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159444.08 rows=40000 width=96) (actual time=812.828..1214.812 rows=5121 loops=1)
Group Key: event_cycle_1.start_time, event_cycle_1.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=812.797..1009.458 rows=855037 loops=1)
Sort Key: event_cycle_1.start_time, event_cycle_1.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle event_cycle_1 (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.058..229.673 rows=855037 loops=1)
-> Subquery Scan on ipr_avg (cost=2035162059.84..2106827389.62 rows=40000 width=88) (actual time=194608.278..195027.597 rows=5121 loops=1)
-> GroupAggregate (cost=2035162059.84..2106826989.62 rows=40000 width=96) (actual time=194608.276..195027.074 rows=5121 loops=1)
Group Key: ec.start_time, ec.deviceid
-> Sort (cost=2035162059.84..2044120113.56 rows=3583221489 width=72) (actual time=194608.252..194833.440 rows=855037 loops=1)
Sort Key: ec.start_time, ec.deviceid
Sort Method: external merge Disk: 45552kB
-> Merge Left Join (cost=85903271.20..731689364.04 rows=3583221489 width=72) (actual time=1673.883..194023.850 rows=855037 loops=1)
Merge Cond: ((ec.deviceid)::text = (b.deviceid_b)::text)
Join Filter: ((b.start_time_b <= ec."time") AND (ec."time" < b.end_time_b))
Rows Removed by Join Filter: 1452792384
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=739.639..1020.075 rows=855037 loops=1)
Sort Key: ec.deviceid
Sort Method: external merge Disk: 38608kB
-> CTE Scan on event_cycle ec (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.027..182.877 rows=855037 loops=1)
-> Sort (cost=44163001.63..44163101.63 rows=40000 width=64) (actual time=934.232..51961.852 rows=1453235569 loops=1)
Sort Key: b.deviceid_b
Sort Method: quicksort Memory: 593kB
-> Subquery Scan on b (cost=41740269.58..44159944.08 rows=40000 width=64) (actual time=638.853..932.024 rows=5121 loops=1)
-> GroupAggregate (cost=41740269.58..44159544.08 rows=40000 width=64) (actual time=638.852..931.538 rows=5121 loops=1)
Group Key: event_cycle_2.start_time, event_cycle_2.deviceid
-> Sort (cost=41740269.58..42143382.00 rows=161244967 width=56) (actual time=638.826..788.470 rows=855037 loops=1)
Sort Key: event_cycle_2.start_time, event_cycle_2.deviceid
Sort Method: external merge Disk: 35424kB
-> CTE Scan on event_cycle event_cycle_2 (cost=0.00..3224899.34 rows=161244967 width=56) (actual time=0.027..184.881 rows=855037 loops=1)
Planning Time: 0.638 ms
Execution Time: 305182.448 ms
1