An ex colleague of mine has written many highly complex nested-view views using multiple ctes.
The end views are very fast.
It seems to me that his end views filter the initial ctes, making the rest of the ctes in the view much lighter.
However, when I try to replicate such views, the filter is only applied after all the ctes have run. This can result in huge amounts of reads and thus long execution times.
My thinking is that, because the filtered columns from the initial cte are used in the views’ end SELECT, that those should be filtered on at the start the query plan.
Table tasks has ~17million rows
Table tasklogs has ~90 million rows
All needed indexes are in place and enabled.
E.g.
CREATE OR REPLACE VIEW public.TaskDetails
AS
WITH ctetasks
AS
(
SELECT t_1.id AS taskid,
t_1.custid,
t_1.empid,
t_1.tasktimestamp
FROM tasks t_1
), ctetaskstatusinfo AS (
SELECT tl.taskid,
-- Multiple aggregation columns
FROM ctetasks t_1
LEFT JOIN taskslog tl ON t_1.taskid = tl.taskid
WHERE tl.params LIKE '%status%'::text
GROUP BY tal.taskid
)
-- Multiple other ctes
SELECT t.taskid AS id,
t.tasktimestamp,
t.empid,
t.custid,
task.*
FROM ctetasks t
JOIN ctetaskstatusinfo tsi ON t.taskid = tsi.taskid
JOIN tasks task ON t.taskid = task.id;
I would expect querying this view like this would filter the inital cte first.
SELECT *
FROM public.TaskDetails
WHERE 1 = 1
AND custid = 132
AND empid IN ('ab', 'cd', 'ef')
AND tasktimestamp BETWEEN '2024-01-01' AND '2024-02-01'
I can create functions to use parameters to create a temp table to replace the first cte, but functions are pretty clunky to make in MSSQL and PostgreSQL.
Am I creating my multiple views incorrectly or am I mistaken in how views with multiple ctes should filter and is there any behavior differences between mssql and postgresql?
Hope that makes sense.
Let me know if you can think of a more appropriate question title.