Consider this [v good] example from Using recursive CTE to generate a range of data
WITH RECURSIVE date_range AS (
SELECT '2023-01-01'::timestamp AS date
UNION ALL
SELECT date + interval '1 day'
FROM date_range
WHERE date < '2023-12-31'
)
SELECT *
FROM date_range;
I can understand how to create this query, but do not understand the underlying logic/rules executed by the database engine. Specifically: for the select date + 'interval 1 day' from date_range
: how does it know to only “consider” the very last added record [instead of taking all the previously added records and adding 1 to each of them]?
To illustrate more clearly: consider when we are at 2023-01-03
in the recursion. In that case we already presumably have 2023-01-01
, 2023-01-02
, and 2023-01-03
in the table. Why does the second query shown below not “find” three records and add 1 to all of those 3 records?
SELECT date + interval '1 day'
FROM date_range
WHERE date < '2023-12-31'
Instead what is happening is the engine only “finds” the very last record. 2023-01-03
and adds one to that to generate the single new record 2023-01-04
? Is that just something I take on “faith” for recursive
views?
7
consider when we are at
2023-01-03
in the recursion. In that case we already presumably have2023-01-01
,2023-01-02
, and2023-01-03
in the table. Why does the second query shown below not “find” three records and add 1 to all of those 3 records
Because there are 3 different tables (tuplestores) involved. Results
keep getting accumulated and that’s where the 2023-01-01
, 2023-01-02
, and 2023-01-03
live – the query doesn’t target that. Instead it targets work_table
which gets wiped and replaced with contents of intermediate_table
from previous iteration – at that point it only holds 2023-01-03
:
iteration | work_table | intermediate_table | results |
---|---|---|---|
1 | (‘2023-01-01’) non-recursive term |
(‘2023-01-01’+interval ‘1 day’) | (‘2023-01-01’), (‘2023-01-02’) |
2 | (‘2023-01-02’) previous values from intermediate |
(‘2023-01-02’+interval ‘1 day’) | (‘2023-01-01’), (‘2023-01-02’), (‘2023-01-03’) |
3 | (‘2023-01-03’) previous values from intermediate |
(‘2023-01-03’+interval ‘1 day’) | (‘2023-01-01’), (‘2023-01-02’), (‘2023-01-03’), (‘2023-01-04’) |
Behaviour above illustrates the description from the doc:
Evaluate the non-recursive term. (…) Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
a) Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. (…) Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b) Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table
The same description in src/backend/executor/nodeRecursiveunion.c
responsible for the feature, sounds a bit more cryptic:
* 1. evaluate non recursive term and assign the result to RT //ok
*
* 2. execute recursive terms //ok
*
* 2.1 WT := RT //ok
* 2.2 while WT is not empty repeat 2.3 to 2.6. if WT is empty returns RT
* 2.3 replace the name of recursive term with WT
* 2.4 evaluate the recursive term and store into WT
* 2.5 append WT to RT
* 2.6 go back to 2.2
But the code really does what it said on the tin, how it said it. Here’s the 2.2, 2.3, 2.6 part, even matching the names from the doc:
/* 2. Execute recursive term */
for (;;)
{
slot = ExecProcNode(innerPlan);
if (TupIsNull(slot))
{
/* Done if there's nothing in the intermediate table */
if (node->intermediate_empty)
break;
/* done with old working table ... */
tuplestore_end(node->working_table);
/* intermediate table becomes working table */
node->working_table = node->intermediate_table;
/* create new empty intermediate table */
node->intermediate_table = tuplestore_begin_heap(false, false,
work_mem);
node->intermediate_empty = true;
/* reset the recursive term */
innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
plan->wtParam);
/* and continue fetching from recursive term */
continue;
}
If you think about it, the “taking all the previously added records and adding 1 to each of them” thing is exactly what it does. All added previously, as in previous iteration, not all processed since the beginning of operation. In your case, it started with a single row, so in each iteration all previously added records amount to 1. If you started with 2, you’d get 2 at a time, for as long as both match the WHERE
condition.
3
From the link noted by @nbk postgresql Recursive Queries: the key step is this:
Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
So for the query mentioned in the question the intermediate table would apparently be only the newly generated result [and not any others previously generated by the recursion].
The recursion means calling a peace of code from the same peace of code. The most important thing regarding recursions is to define how and when to escape from it. It’s done here with the WHERE clause. Make sure, if you use any kind of recursions, to define a safe exit first to avoid indefinite loops.
- a recursive cte is one peace of code that generates a set of rows. It has two parts that are unioned to give a complete resultset. But it is still one code declaring a cte. The delaration starts with:
WITH RECURSIVE date_range AS (
- first part (the fixed part) in your example doesn’t change the results through recursion steps – it gives you a starting point with just that 1 row
SELECT '2023-01-01'::timestamp AS date
/*
date
-------------------
2023-01-01 00:00:00
- the second part uses union to add another row and checks is it the right time for safe escape from the recursion
UNION ALL
SELECT date + interval '1 day'
FROM date_range
WHERE date < '2023-01-04' -- safe escape when date is < '2023-01-04'
/* 1st pass result
date
-------------------
2023-01-01 00:00:00 -- row from the first part ( read no. 2. )
2023-01-02 00:00:00 -- this is the value of date column at the moment
- If it is not time to exit the recursion the code as a whole calls itself to do the union part again and check if it is time to escape
/* 2nd pass
date
-------------------
2023-01-01 00:00:00 -- row from the first part ( read no. 2. )
2023-01-02 00:00:00 -- a row from the first pass union
2023-01-03 00:00:00 -- this is the value of date column at the moment
- that repeats until the escape condition is true – false is – get out of this right now ….