Several (20+) tables in my database define columns current_since::timestamptz
and current_until::timestamptz
. Each row in each of those tables follows the same rules:
- If
current_since
is null, the row is considered “draft”. - If
current_sice
is not null andcurrent_until
is null, the row is considered “current aftercurrent_since
moment, until forever“. - If
current_since
andcurrent_until
is set, then the row is considered to be current between the two moments.
Each table uses a btree index on (current_since, current_until)
There are many different SQL queries where I want to ensure I only load rows current at a particular time (usually an arbitrary user input). For that, a condition might look like this (NOW()
would usually be a variable holding the user input):
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM record WHERE
NOW() BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= NOW()
AND record.current_until IS NULL
)
The execution plan (using SET enable_seqscan = OFF;
) is as follows:
Bitmap Heap Scan on record (cost=23.97..42.32 rows=374 width=300) (actual time=0.056..0.246 rows=374 loops=1)
Recheck Cond: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Filter: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Heap Blocks: exact=9
Buffers: shared hit=11
-> BitmapOr (cost=23.97..23.97 rows=374 width=0) (actual time=0.040..0.041 rows=0 loops=1)
Buffers: shared hit=2
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((current_since <= now()) AND (current_until >= now()))
Buffers: shared hit=1
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=374 width=0) (actual time=0.034..0.034 rows=374 loops=1)
Index Cond: ((current_since <= now()) AND (current_until IS NULL))
Buffers: shared hit=1
Planning:
Buffers: shared hit=4
Planning Time: 0.194 ms
Execution Time: 0.324 ms
It gets bad quickly when multiple of said tables are part of a larger join query:
SELECT * FROM record
LEFT JOIN comment
ON comment.record_id = record.id
AND (
NOW() BETWEEN comment.current_since AND comment.current_until
OR (
comment.current_since <= NOW()
AND comment.current_until IS NULL
)
)
LEFT JOIN attachment
ON attachment.comment_id = comment.id
AND (
NOW() BETWEEN attachment.current_since AND attachment.current_until
OR (
attachment.current_since <= NOW()
AND attachment.current_until IS NULL
)
)
WHERE
NOW() BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= NOW()
AND record.current_until IS NULL
)
^ that’s a lot of “clutter” IMO.
My question is: How can I avoid writing fairly complex conditions all the time manually? If I ever decide to change the logic, I will have to update many queries manually. Yet, I would rather do that than sacrifice performance.
I have tried to write a stored function returning a boolean:
CREATE FUNCTION is_current(
at TIMESTAMP WITH TIME ZONE,
current_since TIMESTAMP WITH TIME ZONE,
current_until TIMESTAMP WITH TIME ZONE
)
RETURNS BOOLEAN
LANGUAGE plpgsql
AS
$$
BEGIN
IF current_since IS NULL THEN
RETURN FALSE;
END IF;
IF current_since <= at AND current_until IS NULL THEN
RETURN TRUE;
END IF;
RETURN at BETWEEN current_since AND current_until;
END
$$;
Using it instead of the condition
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM record WHERE
is_current(NOW(), record.current_since, record.current_until)
But it was significantly slower, as I expected:
Seq Scan on record (cost=10000000000.00..10000000107.17 rows=125 width=300) (actual time=27.852..28.052 rows=374 loops=1)
Filter: is_current(now(), current_since, current_until)
Buffers: shared hit=9
Planning Time: 0.117 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.400 ms, Inlining 3.039 ms, Optimization 16.067 ms, Emission 8.698 ms, Total 28.204 ms
Execution Time: 28.509 ms
Bonus question: is it so much slower because using a function like this prevents the query executor from accessing the columns directly, forcing it constantly to call the function and then somehow work with the boolean result?