Speeding up initial ltree, GiST-indexed queries in Postgres
Setup
I’m experimenting with ltree and pattern searching. It’s pretty great, but I’m finding that first runs on searches are very slow compared to later runs. I don’t know if this is a general issue around loading and caching index pages, or something more specific to GiST indexes.
For background, I’m trying a few different ideas out in a Materialized View with the following column list:
+-------------------------------------------------+-----------+
| column_name                                      | data_type |
+-------------------------------------------------+-----------+
| edge_sequence                                    | uuid      |
| total_seconds                                    | bigint    |
| source_node_count                                | bigint    |
| node_count                                       | integer   |
| nodes_without_adjacent_duplicates_count          | integer   |
| distinct_states_count                            | integer   |
| has_duplicate_adjacent_events                    | boolean   |
| has_circular_links                               | boolean   |
| state_names                                      | citext[]  |
| state_names_ltree                                | ltree     |
| state_names_ltree_text                           | text      |
| state_names_without_duplicate_adjacent_elements  | citext[]  |
| distinct_state_names_ltree                       | ltree     |
| state_symbols                                    | text[]    |
| state_symbols_ltree                              | ltree     |
| state_symbols_cstring                            | text      |
+-------------------------------------------------+-----------+
Here’s the GiST index definition for the state_names_ltree field:
DROP INDEX IF EXISTS edge_cycle_state_names_ltree_gist;
CREATE INDEX edge_cycle_state_names_ltree_gist
ON analytics.edge_cycle
USING GIST (state_names_ltree
gist_ltree_ops(siglen=100) -- siglen defaults to 8, can be set as high as 2024, in 4-byte increments
)
WITH (fillfactor = 100) -- The data will be in a Materialized View, or TRUNCATE-and-rebuild table.
WHERE node_count <= 24; -- Index entries are too long for very long chains (hundreds+)
Query and Plans
This query uses the index to find 581 rows out of the 7,745,994 in the data set:
SELECT edge_sequence,state_names_ltree
FROM edge_cycle
WHERE state_names_ltree ~ 'WaitingAfterSterilizer.*.Decon.*.Sterilizer|VerifyingAfterSterilizer'::lquery
AND node_count <= 24; -- IMPORTANT, the index we want the planner to pick includes this condition.
Here’s the plan on a cold cache:
Index Scan using edge_cycle_state_names_ltree_gist on edge_cycle (cost=0.41..186.90 rows=159 width=180) (actual time=184.459..126255.450 rows=581 loops=1)
  Index Cond: (state_names_ltree ~ 'WaitingAfterSterilizer.*.Decon.*.Sterilizer|VerifyingAfterSterilizer'::lquery)
  Buffers: shared hit=21 read=215134
  I/O Timings: shared read=124327.468
Planning:
  Buffers: shared hit=217 read=14
  I/O Timings: shared read=10.586
Planning Time: 11.474 ms
Execution Time: 126255.839 ms
Here’s the plan from an early run on a cache that wasn’t completely cold: I had cancelled an earlier run of the search before it finished, so the cache was luke-warm:
Index Scan using edge_cycle_state_names_ltree_gist on edge_cycle (cost=0.41..186.90 rows=159 width=180) (actual time=1.608..62277.936 rows=581 loops=1)
  Index Cond: (state_names_ltree ~ 'WaitingAfterSterilizer.*.Decon.*.Sterilizer|VerifyingAfterSterilizer'::lquery)
  Buffers: shared hit=102691 read=112443
  I/O Timings: shared read=60442.362
Planning:
  Buffers: shared hit=1
Planning Time: 0.246 ms
Execution Time: 62278.214 ms
And now the plan from running the same query again, immediately after letting a slow, cache-loading run finish:
Index Scan using edge_cycle_state_names_ltree_gist on edge_cycle (cost=0.41..186.90 rows=159 width=180) (actual time=1.501..1505.078 rows=581 loops=1)
  Index Cond: (state_names_ltree ~ 'WaitingAfterSterilizer.*.Decon.*.Sterilizer|VerifyingAfterSterilizer'::lquery)
  Buffers: shared hit=215134
Planning:
  Buffers: shared hit=1
Planning Time: 0.232 ms
Execution Time: 1505.176 ms
At the extremes, the query times differ by roughly 84x (126,255 ms vs. 1,505 ms). The difference shows up in the read part of the buffers information:
Buffers: shared hit=21 read=215134
I/O Timings: shared read=124327.468
Buffers: shared hit=102691 read=112443
I/O Timings: shared read=60442.362
Buffers: shared hit=215134
So, it looks like the speed hit is coming from loading the index pages.
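As a back-of-the-envelope check on the scale of that, 215,134 read buffers at the default 8 kB block size works out to roughly 1.6 GB of index pulled in from disk on the cold run:

select pg_size_pretty(215134 * 8192::bigint); -- ≈ 1.6 GB read on the cold run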
Speeding Things Up
I’ve had a look around for settings that might speed up the initial search, but haven’t found anything promising yet. I’m not at all well versed in tuning the Postgres cache, on either the hardware or the software configuration side. It’s easy to find mentions of GiST indexes being slower than other index types, but in this case it’s plenty fast enough once the pages are loaded. I’ve been using a pg_trgm GiST index for years, and it is fantastic.
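One thing that might address the cold-start cost directly, sketched under the assumption that the pg_prewarm extension is available on the RDS instance (I haven’t verified that, or that it actually helps here): explicitly warm the index after each materialized view rebuild, so the first user-facing search doesn’t pay the full I/O price.

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the GiST index into shared_buffers ahead of the first search.
-- Could be run right after each REFRESH MATERIALIZED VIEW / TRUNCATE-and-rebuild.
SELECT pg_prewarm('analytics.edge_cycle_state_names_ltree_gist');

The default mode loads pages into shared_buffers; the 'prefetch' and 'read' modes target the OS cache instead.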
Hardware, Version and Settings
We’re running Postgres 16.4 on an RDS db.m5.2xlarge instance, which has 8 vCPUs and 32 GB of RAM. Here are the cache-related settings I found:
select name,setting,unit from pg_settings where name like '%cache%' order by 1;
/*
+----------------------+---------+------+
| name                 | setting | unit |
+----------------------+---------+------+
| debug_discard_caches | 0       | NULL |
| effective_cache_size | 2015397 | 8kB  |
| plan_cache_mode      | auto    | NULL |
+----------------------+---------+------+
*/
effective_cache_size is the only setting here that looks relevant, and (as I understand it) it’s a planner estimate rather than an actual cache allocation. Unless I’m messing up my arithmetic, that’s a pretty large cache out of 32 GB?
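Checking the arithmetic on that setting (it’s reported in 8 kB units):

select pg_size_pretty(2015397 * 8192::bigint); -- ≈ 15 GB, roughly half of the instance's 32 GB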
I’ve tried changing FILLFACTOR to 100%
While writing this up, I considered the index FILLFACTOR. The default for B-trees is 90%; I don’t know what the default is for GiST indexes. This data will live in a Materialized View, or a wipe-and-replace table, meaning the index doesn’t have to contend with updates. A GiST index without an explicit override gets you NULL in reloptions:
SELECT pg_class.relname AS table_name,
pg_class.reloptions
FROM pg_class
JOIN pg_namespace
ON pg_namespace.oid = pg_class.relnamespace
WHERE pg_class.relname = 'edge_cycle_state_names_ltree_gist';
I ran an experiment and compared the size of the index built with the default fill against a 100% fill. It looks like GiST is also probably at a 90% FILLFACTOR, at least when using gist_ltree_ops:
select pg_size_pretty(pg_relation_size('edge_cycle_state_names_ltree_gist')); -- Default fill: 2,908 MB
select pg_size_pretty(pg_relation_size('edge_cycle_state_names_ltree_gist')); -- 100% fill: 2,601 MB
select 2601/2908::real; -- 0.894
Other Ideas I’ve Considered
Don’t Search Against a GiST Index
ltree requires a GiST index if you want indexed pattern searching. But here’s what I get when I bypass the index by leaving the node_count <= 24 condition out of the query:
Gather (cost=1000.00..1092064.13 rows=159 width=180) (actual time=7.350..2187.419 rows=613 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  ->  Parallel Seq Scan on edge_cycle (cost=0.00..1091048.23 rows=40 width=180) (actual time=15.969..2180.520 rows=123 loops=5)
        Filter: (state_names_ltree ~ 'WaitingAfterSterilizer.*.Decon.*.Sterilizer|VerifyingAfterSterilizer'::lquery)
        Rows Removed by Filter: 1549076
Planning Time: 0.348 ms
Execution Time: 2187.555 ms
That’s pretty decent: slower than a warmed-up GiST query, faster than a slow initial GiST query, and more consistent for a naive user. (Where “naive” = “any normal person who is not going to get into the weeds.”)
Experimenting with Different siglen Values
The range is 4-2024 bytes, in 4-byte increments, with a default of 8. I set it to 100; I came up with that number by throwing a dart at a board. A longer siglen means larger index entries, which means more index pages, which means a slower initial load. Or so it seems. I can explore with a shorter siglen. Any real-world suggestions on how to tune siglen would save me a lot of trouble.
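For example, a rebuild with a shorter signature to compare index size and first-run behavior (siglen=40 here is just another dart throw, not a recommendation):

DROP INDEX IF EXISTS edge_cycle_state_names_ltree_gist;
CREATE INDEX edge_cycle_state_names_ltree_gist
ON analytics.edge_cycle
USING GIST (state_names_ltree
gist_ltree_ops(siglen=40) -- shorter signature than the current 100
)
WITH (fillfactor = 100)
WHERE node_count <= 24;

select pg_size_pretty(pg_relation_size('edge_cycle_state_names_ltree_gist')); -- compare against the 2,601 MB above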
Reducing the Node Size with Symbols
In theory, an ltree path can store up to 65,535 labels. In reality, it blows up when the full label path is somewhere under 12,000 bytes. Our raw paths are relatively short (I’m only including paths with lengths of 24 nodes or less, usually 7 or less), but the labels are relatively long. There are only around 60 distinct labels, so I can “symbolize” them as 2-char hex values. For example:
WaitingAfterSterilizer.Decon.Sterilizer
AO.01.04
The symbolized version is completely unreadable, but it makes the nodes smaller, meaning more nodes per page and less time loading pages. I’ve compared index-size savings recently, and the index size correlates with the path lengths, but not proportionally. I guess there’s some kind of overhead/price of admission no matter what the data is; after that, the data itself does make a difference. So, in my case, I can save 30-60% of the index size by converting the labels to shortened versions.
(I haven’t explored symbolization much for this loading problem yet, but smaller is smaller. I set up the symbolization system to simplify experimenting with NetworkX in Python, and to run some text-similarity experiments in Postgres. Levenshtein does surprisingly well on the symbolized paths, considering; trigrams work poorly, which is what you would expect.) But for ltree path searches, I’d need to write an interface to translate paths into symbolized paths and… ugh. Even for me, that seems over-the-top. I guess I could write a set-returning function that takes an ltree, parses it out, substitutes labels with their symbols, and runs the query. Pure SQL for this function should be possible. It could return all columns, since the planner seems quite smart about pruning out columns that aren’t referenced in the outer query.
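For what it’s worth, here’s a rough sketch of just the translation step, using a hypothetical state_symbol_map lookup table (the table and names are made up for illustration, and the lquery operators like .* and | would still need separate handling):

-- Hypothetical lookup table mapping readable labels to 2-char symbols.
CREATE TABLE IF NOT EXISTS analytics.state_symbol_map (
    state_name citext PRIMARY KEY,
    symbol     text NOT NULL
);

-- Translate a readable path into its symbolized form,
-- e.g. 'WaitingAfterSterilizer.Decon.Sterilizer' -> 'AO.01.04'.
SELECT text2ltree(string_agg(map.symbol, '.' ORDER BY labels.ordinality))
FROM unnest(string_to_array('WaitingAfterSterilizer.Decon.Sterilizer', '.'))
     WITH ORDINALITY AS labels(label, ordinality)
JOIN analytics.state_symbol_map AS map
  ON map.state_name = labels.label::citext; -- explicit cast so the comparison stays case-insensitive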
I’ve wandered around a bit here, but I’m unclear which of the various details and ideas included may be relevant. I’ll be grateful for any suggestions or warnings.