I have two tables that hold user information. Table A has about 7 million records, and table B starts out empty. Both have an index on user_id. My goal is to insert records into table B that copy a few of the fields already present in table A, and I am seeing some unexpected performance problems when batching the process.
Thanks in advance; I'm happy to answer any questions that might give more insight into my situation.
Right now I am using this code:
with toInsert as (
    select a.user_id, a.created_at, a.registered_date
    from a
    left outer join b using (user_id)
    where b.user_id is null
    limit 50
)
insert into b (user_id, created_at, registered_date)
select user_id, created_at, registered_date
from toInsert
returning *;
However, this takes about 3 minutes to run once table B starts growing, which seems much slower than it should be. The query plan indicates an index scan on each table, followed by a merge join on those results.
Here is the query plan I'm seeing:
Insert on b  (cost=0.86..53.58 rows=50 width=48)
  ->  Subquery Scan on toinsert  (cost=0.86..53.58 rows=50 width=48)
        ->  Limit  (cost=0.86..53.33 rows=50 width=32)
              ->  Merge Anti Join  (cost=0.86..6222611.33 rows=5929090 width=32)
                    Merge Cond: (a.user_id = b_1.user_id)
                    ->  Index Scan using a_pkey on a  (cost=0.43..6189771.44 rows=6460394 width=32)
                    ->  Index Only Scan using b_pkey on b b_1  (cost=0.42..10047.61 rows=531304 width=16)
Is there a way to speed up this query? Why do the index scans perform so poorly? I've also tried a where not in subquery instead of the left join; it was somehow slightly faster, but still appeared to scan the full table for each batch.
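For reference, the not in variant looked roughly like this (reconstructed from memory, same columns as the query above):

with toInsert as (
    select a.user_id, a.created_at, a.registered_date
    from a
    -- anti-join expressed as a NOT IN subquery instead of a left join
    where a.user_id not in (select user_id from b)
    limit 50
)
insert into b (user_id, created_at, registered_date)
select user_id, created_at, registered_date
from toInsert
returning *;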