I need to upsert data into a PostgreSQL database from a high-traffic application, so writes need to be optimized for performance. The rows in a given batch will have values for different subsets of columns; they are not full updates of all columns, and they need to be upserts, i.e. each row is inserted if new or updated if it already exists.
My idea is to do an INSERT ... ON CONFLICT ... DO UPDATE where the UPDATE part uses CASE WHEN conditions, keyed on the ids, to pick the value to write for each row. I cannot simply set the columns to EXCLUDED.column_name because not all incoming rows will have all columns set; some will have NULLs.
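To illustrate the problem with plain EXCLUDED (using the employees schema from the example below): if an incoming row omits a column, EXCLUDED carries NULL for it, and a straight assignment would wipe out the existing value:

```sql
-- Incoming row has no name; EXCLUDED.name is NULL here,
-- so this would overwrite the stored name with NULL:
INSERT INTO employees (emp_id, name, department) VALUES (1, NULL, 'it')
ON CONFLICT (emp_id) DO UPDATE SET
  name = EXCLUDED.name,              -- clobbers the existing 'john'
  department = EXCLUDED.department;
```

This is why the per-row CASE expressions below seem necessary.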
I would like to know:
- Is this an ok way of going about this (see example below)?
- Are there performance issues in doing it the way shown below? Will the size of the query affect performance? Or the CASE conditions in the UPDATE part? Any other performance issues?
- If this is not the best way to do this, how would you go about it? Is there a standard way of doing a batch upsert where different columns are provided for different rows?
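For context, one variant I have seen suggested is using COALESCE against EXCLUDED, which keeps the existing value whenever the incoming one is NULL. It is much shorter than per-row CASE expressions, but (if I understand correctly) it cannot express "explicitly set this column to NULL", which is a limitation for my data:

```sql
INSERT INTO employees (emp_id, name, department) VALUES
  (1, NULL, 'it'),
  (2, 'jack', 'sales')
ON CONFLICT (emp_id) DO UPDATE SET
  name       = COALESCE(EXCLUDED.name, employees.name),
  department = COALESCE(EXCLUDED.department, employees.department);
```

Is this the standard approach when NULL never needs to be written, and is there a better one when it does?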
Example of what my idea for doing this is:
Schema (PostgreSQL v16)
CREATE TABLE employees (emp_id INTEGER, name TEXT, department TEXT, PRIMARY KEY (emp_id));
Query #1
INSERT INTO employees VALUES (1, 'john', 'sales');
Query #2
INSERT INTO employees (emp_id, name, department) VALUES
  (1, DEFAULT, 'it'),
  (2, 'jack', 'sales')
ON CONFLICT (emp_id) DO UPDATE SET
  name = CASE
           WHEN employees.emp_id = 1 THEN employees.name  -- keep existing name
           WHEN employees.emp_id = 2 THEN 'jack'
         END,
  department = CASE
                 WHEN employees.emp_id = 1 THEN 'it'
                 WHEN employees.emp_id = 2 THEN 'sales'
               END
WHERE employees.emp_id IN (1, 2);
The expectation is that Query #2 inserted the new employee jack and updated employee john's department to 'it'.
Query #3
SELECT * FROM employees;
emp_id | name | department
-------|------|-----------
     1 | john | it
     2 | jack | sales