I need to upsert data to a PostgreSQL database from a high traffic application which needs to be optimized for write performance. The different rows to upsert in a batch will have values for different columns. They are not full updates of all columns. And these would be upserts so the rows would need to be inserted or updated.
My idea is to do an INSERT ON CONFLICT UPDATE where in the update part I’d use CASE WHEN conditions to write into the update part the data to update to for each row, based on the id’s. I cannot simply set the columns to EXCLUDED.column_name as not all incoming rows will have all rows set. Some will have NULLs.
I would like to know:
- Is this an ok way of going about this (see example below)?
- Are there performance issues in doing this the way it is shown below? Is the size of the query going to affect performance? Or using those CASE conditions in the update part? Other performance issues?
- Is there a way to pass the parameters to the database just once instead of repeating them for the insert and update parts? Would using named parameters work for this or are they also passed twice?
- If this is not the best way to do this, how would you go about it? Is there a standard way of doing a batch upsert with this kind of data with different columns provided for different rows to insert?
Example of what my idea for doing this is:
Schema (PostgreSQL v16)
CREATE TABLE employees (emp_id INTEGER, name TEXT, department TEXT,
PRIMARY KEY (emp_id));
Query #1
INSERT INTO employees VALUES (1, 'john', 'sales');
Query #2
INSERT INTO employees (emp_id, name, department) VALUES (1, DEFAULT, 'it'),
(2, 'jack', 'sales')
ON CONFLICT (emp_id) DO UPDATE SET name = CASE
WHEN employees.emp_id=1 THEN employees.name
WHEN employees.emp_id=2 THEN 'jack' END,
department = CASE WHEN employees.emp_id=1 THEN 'it'
WHEN employees.emp_id=2 THEN 'sales' END
WHERE employees.emp_id IN (1, 2);
Expectation is that Query #2 inserted new employee jack and updated employee john’s department to ‘it’
Query #3
SELECT * FROM employees;
emp_id | name | department |
---|---|---|
1 | john | it |
2 | jack | sales |
1
CREATE TABLE employee (
emp_id INTEGER,
name TEXT,
dept TEXT,
PRIMARY KEY (emp_id)
);
INSERT INTO
employee
VALUES
(1, 'John', 'Sales');
MERGE INTO
employee AS e
USING
(
VALUES
(1, NULL, 'IT'),
(2, 'Jack', 'Sales')
)
AS t (emp_id, name, dept)
ON e.emp_id = t.emp_id
WHEN MATCHED THEN
UPDATE SET
name = COALESCE(t.name, e.name),
dept = COALESCE(t.dept, e.dept)
WHEN NOT MATCHED THEN
INSERT
( emp_id, name, dept)
VALUES
(t.emp_id, t.name, t.dept);
SELECT * FROM employee
emp_id | name | dept |
---|---|---|
1 | John | IT |
2 | Jack | Sales |
fiddle