Let’s say I have two tables (from uploaded csv files) and I want to do a diff based on an id+territory in the new file that wasn’t in the old file. The easiest way to do this is something like:
SELECT id, territory FROM this_week EXCEPT SELECT id, territory FROM last_week
However, what I’m trying to do is get ALL fields (in both tables — one row per key) that are generated by that difference. How could this be done?
Either postgres or bigquery is fine. Both have the EXCEPT
set op.
An example with data from Erwin’s answer:
WITH this_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(22,'us','spider','hd'),(3,'fr','new','hd')),
last_week (id,territory,name,other) AS (VALUES(1,'us','titanic','uhd'),(2,'us','spider','hd'))
SELECT * -- all columns of "this_week"
FROM this_week t
WHERE NOT EXISTS (
SELECT * FROM last_week l
WHERE t.id = l.id
AND t.territory = l.territory
);
Use NOT EXISTS
:
SELECT * -- all columns of "this_week"
FROM this_week t
WHERE NOT EXISTS (
SELECT FROM last_week l
WHERE t.id = l.id
AND t.territory = l.territory
);
And index on last_week (id, territory)
would typically help performance (a lot).
This shows all columns of this_week
.
I don’t see the point in adding columns of last_week
, which would be empty (null
) by definition of the query if you’d left-join.
Basics:
- Select rows which are not present in other table
Note a subtle difference:
EXCEPT
(when used without ALL
) folds duplicates. This query does not. You may want one or the other. Typically, you want this.
3
Comparing two identically structured tables (same column names in same ordinal position), with a primary key (unique and non nullable) pk:
SELECT t1.*
, t2.*
FROM Table1 t1
FULL JOIN Table2 t2
ON t1.pk = t2.pk
WHERE NOT EXISTS(
SELECT t1.*
INTERSECT
SELECT t2.*
)
This will give you all the differences between the two tables. If there is a matching pk in both tables, you will see records where the rest of the columns differ somewhere. If a row is in t1 and not in t2 (by pk) it will appear, if a row is in t2 and not t1 (by pk) it will appear.
If the tables aren’t identically structured you will need to expand the * in the subquery, and explicitly list out the columns you want to compare.
An example
--Create two identically strctured tables
CREATE TABLE Table1(PK BIGINT NOT NULL
, Col1 BIGINT
, Col2 BIGINT
)
;
CREATE TABLE Table2(PK BIGINT NOT NULL
, Col1 BIGINT
, Col2 BIGINT
)
;
--insert some data into them
INSERT INTO Table1(PK
, Col1
, Col2
)
VALUES
(1, 100, 1000) --Pk In both Table1 and Table2, and contents match
,(2, 101, 1001) --Pk In both Table1 and Table2, and contents do not match
,(3, 102, 1002) --Pk only in Table1
;
INSERT INTO Table2(PK
, Col1
, Col2
)
VALUES
(1, 100, 1000) --Pk In both Table1 and Table2, and contents match
,(2, 0, 0) --Pk In both Table1 and Table2, and contents do not match
,(4, 103, 1003) --Pk only in Table2
;
--Compare them
SELECT CASE
WHEN t1.PK IS NULL --This assumes that PK being NULL is due to the JOIN only!
THEN 'In Table2, and not in Table1'
WHEN t2.pk IS NULL --This assumes that PK being NULL is due to the JOIN only!
THEN 'In Table1, and not in Table2'
ELSE 'In both tables, but contents differ'
END AS ComparisonDescription
, t1.*
, t2.*
FROM Table1 t1
FULL JOIN Table2 t2
ON t1.pk = t2.pk
WHERE NOT EXISTS(
SELECT t1.* --IF not identically structured, select the common columns here
INTERSECT
SELECT t2.* --IF not identically structured, select the common columns here
)
;
Db<>Fiddle with the example