I have a table (myt) that records people, their spouses, and whether each person had Covid-19:
CREATE TABLE myt (
    name   VARCHAR(50),
    spouse VARCHAR(50),
    covid  VARCHAR(10),
    gender VARCHAR(10),
    height INT
);
INSERT INTO myt (name, spouse, covid, gender, height) VALUES
    ('red',    'pink',   'yes', 'male',   160),
    ('blue',   NULL,     'no',  'male',   145),
    ('green',  'orange', 'yes', 'male',   159),
    ('pink',   'red',    'yes', 'female', 134),
    ('purple', NULL,     'no',  'female', 124),
    ('orange', 'green',  'no',  'female', 149);
The table looks like this:
name    spouse  covid  gender  height
red     pink    yes    male    160
blue    NULL    no     male    145
green   orange  yes    male    159
pink    red     yes    female  134
purple  NULL    no     female  124
orange  green   no     female  149
I want to answer the following question: If someone had Covid-19, did their spouse also have Covid-19?
I first tried a simple self-join that returns only the cases where both partners had Covid-19:
SELECT a.name AS Person, a.spouse AS Spouse, a.covid AS Person_Covid, b.covid AS Spouse_Covid
FROM myt a
JOIN myt b ON a.spouse = b.name
WHERE a.covid = 'yes' AND b.covid = 'yes';
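To see exactly what this self-join returns, here is a minimal sketch that loads the sample data into an in-memory SQLite database with Python's built-in sqlite3 module and runs the same query. SQLite is an assumption (the question does not say which database is used), and the ORDER BY is added only to make the output order stable.

```python
import sqlite3

# Assumption: SQLite stands in for whatever DBMS the question uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myt (name VARCHAR(50), spouse VARCHAR(50), covid VARCHAR(10),
                  gender VARCHAR(10), height INT);
INSERT INTO myt (name, spouse, covid, gender, height) VALUES
    ('red', 'pink', 'yes', 'male', 160),
    ('blue', NULL, 'no', 'male', 145),
    ('green', 'orange', 'yes', 'male', 159),
    ('pink', 'red', 'yes', 'female', 134),
    ('purple', NULL, 'no', 'female', 124),
    ('orange', 'green', 'no', 'female', 149);
""")

# Same self-join as above; ORDER BY added only for deterministic output.
rows = conn.execute("""
    SELECT a.name AS Person, a.spouse AS Spouse,
           a.covid AS Person_Covid, b.covid AS Spouse_Covid
    FROM myt a
    JOIN myt b ON a.spouse = b.name
    WHERE a.covid = 'yes' AND b.covid = 'yes'
    ORDER BY a.name
""").fetchall()

for row in rows:
    print(row)
```

The red/pink couple comes back twice, once from each partner's side, because the spouse link is stored in both directions. Adding `AND a.name < b.name` to the WHERE clause would keep one row per couple.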
Now I want the final result to include every person and all columns, plus an indicator column that summarizes each couple's Covid-19 status.
I tried the following query, which builds on the previous one using COALESCE and a CASE expression:
SELECT
    COALESCE(a.name, b.spouse) AS Partner1_Name,
    a.covid AS Partner1_Covid,
    a.gender AS Partner1_Gender,
    a.height AS Partner1_Height,
    COALESCE(b.name, a.spouse) AS Partner2_Name,
    b.covid AS Partner2_Covid,
    b.gender AS Partner2_Gender,
    b.height AS Partner2_Height,
    CASE
        WHEN a.covid = 'yes' AND b.covid = 'yes' THEN 'both partners had covid'
        WHEN (a.covid = 'yes' AND b.covid = 'no') OR (a.covid = 'no' AND b.covid = 'yes') THEN 'one partner had covid'
        WHEN a.covid = 'no' AND b.covid = 'no' THEN 'neither partner had covid'
        WHEN a.spouse IS NULL OR b.spouse IS NULL THEN 'unmarried'
    END AS Covid_Status
FROM myt a
FULL OUTER JOIN myt b ON a.spouse = b.name;
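One possible simplification, sketched here under the assumption that spouse links are always recorded in both directions (as in the sample data) and that covid only holds 'yes' or 'no': a LEFT JOIN already keeps every row of a, so unmarried people survive with the b columns NULL. With the FULL OUTER JOIN, blue and purple (whom nobody lists as a spouse) also come back a second time as unmatched b rows with all a columns NULL. The `a.name < b.name` filter is an illustrative choice to keep one row per couple. This again runs in an in-memory SQLite database via Python's sqlite3 module; SQLite is an assumption.

```python
import sqlite3

# Assumption: SQLite stands in for whatever DBMS the question uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myt (name VARCHAR(50), spouse VARCHAR(50), covid VARCHAR(10),
                  gender VARCHAR(10), height INT);
INSERT INTO myt (name, spouse, covid, gender, height) VALUES
    ('red', 'pink', 'yes', 'male', 160),
    ('blue', NULL, 'no', 'male', 145),
    ('green', 'orange', 'yes', 'male', 159),
    ('pink', 'red', 'yes', 'female', 134),
    ('purple', NULL, 'no', 'female', 124),
    ('orange', 'green', 'no', 'female', 149);
""")

# LEFT JOIN keeps every person; b.* is NULL when no spouse row matches.
# a.name < b.name keeps each couple once (assumes symmetric spouse links).
# The ELSE branch assumes covid is only ever 'yes' or 'no'.
rows = conn.execute("""
    SELECT
        a.name AS Partner1_Name, a.covid AS Partner1_Covid,
        a.gender AS Partner1_Gender, a.height AS Partner1_Height,
        b.name AS Partner2_Name, b.covid AS Partner2_Covid,
        b.gender AS Partner2_Gender, b.height AS Partner2_Height,
        CASE
            WHEN b.name IS NULL THEN 'unmarried'
            WHEN a.covid = 'yes' AND b.covid = 'yes' THEN 'both partners had covid'
            WHEN a.covid = 'yes' OR b.covid = 'yes' THEN 'one partner had covid'
            ELSE 'neither partner had covid'
        END AS Covid_Status
    FROM myt a
    LEFT JOIN myt b ON a.spouse = b.name
    WHERE b.name IS NULL OR a.name < b.name
    ORDER BY a.name
""").fetchall()

for row in rows:
    print(row)
```

This yields four rows: the two couples once each plus the two unmarried people. Checking `b.name IS NULL` first means the remaining branches only ever compare two non-NULL covid values.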
Can someone please tell me whether I have done this correctly? Have I overcomplicated the final query?
Thanks!