I’m working with a (Databricks) SQL database where I need to update the casing of the “State” field, which is stored as part of a JSON string inside the “Payload” column. The “State” field can have values like “granted”, “denied”, etc., and I need to ensure that these values are correctly capitalized (“Granted”, “Denied”).
Here’s the structure of the JSON stored in the “Payload” column:
{
"Data": {
"Key": "some_value",
"Language": "en",
"State": "granted",
"Url": "http://example.com",
"Source": "web",
"ExpKey": "exp_456",
"Timestamp": "2024-01-01T00:00:00Z"
},
"EventType": "consent"
}
The “State” field needs to be updated to the correct casing. I created a temporary view to extract and modify this value and then used a MERGE statement to update the original table.
Here’s the query I’m using:
-- Step 1: Create a temporary view that extracts the relevant data from payload
CREATE OR REPLACE TEMP VIEW temp_data_view AS
SELECT
from_json(Payload, 'STRUCT<
CorrelationKey: STRING,
Data: STRUCT<
Key: STRING,
Language: STRING,
State: STRING,
Url: STRING,
Source: STRING,
ExpKey: STRING,
Timestamp: TIMESTAMP>,
EventType: STRING>') AS parsed_data,
Key,
EventTimestamp,
CASE
WHEN from_json(Payload, 'STRUCT<Data: STRUCT<State: STRING>>').Data.State = 'granted' THEN 'Granted'
WHEN from_json(Payload, 'STRUCT<Data: STRUCT<State: STRING>>').Data.State = 'denied' THEN 'Denied'
ELSE from_json(Payload, 'STRUCT<Data: STRUCT<State: STRING>>').Data.State
END AS new_State
FROM schema_name.data_table_name;
-- Step 2: Update State values
MERGE INTO schema_name.data_table_name target
USING (
SELECT
Key,
EventTimestamp,
parsed_data,
new_State
FROM temp_data_view
) source
ON target.Key = source.Key
AND target.EventTimestamp = source.EventTimestamp
WHEN MATCHED THEN
UPDATE SET target.Payload = to_json(named_struct(
'Data', named_struct(
'Key', parsed_data.Data.Key,
'Language', parsed_data.Data.Language,
'State', source.new_State,
'Url', parsed_data.Data.Url,
'Source', parsed_data.Data.Source,
'ExpKey', parsed_data.Data.ExpKey,
'Timestamp', parsed_data.Data.Timestamp
),
'EventType', parsed_data.EventType
));
-- Step 3: Drop the temporary view
DROP VIEW IF EXISTS temp_data_view;
The issue I’m running into is that 1) it’s probably very inefficient to do it like this and 2) I can’t be sure there’s no duplicated records in the table I’m merging into, so I’m running into [DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE] errors when I run it. Any suggestions?