I want to automate the obfuscation of some data in BigQuery. For example replacing all instances of [email protected]
in a table with abc@replaced
, where [email protected]
could be in any column.
My current approach is to find matches by turning the entire row into a JSON string:
SELECT *
FROM `project.dataset.table` a
WHERE CONTAINS_SUBSTR(LOWER(REGEXP_REPLACE(TO_JSON_STRING(a), " ", ""), "[email protected]"))
and then deleting the entire row with:
DELETE FROM `project.dataset.table` a
WHERE CONTAINS_SUBSTR(LOWER(REGEXP_REPLACE(TO_JSON_STRING(a), " ", "")), "[email protected]")
I’m having a hard time figuring out an approach that updates instead of deleting.
For some extra context these queries run as part of a script that scans hundreds of datasets and tables, so it must be mindful of cost.