We have an Excel file in file store that contains more than 10,000 columns of JSON data in each excel file.
for example, a sample like below.
json:
{"SearchName":"","Id":"","RequestType":"","StartDateUtc":"2022-12-01T00:00:00Z","EndDateUtc":"2023-04-28T00:00:00Z","RecordType":null,"Workload":"","Operations":[],"Users":["d503246e-285c-41bc-8b0a-bc79824146ea,[email protected],ab6019a4-851c-4af2-8ddc-1e03ee9be97a,[email protected],85ff7cda-5f2d-4d32-b51c-b88ad4d55b5a,[email protected],48168530-659c-44d3-8985-65f9b0af2b85,[email protected],0937a1e5-8a68-4573-ae9c-e13f9a2f3617,[email protected],c822dd8b-0b79-4c13-af1e-bc080b8108c5,[email protected],ca0de5ba-6ab2-4d34-b19d-ca702dcbdb8d,[email protected]"],"ObjectIds":[],"IPAddresses":[],"SiteIds":null,"AssociatedAdminUnits":[],"FreeText":"multifactor","ResultSize":0,"TimeoutInSeconds":345600,"ScopedAdminWithoutAdminUnits":false}
We just want to change the user hash value to normal mask values.
like this: [email protected]
into [email protected]
for whole excel file users.
Every time we manually copy users data and masking like below, it takes a lot of time for us. Then, after whatever output we got, we just replaced the hash value with the output.
import random
main=['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
l=["0e07209b-807b-4938-8bfd-f87cee98e924,[email protected],c747a82c-656e-40eb-9194-88c4a0f8061e"]
n=len(l)
print(n)
print(random.sample(main,n))
My question Is there any way in azure databricks replace whole Excel file user json key hash values to random users like this [email protected]
at a time and rewrite back to particular location