I am trying to flatten a JSON response that has nested arrays of objects with a 1:M mapping. Please refer to the JSON response below.
{
    "result": [
        {
            "data": [
                {
                    "dimensions": ["SYNTHETIC-1"],
                    "timestamps": [17064756000, 17064752000, 17064751000, 17064750000],
                    "values": [null, null, 100, 99.6354]
                },
                {
                    "dimensions": ["SYNTHETIC-2"],
                    "timestamps": [17064755000, 17064754000, 17064753000, 17064752000],
                    "values": [101, null, 100, 99.6354]
                }
            ]
        }
    ]
}
The above JSON response is the output of a web activity in Azure Synapse. I used a Data Flow to flatten this JSON, unrolling first by 'result' and then by 'data'. That gives me the dimensions[], timestamps[] and values[] arrays, but the output is not right: instead of pairing each timestamp with its value by position, it produces a cross product. The output I am getting is below:
dimensions timestamps values
SYNTHETIC-1 17064756000 null
SYNTHETIC-1 17064756000 null
SYNTHETIC-1 17064756000 100
SYNTHETIC-1 17064756000 99.6354
SYNTHETIC-1 17064752000 null
SYNTHETIC-1 17064752000 null
SYNTHETIC-1 17064752000 100
SYNTHETIC-1 17064752000 99.6354
SYNTHETIC-1 17064751000 null
SYNTHETIC-1 17064751000 null
SYNTHETIC-1 17064751000 100
SYNTHETIC-1 17064751000 99.6354
SYNTHETIC-1 17064750000 null
SYNTHETIC-1 17064750000 null
SYNTHETIC-1 17064750000 100
SYNTHETIC-1 17064750000 99.6354
SYNTHETIC-2 17064755000 101
SYNTHETIC-2 17064755000 null
SYNTHETIC-2 17064755000 100
SYNTHETIC-2 17064755000 99.6354
SYNTHETIC-2 17064754000 101
SYNTHETIC-2 17064754000 null
SYNTHETIC-2 17064754000 100
SYNTHETIC-2 17064754000 99.6354
SYNTHETIC-2 17064753000 101
SYNTHETIC-2 17064753000 null
SYNTHETIC-2 17064753000 100
SYNTHETIC-2 17064753000 99.6354
SYNTHETIC-2 17064752000 101
SYNTHETIC-2 17064752000 null
SYNTHETIC-2 17064752000 100
SYNTHETIC-2 17064752000 99.6354
The output I am expecting is like below:
dimensions timestamps values
SYNTHETIC-1 17064756000 null
SYNTHETIC-1 17064752000 null
SYNTHETIC-1 17064751000 100
SYNTHETIC-1 17064750000 99.6354
SYNTHETIC-2 17064755000 101
SYNTHETIC-2 17064754000 null
SYNTHETIC-2 17064753000 100
SYNTHETIC-2 17064752000 99.6354
Can someone please help me to achieve this?
TIA.
Currently, Data Flow doesn't have a built-in feature to flatten parallel nested arrays element-wise like this. In your case, you can try the workaround below, using a combination of flatten and join transformations.
For some reason, your timestamps array gave me only null values when typed as an integer array, so I set it to a string array in the source dataset schema. Change this type back as per your requirement.
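For context, the result you are after is an element-wise zip of the parallel timestamps and values arrays, not a cross join. A minimal plain-Python sketch of that target output (illustrative only, not a Synapse API):

```python
# Sample response in the same shape as the web activity output.
response = {
    "result": [{
        "data": [
            {"dimensions": ["SYNTHETIC-1"],
             "timestamps": [17064756000, 17064752000, 17064751000, 17064750000],
             "values": [None, None, 100, 99.6354]},
            {"dimensions": ["SYNTHETIC-2"],
             "timestamps": [17064755000, 17064754000, 17064753000, 17064752000],
             "values": [101, None, 100, 99.6354]},
        ]
    }]
}

rows = []
for result in response["result"]:
    for entry in result["data"]:
        dim = entry["dimensions"][0]  # single-element array in this response
        # zip pairs timestamps[i] with values[i] instead of crossing them
        for ts, val in zip(entry["timestamps"], entry["values"]):
            rows.append((dim, ts, val))

print(rows[:2])  # [('SYNTHETIC-1', 17064756000, None), ('SYNTHETIC-1', 17064752000, None)]
```

This is the pairing the flatten-and-join chain below reproduces inside the Data Flow.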
First, use a flatten transformation to flatten the data array and surface the inner arrays as columns; use rule-based mapping for this. Then chain the transformations as follows:
- Use a flatten transformation flatten2 to flatten the dimensions array. In the flatten, give the dimensions array for both the Unroll by and Unroll root options. It will flatten this array.
- Then take a select transformation select2 and remove the timestamps column.
- Take a new branch from flatten2, add another select transformation select1, and here remove the values column.
- Now add another flatten transformation flatten3 after select2 and flatten the array column values. For both Unroll by and Unroll root, give the values column only.
- Similarly, add flatten4 after select1 and do the same for the timestamps array column.
- After that, add a SurrogateKey transformation surrogateKey1 after flatten3 and create a key column key1 with start value 1 and step 1.
- Do the same after the flatten4 transformation, giving the same key column name key1.
- Now inner join both surrogate key transformations using a join transformation on the common column key1.
- The join now has the required rows; use another select transformation to remove the duplicate columns and the extra key1 column.
Now, you will get the desired result.
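The surrogate-key join above works because both branches explode their arrays in the same row order, so the i-th value row lines up with the i-th timestamp row. A Python sketch of that mechanism (names are illustrative; the surrogate key plays the role of the row index):

```python
# One row out of flatten2: a dimension plus its parallel arrays (shortened sample).
rows_after_flatten2 = [
    ("SYNTHETIC-1", [17064756000, 17064752000], [None, 100]),
    ("SYNTHETIC-2", [17064755000, 17064754000], [101, None]),
]

# Branch 1 (select2 -> flatten3): keep dimension + values, explode values.
values_branch = [(dim, v) for dim, _, vals in rows_after_flatten2 for v in vals]
# Branch 2 (select1 -> flatten4): keep dimension + timestamps, explode timestamps.
ts_branch = [(dim, t) for dim, ts, _ in rows_after_flatten2 for t in ts]

# surrogateKey1/surrogateKey2: attach a 1-based row number to each branch.
keyed_values = {i + 1: row for i, row in enumerate(values_branch)}
keyed_ts = {i + 1: row for i, row in enumerate(ts_branch)}

# join1 on key1: rows pair up only because both branches preserved row order.
joined = [(keyed_ts[k][0], keyed_ts[k][1], keyed_values[k][1]) for k in keyed_values]
print(joined)
```

Note the assumption this makes explicit: the two surrogate keys must be generated over rows in the same order, so avoid repartitioning between the branch point and the join.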
Use the below dataflow script to build your dataflow.
source(output(
result as (data as (dimensions as string[], timestamps as string[], values as string[])[])[]
),
allowSchemaDrift: true,
validateSchema: false,
ignoreNoFilesFound: false,
documentForm: 'singleDocument') ~> source1
source1 foldDown(unroll(result.data, result),
mapColumn(
each(result.data,match(true()))
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten1
flatten1 foldDown(unroll(dimensions, dimensions),
mapColumn(
dimensions,
timestamps,
values
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten2
flatten2 select(mapColumn(
dimensions,
timestamps
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select1
flatten2 select(mapColumn(
dimensions,
values
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select2
select2 foldDown(unroll(values, values),
mapColumn(
dimensions,
values
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten3
select1 foldDown(unroll(timestamps, timestamps),
mapColumn(
dimensions,
timestamps
),
skipDuplicateMapInputs: false,
skipDuplicateMapOutputs: false) ~> flatten4
flatten3 keyGenerate(output(key1 as long),
startAt: 1L,
stepValue: 1L) ~> surrogateKey1
flatten4 keyGenerate(output(key1 as long),
startAt: 1L,
stepValue: 1L) ~> surrogateKey2
surrogateKey1, surrogateKey2 join(surrogateKey1@key1 == surrogateKey2@key1,
joinType:'inner',
matchType:'exact',
ignoreSpaces: false,
broadcast: 'auto')~> join1
join1 select(mapColumn(
dimensions = flatten3@dimensions,
timestamps,
values
),
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> select3
select3 sink(allowSchemaDrift: true,
validateSchema: false,
input(
result as (data as (dimensions as string[], timestamps as string[], values as string[])[])[]
),
umask: 0022,
preCommands: [],
postCommands: [],
skipDuplicateMapInputs: true,
skipDuplicateMapOutputs: true) ~> sink1