problem
I have tried three approaches to map the output of my custom skill to an Edm.ComplexType field in my search index, but none of them populates the field. Each document in the search index needs to contain the following chunk_object field.
index field definition
{
  "name": "chunk_object",
  "type": "Edm.ComplexType",
  "fields": [
    {
      "name": "chunk_content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "page_start",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "page_end",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "chunk_idx",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    }
  ]
}
custom skill output
The output of the custom skill is mapped to /document/jsonChunks/*. jsonChunks contains 239 objects.
{
  "values": [
    {
      "recordId": "1",
      "data": {
        "jsonChunks": [
          {
            "chunk": "this is chunk 1",
            "page_start": 1,
            "page_end": 1,
            "chunk_idx": "#1-file.pdf'"
          },
          {
            "chunk": "this is chunk 2",
            "page_start": 1,
            "page_end": 1,
            "chunk_idx": "#1-file.pdf'"
          }
        ]
      }
    }
  ]
}
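The envelope above can be produced by a small helper like the following. This is a minimal Python sketch, assuming the custom skill is implemented in Python (the post does not say); `build_skill_response` is a hypothetical name, and only the values / recordId / data / jsonChunks layout is taken from the output shown above.

```python
import json
from typing import Any


def build_skill_response(record_id: str, chunks: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap chunk dicts in the values/recordId/data envelope (hypothetical helper)."""
    return {
        "values": [
            {
                "recordId": record_id,
                "data": {"jsonChunks": chunks},
            }
        ]
    }


# Two sample chunks copied from the output above; the real run returns 239.
sample_chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
]

response = build_skill_response("1", sample_chunks)
print(json.dumps(response, indent=2))
```

Note that each chunk carries the text under the key chunk, while the index sub-field is named chunk_content.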
in-memory output of custom skill
The in-memory enriched data structure for /document/jsonChunks/* is as follows:
-/jsonChunks Object[239]
  -/*
    -/chunk_content
    -/page_start
    -/page_end
    -/chunk_idx
my approach
I will share the shaper skill definition and the in-memory enriched structure for each approach.
approach 1
skill definition
{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "chunk_content",
      "source": "/document/jsonChunks/*/chunk"
    },
    {
      "name": "page_start",
      "source": "/document/jsonChunks/*/page_start"
    },
    {
      "name": "page_end",
      "source": "/document/jsonChunks/*/page_end"
    },
    {
      "name": "chunk_idx",
      "source": "/document/jsonChunks/*/chunk_idx"
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}
in-memory output
/document/chunk_object Object
  -/chunk_content Object[239]
    -/*
  -/page_start Object[239]
    -/*
  -/page_end Object[239]
    -/*
  -/chunk_idx Object[239]
    -/*
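To make the shape of this output concrete, here is a plain-Python model of the tree above next to a one-object-per-chunk layout. It only illustrates the two data shapes and is not the Shaper skill's implementation; `as_parallel_arrays` and `as_per_chunk_objects` are hypothetical names.

```python
from typing import Any

Chunk = dict[str, Any]


def as_parallel_arrays(chunks: list[Chunk]) -> dict[str, list[Any]]:
    """Model of the approach 1 tree: one object whose properties are arrays."""
    return {
        "chunk_content": [c["chunk"] for c in chunks],
        "page_start": [c["page_start"] for c in chunks],
        "page_end": [c["page_end"] for c in chunks],
        "chunk_idx": [c["chunk_idx"] for c in chunks],
    }


def as_per_chunk_objects(chunks: list[Chunk]) -> list[Chunk]:
    """For contrast: one object per chunk, each with scalar properties."""
    return [
        {
            "chunk_content": c["chunk"],
            "page_start": c["page_start"],
            "page_end": c["page_end"],
            "chunk_idx": c["chunk_idx"],
        }
        for c in chunks
    ]


# Sample data in the shape returned by the custom skill.
chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
]

parallel = as_parallel_arrays(chunks)
per_chunk = as_per_chunk_objects(chunks)
print(type(parallel["page_start"]).__name__)      # the property is an array
print(type(per_chunk[0]["page_start"]).__name__)  # the property is a scalar
```

In the first layout, page_start is an array with one entry per chunk (239 in the real run), whereas the index sub-field page_start is declared as a single Edm.Int64.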
approach 2
skill definition
{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "source": "/document/jsonChunks/*"
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}
in-memory output
/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx
approach 3
skill definition
{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "sourceContext": "/document/jsonChunks/*",
      "inputs": [
        {
          "name": "chunk_object",
          "source": "/document/jsonChunks/*/chunk"
        },
        {
          "name": "page_start",
          "source": "/document/jsonChunks/*/page_start"
        },
        {
          "name": "page_end",
          "source": "/document/jsonChunks/*/page_end"
        },
        {
          "name": "chunk_idx",
          "source": "/document/jsonChunks/*/chunk_idx"
        }
      ]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}
in-memory output
/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx
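One way to compare this tree with the index definition is to check which property names sit directly under chunk_object. The sketch below is plain Python over hand-written dictionaries that mirror the tree above and the sub-field names of the index field; `missing_subfields` is a hypothetical helper, and whether the indexer requires this exact match is an assumption, not something confirmed here.

```python
# Sub-field names declared for chunk_object in the index field definition.
INDEX_SUBFIELDS = {"chunk_content", "page_start", "page_end", "chunk_idx"}


def missing_subfields(shaped: dict) -> set[str]:
    """Return index sub-field names that are not top-level keys of the shaped object."""
    return INDEX_SUBFIELDS - set(shaped)


# Mirrors the tree above: the four properties sit under jsonChunk/*, one level down.
shaped_chunk_object = {
    "jsonChunk": [
        {
            "chunk_content": "this is chunk 1",
            "page_start": 1,
            "page_end": 1,
            "chunk_idx": "#1-file.pdf'",
        },
    ]
}

print(sorted(missing_subfields(shaped_chunk_object)))
```

For this tree, all four sub-field names are reported as missing at the top level, because the only direct child of chunk_object is jsonChunk.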
None of the approaches above works: the index field remains unpopulated in every case. Can anyone suggest some pointers or the right approach here? TIA