I need to use the Scroll API in an Azure Data Factory Copy Activity that reads from OpenSearch via the REST API (https://opensearch.org/docs/2.0/opensearch/search/paginate/).
The Scroll API can be used to iterate over a large amount of OpenSearch documents matching a query, or even all the matching documents.
Here is an example (curl):
First request
GET products/_search?scroll=1m
{"size": 10 }
Next request
GET _search/scroll
{"scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFlViblo1YXZTU2pxYkl1eE9zV3JLaEEAAAAAAAAAChZLem1HOElfcVRkT2JoZWljcnlWaFN3" }
I am having trouble with the ADF Copy Activity pagination rules. Please help.
I tried capturing the scroll_id from the output of the first call and storing it in a variable, but I couldn't figure out how to use that variable in the Copy Activity's pagination rule (next request) in order to copy the JSON response from the OpenSearch index.
Thanks
As per the documentation, pagination in ADF supports the case of "Next request's header = property value in current response body".
Your scenario is the same as this one: the current page returns the value of _scroll_id, which should be used as the scroll_id header of the next page.
So, you can try the following pagination rules in the copy activity source.
In the URL of the REST API, give your _search/scroll URL, and in the source of the copy activity, create a header scroll_id and assign it the variable in which you stored the first _scroll_id from the web activity.
In the pagination rules, give your header name as the key and the body property _scroll_id as its value.
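As a rough illustration of the rule described above, here is a minimal sketch that builds the source settings as a Python dict and prints them as JSON. The key forms (Headers.<name>, a JSONPath value starting with $, and an EndCondition entry) follow the REST connector's pagination-rule syntax as I understand it. The end condition on $.hits.hits and the helper name build_rest_source are assumptions of mine, so check them against the documentation before using them.

```python
import json


def build_rest_source(header_name="scroll_id", body_property="_scroll_id"):
    """Return a sketch of Copy Activity source settings (hypothetical helper)."""
    return {
        "type": "RestSource",
        "paginationRules": {
            # Next request's header = property value in current response body
            f"Headers.{header_name}": f"$.{body_property}",
            # Assumption: stop paging once the hits array comes back empty
            "EndCondition:$.hits.hits": "Empty",
        },
    }


if __name__ == "__main__":
    # Print the settings as JSON so they can be compared with the pipeline definition
    print(json.dumps(build_rest_source(), indent=2))
```

The initial scroll_id header still has to be set on the source from your variable, as described above. This sketch only covers the pagination rules.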
You can check the similar SO answer by @Matt. Also, check the same documentation for the end condition of the pagination.
You can also try a loop approach in this case, using an Until activity.
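The logic an Until loop would need to reproduce can be sketched in Python as follows. This is only an outline of the control flow, not ADF code: search and next_page are hypothetical callables standing in for the first request (GET products/_search?scroll=1m) and the follow-up request (GET _search/scroll), and the loop assumes the response carries _scroll_id and hits.hits as in the OpenSearch scroll response.

```python
from typing import Callable, Dict, Iterator


def scroll_all(
    search: Callable[[], Dict],
    next_page: Callable[[str], Dict],
) -> Iterator[Dict]:
    """Yield every hit, following _scroll_id until a page has no hits."""
    page = search()  # first request opens the scroll context
    while True:
        hits = page.get("hits", {}).get("hits", [])
        if not hits:
            # Until condition: an empty page means the scroll is exhausted
            break
        yield from hits
        # Pass the scroll id from the current page to the next request
        page = next_page(page["_scroll_id"])
```

In the pipeline, the body of the loop would correspond to one web or copy call plus a Set Variable step that updates the stored scroll id.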