A product ID may appear with more than one order ID (that is, it has duplicates). For each such product ID, we need to count the duplicates and list all of its associated order IDs.
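To pin down the expected result, here is a minimal in-memory sketch of the same logic in Python. It assumes the data is available as hypothetical `(product_id, order_id)` pairs rather than in an Elasticsearch index. It groups the distinct order IDs per product ID and keeps only products with more than one, mirroring the `params.orderCount > 1` condition in the query.

```python
from collections import defaultdict


def find_duplicate_products(records):
    """Return {product_id: sorted distinct order IDs} for products with more than one order ID.

    `records` is assumed to be an iterable of (product_id, order_id) pairs (hypothetical input shape).
    """
    orders_by_product = defaultdict(set)
    for product_id, order_id in records:
        # A set ignores repeated (product_id, order_id) pairs, like a terms aggregation's distinct keys.
        orders_by_product[product_id].add(order_id)
    # Same condition as the bucket_selector: keep products with more than one distinct order ID.
    return {
        product_id: sorted(order_ids)
        for product_id, order_ids in orders_by_product.items()
        if len(order_ids) > 1
    }


records = [("p1", "o1"), ("p1", "o2"), ("p2", "o3"), ("p1", "o2"), ("p3", "o4"), ("p3", "o5")]
result = find_duplicate_products(records)
for product_id, order_ids in result.items():
    print(product_id, len(order_ids), order_ids)
```

Here `p2` is dropped because it has only one order ID, while `p1` and `p3` each have two.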
```
{
  "size": 0,
  "aggs": {
    "products_with_duplicates": {
      "terms": {
        "field": "product_id",
        "size": 10000 // Adjust size according to the number of unique product IDs you expect
      },
      "aggs": {
        "order_count": {
          "terms": {
            "field": "order_id",
            "size": 10000 // Adjust size according to the number of unique order IDs you expect
          }
        },
        "duplicate_count": {
          "bucket_selector": {
            "buckets_path": {
              "orderCount": "order_count._bucket_count"
            },
            "script": "params.orderCount > 1" // This script filters out product IDs with only one order ID
          }
        }
      }
    }
  }
}
```
The query above works on a small dataset, but it fails on a larger one, for example once the data exceeds 10,000 entries.
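When reading the results, a sketch like the following turns the `products_with_duplicates` aggregation in the search response into a mapping from product ID to its duplicate count and order IDs. It assumes the standard terms-aggregation response shape (`buckets`, `key`, `sum_other_doc_count`); `extract_duplicates` and `sample_response` are hypothetical names, and the sample is hand-written, not real output. Checking `sum_other_doc_count` is one way to detect that some product IDs were left out because they did not fit into the `size` of 10000.

```python
def extract_duplicates(response):
    """Map product_id -> {"count", "order_ids"} from a hypothetical search response dict."""
    agg = response["aggregations"]["products_with_duplicates"]
    # A non-zero sum_other_doc_count means some product IDs did not fit into the terms "size".
    truncated = agg.get("sum_other_doc_count", 0) > 0
    duplicates = {}
    for bucket in agg["buckets"]:
        order_ids = [order["key"] for order in bucket["order_count"]["buckets"]]
        # The number of order_count buckets is what order_count._bucket_count refers to.
        duplicates[bucket["key"]] = {"count": len(order_ids), "order_ids": order_ids}
    return duplicates, truncated


# Hand-written sample in the shape of a terms aggregation response (not real output).
sample_response = {
    "aggregations": {
        "products_with_duplicates": {
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "p1",
                    "doc_count": 3,
                    "order_count": {
                        "buckets": [
                            {"key": "o2", "doc_count": 2},
                            {"key": "o1", "doc_count": 1},
                        ]
                    },
                }
            ],
        }
    }
}

dups, truncated = extract_duplicates(sample_response)
print(dups, truncated)
```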