I have created two indices: index A (100k+ documents) and index B (600k+ documents). The mapping of both indices looks like this:
embedding of type knn_vector
id of type text
I am trying to find the top k documents in index B that are similar to a given document in index A. For this I use a k-NN search with a query like this:
query_body = {
    "size": batch_size,
    "query": {
        "knn": {
            "embedding": {
                "vector": embedding,
                "k": 10000  # Use the maximum allowed k
            }
        }
    },
    "_source": ["id"],
    "sort": [
        {"_score": "desc"},
        {"_id": "asc"}  # Secondary sort on _id for consistent pagination
    ]
}
From the documentation, it appears that k has an upper limit of 10,000. Does that mean that at most 10,000 similar IDs can be returned?
I am not sure whether pagination is an option either, because even in that case only the first 10,000 results would be relevant. Would any subsequent results perhaps not be relevant?
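To make the pagination idea concrete, here is a minimal sketch of what I have in mind. It only builds the request bodies and does not call the cluster. The helper names build_page_query and page_queries are hypothetical, and it assumes plain from/size paging and that no more than k hits are worth paging through, which is exactly the part I am unsure about.

```python
from typing import Any, Dict, Iterator, List

MAX_K = 10_000  # upper limit on k, as I read the documentation


def build_page_query(embedding: List[float], offset: int, size: int) -> Dict[str, Any]:
    """Build the body for one page of the k-NN query shown above (hypothetical helper)."""
    return {
        "from": offset,  # assumption: from/size pagination
        "size": size,
        "query": {"knn": {"embedding": {"vector": embedding, "k": MAX_K}}},
        "_source": ["id"],
    }


def page_queries(
    embedding: List[float], batch_size: int, max_hits: int = MAX_K
) -> Iterator[Dict[str, Any]]:
    """Yield one request body per page, never asking for hits beyond max_hits."""
    for offset in range(0, max_hits, batch_size):
        # The last page is shortened so that offset + size stays within max_hits.
        size = min(batch_size, max_hits - offset)
        yield build_page_query(embedding, offset, size)
```

With batch_size = 4000, this yields three bodies at offsets 0, 4000 and 8000, and the last one asks for only 2000 hits. Nothing here can reach an 10,001st neighbour, which is the limitation I am asking about.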