I have 100K documents, among which only 50 have the alias property. Consider the following two documents:
doc1 (no alias property):
{
  "name": "FAC"
}
doc2:
{
  "name": "some data",
  "alias": ["FCC"]
}
I’m using the following multi-match query:
{
  "query": {
    "multi_match": {
      "query": "FCC",
      "type": "most_fields",
      "fuzziness": 1,
      "prefix_length": 0,
      "operator": "AND",
      "fields": [
        "alias",
        "name"
      ]
    }
  }
}
doc1 appears before doc2 in the response, which is not what I expect: doc2 contains the exact term "FCC", while doc1 only matches through fuzziness. doc1 scores higher than doc2 because the alias property is absent from most documents. The default similarity, BM25, combines boost, IDF, and TF to compute the score, and IDF is computed per field. Only 50 documents have an alias, so a term in alias gets a much lower IDF than an equally rare term in name, where the field's document count is far larger. I could switch to boolean similarity to get the expected ordering, but I don't want to change the default similarity and alter the behavior of every other query.
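To make the IDF gap concrete, here is a rough sketch of the arithmetic. It assumes the Lucene BM25 IDF formula, ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), where docCount is the number of documents that have the field. The docFreq values of 1 are made up for illustration, and the sketch ignores TF, length normalization, sharding, and whatever the fuzzy rewrite does to term statistics, so the real scores will differ.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25 IDF as used by Lucene's BM25Similarity.

    doc_count: number of documents that have a value for the field
    doc_freq:  number of documents whose field contains the term
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumption: "FAC" occurs in the name field of one document out of 100K.
idf_name = bm25_idf(doc_count=100_000, doc_freq=1)

# Assumption: "FCC" occurs in the alias field of one document out of the 50
# that have an alias at all.
idf_alias = bm25_idf(doc_count=50, doc_freq=1)

print(f"IDF for name:  {idf_name:.3f}")
print(f"IDF for alias: {idf_alias:.3f}")
```

Under these assumptions the IDF for the name match comes out roughly three times the IDF for the alias match, which would be enough to push doc1 above doc2 even though doc2 is the exact match.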
Is there any way to get behavior similar to boolean similarity using function_score, or to specify the similarity at query time?