Currently I’m working on improving search results in my project. I’m not familiar with search/ranking algorithms, so I may ask something stupid.
Let’s say the project has millions of restaurants stored in Elasticsearch like this:
{
  "id": 1,
  "name": "some name",
  "address": "some address",
  "score": 5,
  "desc": "some desc"
}
Currently we are using Elasticsearch’s built-in tokenizer to tokenize the user input and searching with BM25.
e.g. “Best pizza in London” gets tokenized into terms like “best”, “pizza”, “london”, and then each document is scored with BM25.
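To make the setup concrete, here is a minimal sketch of the kind of query body we send today. `multi_match` with default BM25 scoring is a real Elasticsearch query type, but the exact field list here is just my assumption for illustration:

```python
# Sketch of our current query: the analyzer tokenizes the input,
# and BM25 scores every resulting term equally across these fields.
def build_bm25_query(user_input):
    return {
        "query": {
            "multi_match": {
                "query": user_input,
                "fields": ["name", "address", "desc"],
            }
        }
    }

body = build_bm25_query("Best pizza in London")
```

There is nothing in this query telling Elasticsearch that “london” is a location constraint rather than just another term.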
However, the issue is that from the user’s perspective, the expected priority is:
- First, a list of results for “Best pizza in London”
- Then followed by “Best pizza-like food in London”
- Then followed by “Best food other than pizza in London”
But not “Best pizza in York”.
The issue is that BM25 relies on term frequency and document length, and treats all tokens equally. In the real world, though, their importance varies.
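For reference, as I understand it, the BM25 score (the default similarity in Elasticsearch) for a document D and query Q is:

```latex
\mathrm{score}(D, Q) = \sum_{q_i \in Q} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

Every token q_i contributes through the same formula, weighted only by its rarity (IDF) and in-document frequency, so there is no built-in way to say that “london” is a hard constraint while “best” is mostly noise.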
And it’s not only the location: depending on the user input, it could be a person, an event, etc. that matters most.
So nowadays, for such a requirement, what would be a better way to solve this?