I'm experimenting with vector search in OpenSearch and getting strange similarity scores, and I'm trying to understand why.
To keep things simple, I’ve indexed two vectors – [0, 1] and [0, -1].
As I understand it, the similarity score is simply the cosine of the angle between the two vectors (I also checked this with an online similarity calculator and with a function I wrote myself).
In this case the angle is 180 degrees, so cos(alpha) should be -1.
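For reference, here is a minimal sketch of the kind of check I mean, in plain Python. The function name cosine_similarity is only illustrative, and this is the textbook formula, not necessarily what OpenSearch computes internally:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (textbook definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The query vector against each of the two indexed vectors.
print(cosine_similarity([0, 1], [0, 1]))   # 1.0
print(cosine_similarity([0, 1], [0, -1]))  # -1.0
```

So by this calculation I would expect 1 for the "up" document and -1 for the "down" document.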
However, when I run this search in OpenSearch:
POST /my-index2/_search
{
  "size": 10,
  "query": {
    "knn": {
      "content_vector": {
        "vector": [0, 1],
        "k": 10
      }
    }
  }
}
I get the following:
"hits": [
  {
    "_index": "my-index2",
    "_id": "1",
    "_score": 0.9999999,
    "_source": {
      "content": "up",
      "content_vector": [
        0,
        1
      ]
    }
  },
  {
    "_index": "my-index2",
    "_id": "2",
    "_score": 0.33333334,
    "_source": {
      "content": "down",
      "content_vector": [
        0,
        -1
      ]
    }
  }
]
My index definition looks like this:
PUT /my-index2
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "content_vector": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 500,
            "m": 16
          }
        }
      }
    }
  }
}