I am working on an image similarity search project. I created a collection in Milvus and inserted the image embeddings, but the search results contain duplicate images, even though this code inserts the embeddings only once. What could be causing this, and how can I fix it?
Below is the relevant code. I'm happy to provide more information if that helps. Thanks in advance.
```python
import logging
import os

import streamlit as st
from PIL import Image
from pymilvus import MilvusClient

logger = logging.getLogger(__name__)


@st.cache_resource
def get_milvus_client(uri):
    logger.info("Setting up Milvus client")
    return MilvusClient(uri=uri)


root = "./train"
# load_model is defined elsewhere in the project (not shown); it is expected to
# return a ResNet-34 feature extractor with 512-dimensional output (dimension=512 below)
extractor = load_model("resnet34")


def insert_embeddings(client):
    print('inserting')
    global extractor
    root = "./train"
    # Walk the training folder and insert one embedding per .JPEG file
    for dirpath, foldername, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".JPEG"):
                filepath = os.path.join(dirpath, filename)
                img = Image.open(filepath)
                image_embedding = extractor(img)
                client.insert(
                    "image_embeddings",
                    {"vector": image_embedding, "filename": filepath},
                )


client = get_milvus_client(uri="example.db")
client.create_collection(
    collection_name="image_embeddings",
    vector_field_name="vector",
    dimension=512,
    auto_id=True,
    enable_dynamic_field=True,
    metric_type="COSINE",
)
insert_embeddings(client)
```
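One possible cause worth ruling out: Streamlit re-executes the whole script on every user interaction, and only `get_milvus_client` is cached, so `insert_embeddings(client)` may run again on each rerun. With `auto_id=True`, every insert gets a new primary key, so the same file would be stored once per run. The sketch below illustrates that assumption with a toy model; `InMemoryStore` and `insert_new_files` are hypothetical stand-ins, not the pymilvus API, and the vectors are placeholders.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a collection created with auto_id=True (not pymilvus)."""
    rows: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def insert(self, row: dict) -> int:
        # Like auto_id=True: every insert gets a fresh primary key,
        # so nothing stops the same filename from being stored again.
        pk = next(self._ids)
        self.rows[pk] = row
        return pk

    def filenames(self) -> set:
        return {row["filename"] for row in self.rows.values()}


def insert_new_files(store: InMemoryStore, filepaths: list) -> int:
    """Insert only file paths not already stored; return how many were added."""
    existing = store.filenames()
    added = 0
    for path in filepaths:
        if path in existing:
            continue  # inserted on an earlier run: skip it
        store.insert({"vector": [0.0] * 4, "filename": path})  # placeholder vector
        existing.add(path)
        added += 1
    return added


files = ["./train/a.JPEG", "./train/b.JPEG"]

# Naive version: the script body runs twice (e.g. two Streamlit reruns)
naive = InMemoryStore()
for _ in range(2):
    for path in files:
        naive.insert({"vector": [0.0] * 4, "filename": path})
print(len(naive.rows))  # 4: every file stored twice

# Guarded version: the second run adds nothing
guarded = InMemoryStore()
for _ in range(2):
    insert_new_files(guarded, files)
print(len(guarded.rows))  # 2
```

If this is the cause, the `print('inserting')` line should show up in the console more than once. Against the real collection, a similar guard could check `client.has_collection("image_embeddings")` before creating and inserting, so the one-time setup is skipped on later reruns.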