I had thought that num_entities would indicate the number of records (or whatever the correct term is) within a Milvus collection. To check this, I created one file, test_milvus.py, to create a simple collection like so:
import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(alias='default', host='localhost', port='19530')

# Define the schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, max_length=100),
    FieldSchema("vector", DataType.FLOAT_VECTOR, dim=2),
])

# Create a collection
collection = Collection("test_collection", schema)

# Insert data
data = [{"id": i, "vector": np.array([i, i], dtype=np.float32)} for i in range(10)]
collection.insert(data)

# Flush data
collection.flush()

# Disconnect from the server
connections.disconnect(alias='default')
and another, milvus_info.py, to get information on the collections within a Milvus database, like so:
from pymilvus import Collection, connections, db, utility


def get_info(host: str = "localhost", port: str = "19530"):
    # Connect to Milvus (replace with your connection details)
    connections.connect(alias="default", host=host, port=port)

    # Print the list of databases and collections
    db_list = db.list_database()
    for db_name in db_list:
        print(f"Database: {db_name}")
        collection_list = utility.list_collections(using=db_name)
        if len(collection_list) == 0:
            print(" No collections")
        for collection_name in collection_list:
            print(f" Collection: {collection_name}")
            temp_collection = Collection(name=collection_name)
            for info in temp_collection.describe():
                print(f" {info}: {temp_collection.describe()[info]}")
            print(f" Number of entities: {temp_collection.num_entities}")

    # Disconnect from Milvus
    connections.disconnect(alias='default')


if __name__ == "__main__":
    get_info()
The first time I ran test_milvus.py followed by milvus_info.py, I got this output:
$ python test_milvus.py
$ python milvus_info.py
Database: default
Collection: test_collection
collection_name: test_collection
auto_id: False
num_shards: 1
description:
fields: [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 2}}]
aliases: []
collection_id: 450687678279804785
consistency_level: 2
properties: {}
num_partitions: 1
enable_dynamic_field: False
Number of entities: 10
which is what I expected, since the script inserts 10 vectors (each with 2 dimensions).
However, if I run `test_milvus.py` again, the number of entities goes up to 20, even though the second run only re-inserts the same 10 ids and vectors:
$ python test_milvus.py
$ python milvus_info.py
Database: default
Collection: test_collection
collection_name: test_collection
auto_id: False
num_shards: 1
description:
fields: [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 2}}]
aliases: []
collection_id: 450687678279804785
consistency_level: 2
properties: {}
num_partitions: 1
enable_dynamic_field: False
Number of entities: 20
This happens even though I've only tried to add records that were already there. I would have expected num_entities to be 10, no matter how many times I run these files. The documentation says it returns the number of rows, but I can drive it arbitrarily high while still having only 10 distinct rows. Is num_entities supposed to track all rows that ever existed?
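One way to narrow this down is to start from an empty collection on every run, so that earlier runs cannot contribute to the count. The following is only a sketch under a few assumptions: the helper names build_rows and recreate_and_insert are made up for illustration, the server is the same local one used above, and the installed pymilvus accepts row-based inserts as test_milvus.py already does. Note that utility.drop_collection deletes the existing data.

```python
from typing import Dict, List

COLLECTION_NAME = "test_collection"


def build_rows(n: int = 10) -> List[Dict]:
    """Build the same rows as test_milvus.py: id i with vector [i, i]."""
    return [{"id": i, "vector": [float(i), float(i)]} for i in range(n)]


def recreate_and_insert(host: str = "localhost", port: str = "19530") -> int:
    """Drop the collection if present, recreate it, insert, return num_entities."""
    # Imported lazily so build_rows can be used without a Milvus client installed.
    from pymilvus import (Collection, CollectionSchema, DataType,
                          FieldSchema, connections, utility)

    connections.connect(alias="default", host=host, port=port)
    try:
        # Remove whatever earlier runs left behind (this deletes the data).
        if utility.has_collection(COLLECTION_NAME):
            utility.drop_collection(COLLECTION_NAME)

        schema = CollectionSchema([
            FieldSchema("id", DataType.INT64, is_primary=True),
            FieldSchema("vector", DataType.FLOAT_VECTOR, dim=2),
        ])
        collection = Collection(COLLECTION_NAME, schema)
        collection.insert(build_rows())
        collection.flush()
        return collection.num_entities
    finally:
        # Always release the connection, even if an insert fails.
        connections.disconnect(alias="default")
```

Calling recreate_and_insert() repeatedly against a running server and printing the result would show whether the count still grows. If it stays at 10 on every run, that would suggest num_entities counts every inserted row, including rows whose ids were inserted before, rather than distinct primary keys.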