I am currently using pymilvus 2.4.3 and my data contains sparse vectors.
I am currently using client.insert(), but it has a 64 MB RPC limit. I split my ~115 GB data table into 1750 files with PySpark, wrote them to a location on Databricks, and upload them file by file, roughly as sketched below. However, this takes about 1 minute per file, so 1750 files will take a whopping ~29 hours!
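For context, my upload loop looks roughly like this (the URI, file paths, field names, and collection name are placeholders for this sketch, not my real setup):

```python
from pymilvus import MilvusClient
import pandas as pd

client = MilvusClient(uri="http://localhost:19530")  # placeholder URI

# placeholder list; in reality ~1750 parquet files written by PySpark
file_paths = ["part-00000.parquet"]

for path in file_paths:
    df = pd.read_parquet(path)
    entities = [
        {
            "id": row["id"],
            # pymilvus 2.4 accepts sparse vectors as {index: value} dicts
            "sparse_vector": {
                int(i): float(v)
                for i, v in zip(row["indices"], row["values"])
            },
        }
        for _, row in df.iterrows()
    ]
    # one RPC per file; each pre-split file stays under the 64 MB limit
    client.insert(collection_name="my_collection", data=entities)
```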
How do I insert my data into the collection faster? I know there is the spark-milvus connector, but it currently does not support sparse vectors.
I also saw there was do_bulk_insert, but I kept getting an error that says:
```
<Bulk insert state:
    - taskID          : 450235310995975498,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'failed_reason': 'typeutil.GetDim should not invoke on sparse vector type', 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2024-06-04 15:44:37
>
```
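For reference, this is roughly how I'm invoking it (the file path and collection name below are placeholders); every task comes back in the Failed state above:

```python
from pymilvus import connections, utility

connections.connect(uri="http://localhost:19530")  # placeholder URI

# kick off a bulk-insert task for one of the pre-split files
task_id = utility.do_bulk_insert(
    collection_name="my_collection",
    files=["sparse_data/part-00000.parquet"],
)

# polling the task is what prints the Failed state shown above
state = utility.get_bulk_insert_state(task_id=task_id)
print(state)
```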
I think this may be a bug; I'm not sure why the bulk insert process is looking for a dimension value on a sparse vector type.
Thanks!