How does Django randomly skip days when trying to bulk_create? The dataset is a CSV exported via a third-party tool from an MSSQL Server.
Exhibit 1 – Data from my database:
Note the whole 2024-06-13 is missing.
Exhibit 2 – Data dump from my dataframe:
When I look at my pandas dataframe, there is data for 2024-06-13, so reading the CSV and parsing the dates works.
- At first, I thought the issue was bulk_create using too much memory, so I tried chunking. The problem remained. But if it were a memory problem, it wouldn't so cleanly eliminate that one day without affecting the days around it. The session start/stop times on the 12th and 14th correspond with when the shop opens and closes.
- It's not the only day that randomly disappeared. Other days before this have vanished as well. Also, the last date that imported was 2024-06-24; after that, it won't import any more of the sessions that exist in my dataframe. I tried both SQLite and Postgres, in case it was a database issue, to no avail.
This is how it's imported from my dataframe via the Django ORM:
sessions = [Session(**row) for row in df.to_dict(orient='records')]
objs = Session.objects.bulk_create(
    sessions, batch_size=900,
    # update_conflicts=True,
    # unique_fields=['session_number'],
    # update_fields=['tax_invoice_number']
)
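To narrow down which days are affected, a per-day row count of the dataframe records can be compared against what is actually stored. The following is only a minimal sketch: the field name `session_start` and both helper functions are hypothetical, since the model's fields aren't shown here.

```python
from collections import Counter
from datetime import date, datetime


def rows_per_day(records, field="session_start"):
    """Count records per calendar day, based on a date or datetime field.

    `field` is a hypothetical name; substitute the model's real datetime field.
    """
    counts = Counter()
    for record in records:
        value = record[field]
        # datetime values (pandas Timestamps subclass datetime) are reduced
        # to their calendar day; plain date values are used as they are.
        day = value.date() if isinstance(value, datetime) else value
        counts[day] += 1
    return counts


def days_with_mismatch(source_records, stored_records, field="session_start"):
    """Return {day: (source_count, stored_count)} for days whose counts differ."""
    source = rows_per_day(source_records, field)
    stored = rows_per_day(stored_records, field)
    return {
        day: (source[day], stored[day])
        for day in sorted(set(source) | set(stored))
        if source[day] != stored[day]
    }
```

Assuming that field name, it could be called with `df.to_dict(orient='records')` as `source_records` and `Session.objects.values('session_start')` as `stored_records`.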
I removed update_conflicts to allow it to throw an error if there were conflicting keys, but it didn't.
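Since no error was raised, one way to double-check for conflicting keys on the dataframe side is to count repeated `session_number` values in the records before they reach the database. This is a rough sketch; the helper `duplicated_keys` is hypothetical, and whether `session_number` actually carries a unique constraint on the model is an assumption.

```python
from collections import Counter


def duplicated_keys(records, key="session_number"):
    """Return {value: occurrences} for every key value that appears more than once."""
    counts = Counter(record[key] for record in records)
    return {value: n for value, n in counts.items() if n > 1}
```

Calling it on `df.to_dict(orient='records')` would return an empty dict if every `session_number` in the dataframe is distinct.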
For reference, just to show that it’s in my dataframe when I dump it out to Google Sheets
Does anyone have any idea why some days just don’t get written to the database?