My delete_ids variable contains 8,000 unique conversation IDs. I am trying to delete these 8,000 IDs from my SQL table one at a time in a loop. I cannot delete all 8,000 at once because some of them may not be valid conversation IDs; they might be something else. I have therefore wrapped each delete in a try-except block to skip IDs that cannot be deleted or that raise errors. However, this process takes almost 40 minutes. The count in my delete_ids list can vary, going up or down. Is there a more efficient way to improve the performance of my code?
import pyodbc

failed_deletions = []
if len(matching_records_sql) < len(matching_records_df):
    delete_ids = matching_records_sql['CONVERSATIONID'].tolist()
    if delete_ids:
        delete_query = "DELETE FROM MY_TABLE WHERE CONVERSATIONID = ?"
        conn = pyodbc.connect(r'DRIVER={ODBC Driver 17 for SQL Server};'
                              r'SERVER=My_Server_Name;'
                              r'DATABASE=Database_Name;'
                              r'Trusted_Connection=yes;')
        cursor = conn.cursor()
        # Delete one ID at a time, committing after every statement.
        for conv_id in delete_ids:
            try:
                cursor.execute(delete_query, (conv_id,))
                conn.commit()
            except Exception as e:
                print(f"Failed to delete CONVERSATIONID {conv_id}: {e}")
                failed_deletions.append(conv_id)
        conn.close()
        print('List of deleted IDs from SQL table: ',
              [conv_id for conv_id in delete_ids if conv_id not in failed_deletions])
        print('List of IDs that failed to delete: ', failed_deletions)
    else:
        print('No IDs to delete.')
else:
    print('matching_records_sql is not < matching_records_df')
In this case you are performing 8,000 DELETEs individually and committing after each one. That generates 8,000 round trips and 8,000 individual commits, and this per-statement overhead is what is eating your 40 minutes.
The better way is to loop through the CONVERSATIONIDs first and filter them appropriately (checking the constraint, or whatever else makes an ID invalid). At that point you know exactly which CONVERSATIONIDs can be deleted and which cannot.
Then you delete them all at once with DELETE FROM MY_TABLE WHERE CONVERSATIONID IN (...), binding one ? placeholder per ID, since pyodbc will not expand a single ? into a list.
Also, depending on FK checks and indexes there may be more tuning to do, but just moving from individual queries to one aggregated query should give you a massive performance improvement. A sketch of this approach is below.
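Here is a minimal sketch of that batched delete, assuming the same table, column, and connection settings as in the question, and assuming delete_ids has already been filtered down to IDs that are safe to delete. SQL Server caps a statement at 2,100 parameters, so the IDs go in chunks; the chunked helper and the chunk size of 1,000 are my own illustrative choices, not anything from the question.

import pyodbc

def chunked(seq, size):
    # Yield successive slices of seq with at most `size` elements each.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

conn = pyodbc.connect(r'DRIVER={ODBC Driver 17 for SQL Server};'
                      r'SERVER=My_Server_Name;'
                      r'DATABASE=Database_Name;'
                      r'Trusted_Connection=yes;')
cursor = conn.cursor()

# Assumption: delete_ids holds only IDs that are safe to delete, so no
# per-row try/except is needed here.
for chunk in chunked(delete_ids, 1000):  # stay under SQL Server's 2,100-parameter cap
    placeholders = ', '.join('?' for _ in chunk)  # one ? per ID in this chunk
    cursor.execute(
        f"DELETE FROM MY_TABLE WHERE CONVERSATIONID IN ({placeholders})",
        chunk,
    )
conn.commit()  # one commit for the whole batch instead of one per row
conn.close()

With 8,000 IDs this runs 8 statements and a single commit instead of 8,000 of each, which is where the bulk of the speedup should come from.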