I’ve written a Python program to read a large number of rows from a table in an Azure SQL Server database, clean them and save them to another table using pyodbc. Here’s the code sample:
import logging

import numpy as np
import pandas as pd


def getStgDataFromDB(conn, dbtbl, step, refloaddatetime, countyfld, countyval):
    if step not in ("clean", "match"):
        logging.info(f"Invalid step value: {step}")
        return None
    readCursor = conn.cursor()
    # Table and column names can't be bound as parameters, so they are
    # interpolated into the statement; the values are passed as parameters.
    sqlstmt = (f"SELECT * FROM {dbtbl} "
               f"WHERE loaddatetime = CAST(? AS DATETIME) "
               f"AND LOWER({countyfld}) = ? "
               f"ORDER BY loaddatetime, {countyfld}")
    results = readCursor.execute(sqlstmt, refloaddatetime, countyval.lower())
    return results
cursor_output = getStgDataFromDB(conn, srctbl, step, refloaddatetime, countyfld, countyval)
tblRows = cursor_output.fetchmany(FETCH_SIZE)
while tblRows:
    # Get the column names from the cursor description
    cols = [col[0] for col in cursor_output.description]
    tblDataStg = pd.DataFrame(np.array(tblRows), columns=cols)
    ....
    # As part of the cleaning process, tblDataStg data is saved in tblDataClean
    saveResults = tblDataClean.apply(savefunc, args=(conn,), axis=1)
    conn.commit()
    tblRows = cursor_output.fetchmany(FETCH_SIZE)
I have set FETCH_SIZE = 5 for the sake of testing. savefunc() opens a separate cursor on the same connection and inserts each row into the destination table using pyodbc’s execute() function.
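savefunc() itself isn’t shown above; it looks roughly like this (a simplified sketch; the destination table and column names below are placeholders, not the real schema):

def savefunc(row, conn):
    # A separate cursor, but on the same connection used for reading.
    writeCursor = conn.cursor()
    # Hypothetical destination table and columns, for illustration only.
    writeCursor.execute(
        "INSERT INTO dbo.tblDataClean (loaddatetime, county, val) VALUES (?, ?, ?)",
        row["loaddatetime"], row["county"], row["val"],
    )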
If the commit() function is called while iterating through the cursor as shown above, I get an ODBC error ‘[HY010] [Microsoft][ODBC Driver 18 for SQL Server]Function sequence error (0) (SQLFetch)’.
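The failure can be reproduced in isolation with something like the following (a minimal sketch; the connection string and table name are placeholders, and it assumes the table holds more rows than one fetchmany batch):

import pyodbc

conn = pyodbc.connect(CONN_STR)  # CONN_STR is a placeholder
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.SomeBigTable")  # hypothetical table
rows = cur.fetchmany(5)
while rows:
    # commit() applies to the whole connection and closes the open
    # cursor (the driver's cursor-commit behavior), so the next
    # fetchmany fails on SQLFetch with HY010.
    conn.commit()
    rows = cur.fetchmany(5)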
The code works fine as long as commit() is called outside the while loop. But doing so means committing thousands of rows, possibly more than 100K in some instances, in a single transaction. My fear is that a transaction of that size will cause performance problems.
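For reference, this is the placement that works (loop body abbreviated to the save step):

tblRows = cursor_output.fetchmany(FETCH_SIZE)
while tblRows:
    ...
    saveResults = tblDataClean.apply(savefunc, args=(conn,), axis=1)
    tblRows = cursor_output.fetchmany(FETCH_SIZE)
# No HY010, but the entire run lands in a single transaction.
conn.commit()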
How can I optimize the read-clean-save process and avoid the ODBC error?