I’ve written a Python program to read a large number of rows from a table in an Azure SQL Server database, clean them and save them to another table using pyodbc. Here’s the code sample:
import logging

import numpy as np
import pandas as pd


def getStgDataFromDB(conn, dbtbl, step, refloaddatetime, countyfld, countyval):
    if step not in ("clean", "match"):
        logging.info(f"Invalid step value: {step}")
        return None
    readCursor = conn.cursor()
    # Table and column names can't be bound as parameters, so they are
    # interpolated into the statement; the values are passed as parameters.
    sqlstmt = (f"SELECT * FROM {dbtbl} "
               f"WHERE loaddatetime = CAST(? AS DATETIME) "
               f"AND LOWER({countyfld}) = ? "
               f"ORDER BY loaddatetime, {countyfld}")
    results = readCursor.execute(sqlstmt, refloaddatetime, countyval.lower())
    return results
cursor_output = getStgDataFromDB(conn, srctbl, step, refloaddatetime, countyfld, countyval)
tblRows = cursor_output.fetchmany(FETCH_SIZE)
while tblRows:
    # Get the column names from the cursor description
    cols = [col[0] for col in cursor_output.description]
    tblDataStg = pd.DataFrame(np.array(tblRows), columns=cols)
    ....
    # As part of the cleaning process, tblDataStg data is saved in tblDataClean
    saveResults = tblDataClean.apply(savefunc, args=(conn,), axis=1)
    conn.commit()
    tblRows = cursor_output.fetchmany(FETCH_SIZE)
I have set FETCH_SIZE = 5 for the sake of testing. savefunc() opens a separate cursor on the same connection and inserts each row into the destination table using pyodbc’s execute() function.
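savefunc() itself isn’t shown above; it looks roughly like this (a simplified sketch; the destination table and column names below are placeholders, not the real schema):

def savefunc(row, conn):
    # A separate cursor, but on the same connection used for reading.
    writeCursor = conn.cursor()
    # Hypothetical destination table and columns, for illustration only.
    writeCursor.execute(
        "INSERT INTO dbo.tblDataClean (loaddatetime, county, val) VALUES (?, ?, ?)",
        row["loaddatetime"], row["county"], row["val"],
    )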
If the commit() function is called while iterating through the cursor as shown above, I get an ODBC error ‘[HY010] [Microsoft][ODBC Driver 18 for SQL Server]Function sequence error (0) (SQLFetch)’.
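The failure can be reproduced in isolation with something like the following (a minimal sketch; the connection string and table name are placeholders, and it assumes the table holds more rows than one fetchmany batch):

import pyodbc

conn = pyodbc.connect(CONN_STR)  # CONN_STR is a placeholder
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.SomeBigTable")  # hypothetical table
rows = cur.fetchmany(5)
while rows:
    # commit() applies to the whole connection and closes the open
    # cursor (the driver's cursor-commit behavior), so the next
    # fetchmany fails on SQLFetch with HY010.
    conn.commit()
    rows = cur.fetchmany(5)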
The code works fine as long as commit() is called outside the while loop. But doing so means committing thousands of rows, possibly more than 100K in some instances, in a single transaction. My fear is that a transaction of that size will cause performance problems.
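For reference, this is the placement that works (loop body abbreviated to the save step):

tblRows = cursor_output.fetchmany(FETCH_SIZE)
while tblRows:
    ...
    saveResults = tblDataClean.apply(savefunc, args=(conn,), axis=1)
    tblRows = cursor_output.fetchmany(FETCH_SIZE)
# No HY010, but the entire run lands in a single transaction.
conn.commit()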
How can I optimize the read-clean-save process and avoid the ODBC error?