I’m looking for a more efficient way to execute SQL queries from Python. The query itself seems to run reasonably fast, but converting the cursor results into a DataFrame appears to be inefficient given the execution times.
Below is a rough framework of my code. I apologize for not providing a fully reproducible example, as a connection to the database is needed.
import pandas as pd
import pyodbc
import time
login_info = f"""
Driver={{IBM DB2 ODBC DRIVER - IBMDBCL1}};
HostName=xxx;
Port=xxx;
Security=xxx;
Database=xxx;
UID={xxx};
PWD={xxx};
"""
conn = pyodbc.connect(login_info)
cursor = conn.cursor()
def execute_sql(query):
    query_start_time = time.time()
    cursor.execute(query)
    # lower-case the column names returned by the cursor
    cols = [c[0].lower() for c in cursor.description]
    query_rt = "--- Query Runtime: %s seconds ---" % (time.time() - query_start_time)
    # fetch all rows and build the DataFrame; this is the part that appears slow
    df = pd.DataFrame.from_records(cursor.fetchall(), columns=cols)
    total_rt = "--- Total Runtime: %s seconds ---" % (time.time() - query_start_time)
    print(query_rt)
    print(total_rt)
    return df
loss_query = '''
select * from xxx limit 1
'''
loss_df = execute_sql(loss_query)
When running my actual query (not shown here), the Query Runtime is 99 seconds and the Total Runtime is 359 seconds. It seems odd to me that converting the cursor results to a DataFrame takes roughly 2.6x as long as executing the query itself, but maybe that’s standard?
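If it helps, here is a rough sketch of how the timing could be split further, to see whether the extra 260 seconds go to fetchall() pulling rows over the network or to the DataFrame construction itself. This assumes the same conn/cursor objects from above; execute_sql_timed is just a placeholder name, not part of my actual code.

import time
import pandas as pd  # already imported above; repeated for completeness

def execute_sql_timed(query):
    # same idea as execute_sql above, but timing each step separately
    t0 = time.time()
    cursor.execute(query)
    t1 = time.time()

    rows = cursor.fetchall()  # pull all rows from the server
    t2 = time.time()

    cols = [c[0].lower() for c in cursor.description]
    df = pd.DataFrame.from_records(rows, columns=cols)
    t3 = time.time()

    print("--- execute:   %s seconds ---" % (t1 - t0))
    print("--- fetchall:  %s seconds ---" % (t2 - t1))
    print("--- DataFrame: %s seconds ---" % (t3 - t2))
    return df

My assumption is that this breakdown would show whether the bottleneck is the data transfer or the conversion, which is what I’m trying to pin down.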