I have a pandas DataFrame named outliers, which contains a list of people with member IDs.
I need to connect to a Snowflake database to pull some fields for these members. Is it possible to pass the pandas DataFrame through the SQL query to achieve this?
This is my logic:
query = f"""
select a.*, b.name
from {outliers} as a
inner join snowflake_table as b on a.mbr_id = b.mbr_id
"""
query_result = pandas.read_sql(sql=query, con=con)
I am getting an error message saying “Syntax error line 5 at position 37 unexpected …” (where the omitted part is one of my variable names).
Any recommendations?
You cannot pass a pandas DataFrame in place of a table in a query. What you can do is use DataFrame.to_sql() to insert the data into (or create) a table, then use the newly created table in your query:
outliers.to_sql('your_table_name', con=con, if_exists='replace', index=False)
query = f"""
select a.*, b.name
from your_table_name as a
inner join snowflake_table as b on a.mbr_id = b.mbr_id
"""
query_result = pandas.read_sql(sql=query, con=con)
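One caveat: to_sql() needs a SQLAlchemy connectable (e.g. an engine built with snowflake-sqlalchemy); it won't accept a raw snowflake-connector-python connection. If your con is a raw connector connection, a minimal sketch using write_pandas from the connector's pandas_tools (the table name here is hypothetical, and auto_create_table is only available in recent connector versions):

from snowflake.connector.pandas_tools import write_pandas

# Upload the DataFrame to a Snowflake table without SQLAlchemy;
# auto_create_table creates the table if it doesn't already exist.
write_pandas(con, outliers, 'YOUR_TABLE_NAME', auto_create_table=True)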
You’re trying to pass an entire DataFrame as text into a query string. That’s not how that works. You have a couple of options for joining these two tables together:
1.) You can pull the SQL table with pd.read_sql() and do the subsequent joining logic in pandas (see the sketch after this list). This is probably the best way unless the data in the table is far too big to handle client-side.
2.) If you REALLY need this computation to happen in SQL: you can create a table in SQL, populate it with the DataFrame using .to_sql(), and then join against it. If you’re doing this often, you can keep a SQL table just for DataFrames and use the if_exists='replace' kwarg to replace the data every time.
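A minimal sketch of option 1, assuming the mbr_id and name columns and the snowflake_table name from the question:

import pandas as pd

# Pull only the columns you need from Snowflake, client-side.
members = pd.read_sql("select mbr_id, name from snowflake_table", con=con)

# Inner join on member ID, mirroring the SQL join in the question.
query_result = outliers.merge(members, on="mbr_id", how="inner")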
If you don’t have access to create a table in SQL, this becomes a bigger issue. One way you could finagle it would be to fetch the large SQL table in chunks with the chunksize kwarg in read_sql(). You’d then loop through the chunks, join each one to the DataFrame, and concatenate the smaller joined tables (sketched below). Depending on your joining logic this may be a non-trivial task.
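A sketch of that chunked approach, assuming an inner join on mbr_id and a hypothetical chunk size:

import pandas as pd

# Stream the big table in pieces and join each piece against the
# (small) outliers frame, then concatenate the results. This works
# cleanly for inner joins; other join types need more care.
chunks = pd.read_sql("select mbr_id, name from snowflake_table",
                     con=con, chunksize=50000)

joined = [outliers.merge(chunk, on="mbr_id", how="inner") for chunk in chunks]
query_result = pd.concat(joined, ignore_index=True)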