What I have done in AWS:
I have a data lake in AWS. I use AWS Glue Crawlers to infer the schema of the data (Parquet format) across different partitions (e.g. year=2000/month=1/day=1/file1.parquet, year=2000/month=1/day=1/file2.parquet) and populate a table in a database in the AWS Glue Data Catalog.
I then use AWS Athena to query that table via pyathena and load the results locally for some processing.
What I want to do:
I want to replicate the above in Azure. I can create an Azure Synapse workspace, add a lake database, and specify the partition columns when I add the tables. Running SQL queries inside the Synapse workspace works fine. However, I cannot query the database and table from a Python script and save the result into a DataFrame.
I have looked into pyodbc and tried to run the following code:
import pyodbc
import pandas as pd

# driver, server, database, username, and password are defined earlier,
# e.g. driver = '{ODBC Driver 17 for SQL Server}'
connection_string = f'DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password}'

try:
    # Connect to the database
    conn = pyodbc.connect(connection_string)
    print("Connection successful!")

    cursor = conn.cursor()
    query = "SELECT TOP 10 * FROM dbo.tablename"
    cursor.execute(query)
    rows = cursor.fetchall()

    if rows:
        columns = [column[0] for column in cursor.description]
        df = pd.DataFrame.from_records(rows, columns=columns)
        print(df)
    else:
        print("No results found for the query.")

    cursor.close()
    conn.close()
except Exception as e:
    print(f"An error occurred: {e}")
I get the following error:
An error occurred: ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Cannot find the CREDENTIAL 'https://.dfs.core.windows.net//year=/month=/day=/.parquet', because it does not exist or you do not have permission. (15151) (SQLExecDirectW)")
Can anyone help me with this problem? Since I am very new to Azure, it would be extremely helpful if you could guide me from the beginning (maybe I missed some steps while creating the Azure Synapse workspace and the lake database).