Working on my free-trial Azure account, I am trying to copy CSV files to ADLS Gen2 and save a DataFrame as a table in the ADLS silver layer.
Code:
DForderItems = spark.read.csv("abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/orderItems.csv", header=False, schema=schema)
I am able to read the CSV file into DForderItems; the tricky part is that I am unable to save it as a table at the given path, as below.
DForderItems.write.option("path", "abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/orderItems").option("mergeSchema", True).mode("append").saveAsTable("retail.orderItems")
Error:
[NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/orderItems'. Please create an external location on one of the parent paths and then retry the query or command again.
File , line 2
- I tried creating an external location using SQL:
%sql
CREATE EXTERNAL LOCATION silv_layer
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/'
WITH (CREDENTIAL (STORAGE_ACCOUNT_KEY = '<storage-account-key>'));
Still got this error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'LOCATION'. SQLSTATE: 42601
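For comparison, the documented Unity Catalog syntax seems to expect a named storage credential rather than an inline account key (silver_creds below is a placeholder name):
%sql
CREATE EXTERNAL LOCATION IF NOT EXISTS silv_layer
URL 'abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/'
WITH (STORAGE CREDENTIAL silver_creds);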
- I tried creating a database and writing the table into it:
spark.sql("CREATE DATABASE IF NOT EXISTS retail LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/retailfilessilver'")
DForderItems.write.option("path", "abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/orderItems").option("mergeSchema", True).mode("append").saveAsTable("retail.orderItems")
But I got the same error again:
[NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'abfss://<container>@<storage-account>.dfs.core.windows.net/retailfiles/orderItems'. Please create an external location on one of the parent paths and then retry the query or command again.
File , line 2
I received the same error; below are the details:
Error: [NO_PARENT_EXTERNAL_LOCATION_FOR_PATH] No parent external location was found for path 'abfss://<container>@<storage-account>.dfs.core.windows.net/Customer.csv'. Please create an external location on one of the parent paths and then retry the query or command again.
This error indicates that Unity Catalog cannot find an external location covering a parent of the specified path.
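If the workspace is Unity Catalog-enabled, the direct fix is to register an external location on a parent path as the message suggests. A simpler workaround, though, is to drop the explicit path option so that saveAsTable creates a managed table, which needs no external location. A minimal sketch, reusing the DataFrame name from the question:
# With no explicit "path" option, saveAsTable creates a managed table,
# so the external-location check does not apply.
DForderItems.write.mode("append").saveAsTable("retail.orderItems")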
I tried the approach below. First, I mounted my ADLS using this script:
configs = {
    'fs.azure.account.auth.type': 'OAuth',
    'fs.azure.account.oauth.provider.type': 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider',
    'fs.azure.account.oauth2.client.id': '<YOUR CLIENT ID>',
    'fs.azure.account.oauth2.client.secret': dbutils.secrets.get(scope='dbxsecretscope', key='kvsecretname'),
    'fs.azure.account.oauth2.client.endpoint': 'https://login.microsoftonline.com/<YOUR TENANT ID>/oauth2/token'
}
dbutils.fs.mount(source='abfss://<container>@<storage-account>.dfs.core.windows.net/', mount_point='/mnt/raw', extra_configs=configs)
Note: when mounting ADLS to Azure Databricks, you need to assign the Storage Blob Data Contributor role on the ADLS storage account and the Key Vault Administrator role on the Key Vault.
The command below lists your mount points:
display(dbutils.fs.mounts())
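As a quick sanity check, you can also list the files under the mount point (a minimal sketch, assuming the mount above succeeded):
# Confirm the mounted container is readable.
display(dbutils.fs.ls('/mnt/raw'))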
Read more about how to mount ADLS to Azure Databricks using an SPN and Azure Key Vault.
Next, I created an external table from a single CSV file in Azure Databricks as an example.
%sql
CREATE TABLE IF NOT EXISTS hive_metastore.default.cust USING csv OPTIONS (path "/mnt/raw/new/Customer.csv", inferSchema "true", header "true")
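You can then query the new table directly to confirm it resolves (a quick check):
%sql
SELECT * FROM hive_metastore.default.cust LIMIT 5;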
Reading the CSV back directly returns the following results:
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("dbfs:/mnt/raw/new/Customer.csv"))
df.show()
+--------------+
| Col1|
+--------------+
|123,456@1234_1|
+--------------+
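With the mount in place, a write like the one in the question should also succeed by targeting the mount path instead of the abfss URL. A minimal sketch, reusing df from above with a hypothetical target folder:
# Writing through the mount point goes to DBFS, so the Unity Catalog
# external-location check does not apply to this hive_metastore table.
# '/mnt/raw/new/customer_tbl' is a hypothetical target folder.
df.write.option('path', '/mnt/raw/new/customer_tbl') \
    .mode('overwrite') \
    .saveAsTable('hive_metastore.default.customer_tbl')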