I am working locally, building a SparkSession using:
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder
    .master("local[8]")
    .appName("test_app")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.session.timeZone", "UTC")
)

spark = (
    configure_spark_with_delta_pip(builder, extra_packages=["org.apache.hadoop:hadoop-aws:3.3.4"])
    .enableHiveSupport()
    .getOrCreate()
)
I am using the following to try to read the delta table:
spark.read.load("s3://<delta_table_path>")
(In the future, I might try to read other kinds of files, but I did not find anything S3-specific that depends on the kind of file I want to read, as long as I use the right PySpark method.)
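For example, by "the right PySpark method" I mean calls roughly like these (the paths are placeholders):

df_delta = spark.read.format("delta").load("s3://<delta_table_path>")    # Delta table
df_parquet = spark.read.parquet("s3://<bucket>/<prefix>/")                # plain parquet files
df_csv = spark.read.csv("s3://<bucket>/<prefix>/", header=True)           # CSV files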
I only get the error: No FileSystem for scheme "s3". I have no credential issues, since I am able to connect to S3 using boto3 and list files, for instance:
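The boto3 access that does work looks roughly like this (bucket name and prefix are placeholders):

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="<bucket>", Prefix="<delta_table_prefix>")
for obj in response.get("Contents", []):
    print(obj["Key"])  # the expected keys are listed, with no credential error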
I tried downloading the hadoop-aws jar manually and putting it into the venv/lib/site-packages/pyspark/jars folder.
I installed hadoop-aws 3.3.4 because the other Hadoop jars in the pyspark/jars folder were 3.3.4,
and I also tried manually replacing it with a downloaded 3.3.5 jar, since my local Hadoop version is 3.3.5. Same result.
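For context, this is roughly how I checked which Hadoop version the bundled jars use (the same folder where I dropped the downloaded hadoop-aws jar):

from pathlib import Path
import pyspark

jars_dir = Path(pyspark.__file__).parent / "jars"   # venv/lib/site-packages/pyspark/jars
for jar in sorted(jars_dir.glob("hadoop-*")):
    print(jar.name)   # e.g. hadoop-client-api-3.3.4.jar, hadoop-client-runtime-3.3.4.jar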
Hope you'll know how to deal with this!
Regards