I have uploaded a zip file containing images to my storage account (ADLS Gen2):
storage account: samplesa
container: samplecontainersa
data1: /folder1/sample1.exe
data2: /folder1/sample2.zip
I now need to read this zip and extract all of the images into a PySpark DataFrame in my Synapse environment.
Here is my code:
import zipfile
from pyspark.sql.functions import map_zip_with

zip_path = "abfss://samplecontainersa@samplesa.dfs.core.windows.net/folder1/sample2.zip"

# Open the archive and keep only the image entries
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    file_list = zip_ref.namelist()
    image_files = [f for f in file_list if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
    # Read each image's raw bytes as a (filename, bytes) pair
    image_data = [(f, zip_ref.read(f)) for f in image_files]

df = spark.createDataFrame(image_data, ["filename", "image_bytes"])
df.show()
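For comparison, `zipfile.ZipFile` is not limited to file paths: it also accepts a file-like object, so the same filtering works on archive bytes that are already held in memory. A minimal local sketch, using a small archive built in `io.BytesIO` as a stand-in for `sample2.zip` (the member names and contents are made up for illustration):

```python
import io
import zipfile

# Build a small archive in memory as a stand-in for sample2.zip (made-up members).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("images/cat.PNG", b"fake-png-bytes")
    zf.writestr("images/dog.jpg", b"fake-jpg-bytes")
    zf.writestr("readme.txt", b"not an image")

# Pretend these bytes were fetched from storage by some other means.
zip_bytes = buffer.getvalue()

# ZipFile accepts a file-like object, so wrap the raw bytes instead of passing a path.
with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zip_ref:
    image_files = [f for f in zip_ref.namelist() if f.lower().endswith((".jpg", ".jpeg", ".png"))]
    image_data = [(f, zip_ref.read(f)) for f in image_files]

print(image_data)
```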
However, I get the following error:
No such file or directory: 'abfss://samplecontainersa@samplesa.dfs.core.windows.net/folder1/sample2.zip'
I can read other CSV/TXT files in the same directory; I only have trouble accessing the .exe and .zip files. Any thoughts? Thanks!
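A likely explanation, offered as a sketch rather than a confirmed diagnosis: `zipfile.ZipFile` passes a string argument to Python's local `open()`, which does not understand `abfss://` URIs, whereas Spark readers resolve them through the Hadoop ABFS connector. If the CSV/TXT files are being read with `spark.read`, that would explain why they work. One workaround is to let Spark fetch the archive with its `binaryFile` data source (Spark 3.0+) and unzip the bytes in memory. The helper names `images_from_zip_bytes` and `load_zip_images` are hypothetical, and the approach assumes the whole zip fits in memory as a single row:

```python
import io
import zipfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def images_from_zip_bytes(zip_bytes):
    """Yield (filename, bytes) for every image entry in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield name, zf.read(name)


def load_zip_images(spark, zip_path):
    """Read the archive via Spark's binaryFile source and explode it into image rows.

    Hypothetical helper: Spark resolves the abfss:// URI itself, so only the
    in-memory bytes ever reach zipfile.
    """
    # binaryFile yields columns: path, modificationTime, length, content
    raw = spark.read.format("binaryFile").load(zip_path)
    rows = raw.select("content").rdd.flatMap(
        lambda row: list(images_from_zip_bytes(bytes(row.content)))
    )
    # An explicit schema avoids type inference failing on an archive with no images.
    return spark.createDataFrame(rows, "filename string, image_bytes binary")
```

In a Synapse notebook, where `spark` is predefined, usage would look like `df = load_zip_images(spark, zip_path)` followed by `df.show()`.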