I’m trying to load a Spark DataFrame via petastorm 0.12, following the tutorial given in the petastorm-spark-converter-tensorflow notebook. Essentially my code is the following; the error described in the title is raised in the with statement. (It doesn’t happen when directly creating a TFDatasetContextManager via train_context_manager = converter_train.make_tf_dataset(BATCH_SIZE), though.)
from petastorm import TransformSpec
from petastorm.spark import SparkDatasetConverter, make_spark_converter

spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, "file:///dbfs/tmp/petastorm/cache")

converter_train = make_spark_converter(DF_TRAIN)

with converter_train.make_tf_dataset(BATCH_SIZE) as X_train:
    pass
The dataset definitely isn’t empty. I also tried applying a TransformSpec that explicitly selects my target column:
with converter_train.make_tf_dataset(
    BATCH_SIZE,
    transform_spec=TransformSpec(selected_fields=[TRAIN_COL])
) as X_train:
    pass
Btw, the same error occurs with converter_train.make_torch_dataloader.
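For completeness, the PyTorch attempt looks roughly like this (a minimal sketch of my call, using the same converter as above; note that make_torch_dataloader takes batch_size as a keyword argument rather than positionally):

```python
# Same SparkDatasetConverter instance as in the TensorFlow attempt above.
# make_torch_dataloader also returns a context manager; the error is raised
# on entering the with block, before any batch is consumed.
with converter_train.make_torch_dataloader(batch_size=BATCH_SIZE) as train_loader:
    pass
```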