The following code simulates a simple pipeline ingesting a CSV file and converting it to TFRecord.
You can also see the corresponding notebook: https://colab.research.google.com/drive/1GEytZjnNZZ7r_f9QQ9FbauohKNLGSooC?usp=sharing
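import os

from tfx.components import CsvExampleGen
from tfx.orchestration import metadata
from tfx.orchestration.local.local_dag_runner import LocalDagRunner
from tfx.orchestration.pipeline import Pipeline
from tfx.proto import example_gen_pb2

# Split the input into train and eval at an 8:2 ratio of hash buckets.
output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=8),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=2),
    ])
)

# Read the CSV files under 'data' and write them out as TFRecord.
example_gen = CsvExampleGen(
    input_base='data',
    output_config=output_config,
)

pipeline_root = 'artifacts'
pipeline = Pipeline(
    pipeline_name='testing pipeline',
    pipeline_root=pipeline_root,
    components=[example_gen],
    enable_cache=True,
    metadata_connection_config=metadata.sqlite_metadata_connection_config(
        os.path.join('artifacts', 'metadata.sqlite')
    ),
)

LocalDagRunner().run(pipeline)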
I have manually verified that the TFRecord files were generated properly. However, the pipeline's outputs dictionary is empty:
print(pipeline.outputs)
# output: {}
This problem occurs in both a .ipynb notebook and a .py file.
Interestingly enough, InteractiveContext does not have this problem.
Does anyone know what is causing this?
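For anyone who wants to repeat the manual check of the generated files mentioned above, here is a minimal sketch that uses only the standard library. It assumes the output files are gzip-compressed TFRecord files with a `.gz` extension somewhere under the pipeline root, and that each split lives in its own directory. The function names `count_tfrecords` and `summarize_splits` are made up for this illustration and are not part of TFX. The sketch only counts records by walking the TFRecord framing (8-byte length, 4-byte CRC, payload, 4-byte CRC); it does not validate the checksums or parse the examples.

```python
import gzip
import struct
from pathlib import Path


def count_tfrecords(path):
    """Count the records in one gzip-compressed TFRecord file (hypothetical helper)."""
    count = 0
    with gzip.open(path, "rb") as f:
        while True:
            header = f.read(8)  # little-endian uint64: payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), the payload, and the payload CRC (4 bytes).
            rest = f.read(4 + length + 4)
            if len(rest) < 4 + length + 4:
                raise ValueError(f"truncated record in {path}")
            count += 1
    return count


def summarize_splits(pipeline_root):
    """Map each directory name that holds .gz files to its total record count."""
    totals = {}
    for gz_path in sorted(Path(pipeline_root).rglob("*.gz")):
        key = gz_path.parent.name  # assumed to be the split directory
        totals[key] = totals.get(key, 0) + count_tfrecords(gz_path)
    return totals
```

Calling `summarize_splits('artifacts')` after the run should then return one entry per split directory. With the 8:2 bucket configuration above, the train count would be expected to be roughly four times the eval count.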