All the metadata for my dataset is stored in MongoDB, and I need to read it, validate it, and then convert it into a TensorFlow dataset (optionally with some preprocessing). I looked into tfio.experimental.mongodb.MongoDBIODataset, but unfortunately you cannot specify a query with it, so the documents cannot be filtered on the server side. The tf.py_function route works, but it has some strong limitations (it executes Python under the GIL, which limits parallelism, and the resulting tensors lose their static shape information). Is there any better way to load the documents efficiently?
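For reference, this is roughly how far MongoDBIODataset gets me (connection details are placeholders). It streams every document in the collection as serialized JSON strings, and as far as I can tell there is no argument for pushing a query or filter down to the server:

```python
import tensorflow_io as tfio

# Placeholder URI/database/collection names. MongoDBIODataset streams the
# entire collection as serialized JSON strings; there is no query parameter.
ds = tfio.experimental.mongodb.MongoDBIODataset(
    uri="mongodb://localhost:27017",
    database="mydb",
    collection="metadata",
)
```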
Currently I’m trying the following with tf.py_function:
```python
import tensorflow as tf

# doc_ids, filtered_map, output_types, and flatten_signature_dict
# are defined elsewhere in my setup code
dataset = tf.data.Dataset.range(len(doc_ids))
dataset = dataset.map(
    lambda idx: tf.py_function(
        filtered_map, inp=[idx], Tout=flatten_signature_dict(output_types)
    ),
    num_parallel_calls=tf.data.AUTOTUNE,
)
```
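For context, `filtered_map` is shaped roughly like the sketch below (heavily simplified; the real validation logic is more involved, and the URI, database, collection, and field names here are placeholders). It fetches one document by index with pymongo, validates it, and returns one tensor per flattened output:

```python
import tensorflow as tf
from pymongo import MongoClient

# Simplified sketch: connection details and field names are placeholders,
# and doc_ids / output_types come from the surrounding setup code.
client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["metadata"]

def filtered_map(idx):
    # idx arrives as a scalar tf.Tensor, so unwrap it to a Python int first
    doc = collection.find_one({"_id": doc_ids[int(idx.numpy())]})
    # ... validate doc / optional preprocessing ...
    # return one tensor per flattened output, in the same order as Tout
    return [tf.constant(doc[key]) for key in sorted(output_types)]
```

One concrete pain point with this approach is that everything downstream of tf.py_function has unknown shapes, so I need extra set_shape/ensure_shape calls before batching.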