Say I have twenty 5x5 tensors. How should I create a dataset with batch size = 20?
After reading a lot of posts, I think the most likely solution is:
Step 1. Use tf.data.Dataset.from_tensors to create 20 datasets, each containing one 5x5 tensor.
Step 2. Use tf.data.Dataset.zip to zip the 20 datasets into one large dataset.
Step 3. Do not specify batch_size in model.fit, since the official documentation says “Do not specify the batch_size if your data is in the form of datasets, generators, or keras.utils.PyDataset instances (since they generate batches)”.
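A minimal sketch of what I mean by Steps 1 and 2 (the random values are just placeholders for my actual tensors):

```python
import tensorflow as tf

# Twenty 5x5 tensors (random values here, just as placeholders).
tensors = [tf.random.uniform((5, 5)) for _ in range(20)]

# Step 1: one dataset per tensor via from_tensors.
datasets = [tf.data.Dataset.from_tensors(t) for t in tensors]

# Step 2: zip them into one dataset. Its single element is a
# 20-tuple of 5x5 tensors.
zipped = tf.data.Dataset.zip(tuple(datasets))

print(zipped.element_spec)  # a tuple of 20 TensorSpec(shape=(5, 5), ...)
```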
I do not see how batches are generated from the steps above. Is it because the shape of the dataset, (20, 5, 5), implies the minibatch size should be 20, i.e. equal to the first dimension?
If that statement is correct, say I want to reduce the batch size to 10. All I would need to do is zip the datasets in pairs after Step 1 (giving 10 datasets), then zip those 10 datasets into the final dataset. Is this correct?
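Here is a sketch of the pairwise zipping I have in mind (again with placeholder tensors), so you can see the structure I end up with:

```python
import tensorflow as tf

tensors = [tf.random.uniform((5, 5)) for _ in range(20)]
datasets = [tf.data.Dataset.from_tensors(t) for t in tensors]

# Zip consecutive pairs of datasets, giving 10 datasets whose
# single element is a 2-tuple of 5x5 tensors.
pairs = [tf.data.Dataset.zip((datasets[i], datasets[i + 1]))
         for i in range(0, 20, 2)]

# Zip the 10 pair-datasets into the final dataset.
final = tf.data.Dataset.zip(tuple(pairs))

print(final.element_spec)  # a 10-tuple of 2-tuples of TensorSpec(shape=(5, 5))
```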
What’s more, I tried applying dataset.batch(20) to the big dataset after Step 2. When I tf.print the final dataset before and after this batch call, the output I get is:
Before: zipped dataset (array(shape=(5, 5)), array(shape=(5, 5)), …)
After: batched dataset (array(shape=(None, 5, 5)), array(shape=(None, 5, 5)), …)
The shape is different. After printing the values, I notice the actual shape after applying batch(20) becomes (1, 5, 5) for each array.
Is batch(20) useless in my case, or should I be applying it somewhere other than after Step 2?
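This reproduces the behavior I am describing (placeholder tensors again): the zipped dataset has only one element, so batching it gives each component a leading dimension of 1 rather than 20.

```python
import tensorflow as tf

tensors = [tf.random.uniform((5, 5)) for _ in range(20)]
zipped = tf.data.Dataset.zip(
    tuple(tf.data.Dataset.from_tensors(t) for t in tensors))

batched = zipped.batch(20)
print(batched.element_spec)  # 20-tuple of TensorSpec(shape=(None, 5, 5))

for element in batched:
    # The zipped dataset has only one element, so each of the 20
    # components batches to shape (1, 5, 5), not (20, 5, 5).
    print(element[0].shape)
```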
The official documentation only shows one example, which uses scalar (rank-0) elements, so I do not know how this command works on higher-rank tensors.
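For comparison, this is what I understand batch to do on higher-rank elements when the tensors are stacked and sliced instead of zipped (from_tensor_slices here is my assumption about the intended usage, not something from the docs example):

```python
import tensorflow as tf

# A single stacked tensor of shape (20, 5, 5): twenty 5x5 tensors.
stacked = tf.random.uniform((20, 5, 5))

# from_tensor_slices yields 20 elements of shape (5, 5) by slicing
# along the first dimension; batch(20) then regroups them into a
# single element of shape (20, 5, 5).
ds = tf.data.Dataset.from_tensor_slices(stacked).batch(20)

for batch in ds:
    print(batch.shape)  # (20, 5, 5)
```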