I'm a beginner in image processing. I have two binary classes stored as subdirectories, 496 images in total, and I have an issue with the last batch, which holds the remainder of 13 images: instead of a tf.data tensor of shape (32, 300, 300, 3), the last batch comes out as (16, 300, 300, 3). I noticed:
- after shuffling, it contains 13 batches
- after batching, it produces only 1 batch (I assume it is the remainder batch)
- with drop_remainder=True, the data is empty
Why am I left with only 1 batch after batching?
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory

AUTOTUNE = tf.data.AUTOTUNE
image_size = (300, 300)
batch_size = 32

train_dataset = image_dataset_from_directory(
    dataset_dir,
    image_size=(image_size[0], image_size[1]),
    batch_size=batch_size,
    label_mode="binary",
    validation_split=0.2,
    subset="training",
    seed=123,
)

train_dataset = train_dataset.shuffle(1000)
train_dataset = train_dataset.batch(
    batch_size=batch_size, drop_remainder=True
).prefetch(buffer_size=AUTOTUNE)

print(train_dataset.cardinality().numpy())
# output: 1
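For reference, this is roughly how I checked the three points above, starting from the dataset returned by image_dataset_from_directory (a sketch with separate variable names):

shuffled = train_dataset.shuffle(1000)   # train_dataset as returned by image_dataset_from_directory
print(shuffled.cardinality().numpy())    # 13

batched = shuffled.batch(batch_size)
print(batched.cardinality().numpy())     # 1

dropped = shuffled.batch(batch_size, drop_remainder=True)
print(dropped.cardinality().numpy())     # 0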
When you create train_dataset, it is already batched into batches of 32, because you specified batch_size=32 in image_dataset_from_directory(). When you call .batch() again on this already-batched dataset, each element it groups is itself a batch, so you effectively get batches of batches: your 13 existing batches are packed into a single element, and with drop_remainder=True everything is dropped because 13 < 32.
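To see the effect in isolation, here is a minimal sketch using tf.data.Dataset.range to stand in for the roughly 397 training images (13 batches of 32 with a 13-image remainder, as observed above):

import tensorflow as tf

# 397 dummy samples stand in for the training images.
ds = tf.data.Dataset.range(397).batch(32)    # what image_dataset_from_directory already did
print(ds.cardinality().numpy())              # 13 batches

rebatched = ds.batch(32)                     # batches of batches: 13 elements packed into 1
print(rebatched.cardinality().numpy())       # 1

dropped = ds.batch(32, drop_remainder=True)  # 13 < 32, so the only (partial) group is dropped
print(dropped.cardinality().numpy())         # 0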
Code Correction
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory

AUTOTUNE = tf.data.AUTOTUNE

train_dataset = image_dataset_from_directory(
    dataset_dir,
    image_size=image_size,
    batch_size=batch_size,  # batching already happens here
    label_mode="binary",
    validation_split=0.2,
    subset="training",
    seed=123,
)

# No second .batch() call: the dataset already yields (batch_size, 300, 300, 3) batches.
train_dataset = train_dataset.shuffle(1000).prefetch(buffer_size=AUTOTUNE)
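Note also that .shuffle(1000) applied after batching shuffles whole batches rather than individual images; image_dataset_from_directory already shuffles its files by default. If the original goal was to get rid of the remainder batch so that every batch has exactly 32 images, one option is to unbatch the dataset and rebatch it with drop_remainder=True (a sketch, assuming TF 2.x and the variables from the code above):

# Undo the batching done by image_dataset_from_directory, then rebatch so every
# batch has exactly batch_size images; the leftover images are dropped.
train_dataset = (
    train_dataset.unbatch()
    .batch(batch_size, drop_remainder=True)
    .prefetch(buffer_size=AUTOTUNE)
)

for images, labels in train_dataset.take(1):
    print(images.shape)  # (32, 300, 300, 3)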