I have a bunch of spectrograms that I want to feed into a CNN. All the images have a height of 400 and a width of 1000 and look like this:
[example spectrogram image]
I only want to feed the actual spectrogram into the CNN, not the white margins or the axis labels, so I thought I'd crop it using the preprocessing_function of the Keras ImageDataGenerator.
This is what I came up with:
import os
import tensorflow as tf  # needed for tf.image.crop_to_bounding_box
from tensorflow.keras.preprocessing.image import ImageDataGenerator

path_images = os.path.join('..', 'Datasets', 'spectrums')

def crop_image(image):
    # cut away the white margins and the axis labels
    cropped_image = tf.image.crop_to_bounding_box(
        image,
        offset_height=35,
        offset_width=89,
        target_height=307,
        target_width=717
    )
    return cropped_image
image_generator = ImageDataGenerator(
    rescale=1.0/255,
    preprocessing_function=crop_image
)

train_generator = image_generator.flow_from_dataframe(
    dataframe=data_train,
    directory=path_images,
    x_col="spectrum_filename",
    y_col="simplified_style",
    target_size=(400, 1000),
    batch_size=32,
    class_mode="categorical",
    color_mode="rgb"  # add color mode
)
I wanted to train my CNN with this image generator, but I got the following error:
ValueError: could not broadcast input array from shape (307,717,3) into shape (400,1000,3)
I thought I had to adjust target_size to the cropped size, but that just gives me another error:
ValueError: width must be >= target + offset.
All the images have the same size of (400, 1000). I don't really know what I'm doing wrong. Is preprocessing_function not meant for cropping?
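For reference, the crop itself behaves as I expect when I run it outside the generator on a dummy array with the original image size (a quick check using the sizes above, just zeros instead of a real image):

import numpy as np
import tensorflow as tf

dummy = np.zeros((400, 1000, 3), dtype=np.float32)  # same shape as my spectrograms
cropped = tf.image.crop_to_bounding_box(dummy, 35, 89, 307, 717)
print(cropped.shape)  # (307, 717, 3) -- no longer the (400, 1000, 3) the generator seems to expect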
My CNN architecture looks like this (I plan on enhancing it once I have the basic data loading done):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(307, 717, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(6, activation='softmax')  # 6 output neurons for 6 classes
])
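I compile and train it like this (simplified; the optimizer, loss and epoch count are just placeholders for now):

cnn_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # matches class_mode="categorical"
    metrics=['accuracy']
)

# training straight from the generator; this is where the error above shows up
cnn_model.fit(train_generator, epochs=10)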