I’m trying to solve a binary segmentation problem where I need to classify trees against the background. My data-preprocessing function is quite simple and looks like this:
```python
import numpy as np
from PIL import Image
from skimage.transform import resize

def get_data(a, path, IMG_HEIGHT=256, IMG_WIDTH=256, IMG_CHANNELS=3):
    out = np.zeros((len(a), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
    for i, image_id in enumerate(a):
        path_image = path + image_id
        image = np.array(Image.open(path_image))
        image = resize(image, (IMG_HEIGHT, IMG_WIDTH), mode='constant', preserve_range=True)
        if image.ndim == 2:  # grayscale mask: add a trailing channel axis
            image = np.expand_dims(image, axis=-1)
        out[i] = image
    return out
```
The images are in .tif format.
```python
X_train = np.zeros((len(img_list), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)
Y_train = np.zeros((len(mask_list), IMG_HEIGHT, IMG_WIDTH, 1), dtype=bool)
X_test = np.zeros((len(img_test), IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS), dtype=np.uint8)

X_train = get_data(img_list, "/content/train/images/")
Y_train = get_data(mask_list, "/content/train/gt/", IMG_CHANNELS=1)
X_test = get_data(img_test, "/content/public_test/images/")
```
Data shapes:

```python
X_train.shape, Y_train.shape, X_test.shape
# ((2828, 256, 256, 3), (2828, 256, 256, 1), (707, 256, 256, 3))
```
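For context on the RAM pressure, here is a quick back-of-the-envelope calculation from the shapes above (uint8 is 1 byte per element, float32 is 4):

```python
# Sizes of the arrays above, in bytes (uint8 = 1 byte per element)
x_train = 2828 * 256 * 256 * 3   # ~0.56 GB as uint8
y_train = 2828 * 256 * 256 * 1   # ~0.19 GB as bool/uint8
x_test  =  707 * 256 * 256 * 3   # ~0.14 GB as uint8

uint8_total   = x_train + y_train + x_test
float32_total = 4 * uint8_total  # float32 = 4 bytes per element

print(f"uint8:   {uint8_total / 1e9:.2f} GB")    # ~0.88 GB
print(f"float32: {float32_total / 1e9:.2f} GB")  # ~3.52 GB
```

And during `.astype(np.float32)` the uint8 originals and the float32 copies coexist in memory, so peak usage is roughly the sum of both, which I suspect is why the Colab session crashes.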
At first, all of my data was of type uint8. However, when calling `model.fit(...)` I was getting this error:

```
TypeError: Input 'y' of 'Mul' Op has type uint8 that does not match type float32 of argument 'x'.
```
But as soon as I converted my data to np.float32, my Google Colab session started crashing during training (running out of RAM). I was wondering if I could change the model dtype from float32 to something that occupies less space, like float16, and check whether that helps. I tried to do it like this:
```python
a = model.get_config()
for layer in a['layers']:
    layer['config']['dtype'] = 'float16'
model = model.from_config(a)
model.compile(optimizer='adam', loss='binary_crossentropy')
```
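For completeness: I also came across Keras’s mixed-precision API, which as I understand it keeps variables in float32 while computing in float16, so it may avoid the overflow problems of blanket-casting every layer the way I did above. I haven’t confirmed that this fixes the nan; a minimal sketch of how I understand it should be used:

```python
import tensorflow as tf

# Global mixed-precision policy: float16 compute, float32 variables.
# (Keeping variables in float32 is supposed to avoid the numerical
# issues that casting every layer wholesale to float16 can cause.)
tf.keras.mixed_precision.set_global_policy('mixed_float16')

print(tf.keras.mixed_precision.global_policy().name)  # mixed_float16
```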
When I trained the model, the output looked like this (RAM usage kept growing slowly):
```
Epoch 1/100
160/160 [==============================] - ETA: 0s - loss: nan
Epoch 1: val_loss did not improve from inf
160/160 [==============================] - 73s 180ms/step - loss: nan - val_loss: nan
Epoch 2/100
160/160 [==============================] - ETA: 0s - loss: nan
Epoch 2: val_loss did not improve from inf
160/160 [==============================] - 19s 121ms/step - loss: nan - val_loss: nan
Epoch 3/100
 45/160 [=======>......................] - ETA: 13s - loss: nan
```
So my question is: why is the loss always nan, and what other approaches to the problem could I try?
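One idea I’m still considering is keeping the uint8 arrays in RAM and casting to float32 only per batch, so the full float32 copy never exists. A sketch with plain NumPy (`batch_generator` is my own hypothetical helper, not a library function):

```python
import numpy as np

def batch_generator(x, y, batch_size=16):
    """Yield float32 batches from uint8/bool arrays without a full-dataset cast."""
    while True:  # Keras generators are expected to loop forever
        idx = np.random.permutation(len(x))
        for start in range(0, len(x), batch_size):
            sel = idx[start:start + batch_size]
            # Cast only this batch: a few MB instead of several GB
            xb = x[sel].astype(np.float32) / 255.0
            yb = y[sel].astype(np.float32)
            yield xb, yb

# Usage (steps_per_epoch is required since the generator is infinite):
# model.fit(batch_generator(X_train, Y_train, batch_size=16),
#           steps_per_epoch=len(X_train) // 16, epochs=100)
```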
Thanks in advance.