Inspired by https://keras.io/examples/vision/captcha_ocr/, I tried to build a similar captcha solver that handles 6 characters instead of 5 (as in the linked example).
You can find the whole model and the training data here:
https://colab.research.google.com/drive/1VHwobGcn-n_rr6mmq4tEicGXuruY5V9a?usp=sharing
THE ISSUE:
I can export the prediction_model (it differs from the model in that it does not take the labels as input, which the custom CTCLayer needs to compute the loss). But when I load the exported prediction_model, it performs very badly, as if it had never been trained. This only happens if I restart the kernel before loading the model. If I load it in the same kernel that was used for training, it still performs great.
Then I tried to export the whole model, including the CTCLayer, and cut the CTCLayer off after loading. This did not work either, because I cannot get the CTCLayer to be serialized into the .keras file correctly.
For help, I already looked at this guide and tried to modify the CTCLayer so that it is recognized as a custom layer and therefore included in the config:
https://keras.io/guides/serialization_and_saving/#config_methods
So the model itself and the prediction_model perform great; the only problem is exporting them to a file. I cannot even get a working .keras file, and my preferred format would be ONNX.
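For reference, the export and reload round trip for the prediction_model can be sketched as below. This is a minimal, self-contained sketch and not the notebook's code: the tiny network is a stand-in with placeholder layer sizes, the 200x50 input size is assumed from the Keras example, and the file name is hypothetical. It only compares the raw softmax outputs before and after reloading.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in for the trained network (placeholder sizes, NOT the real model).
image = layers.Input(shape=(200, 50, 1), name="image", dtype="float32")
x = layers.Conv2D(4, (3, 3), padding="same", activation="relu", name="Conv1")(image)
x = layers.MaxPooling2D((4, 4), name="pool1")(x)            # -> (50, 12, 4)
x = layers.Reshape(target_shape=(50, 12 * 4), name="reshape")(x)
probs = layers.Dense(20, activation="softmax", name="dense2")(x)

# The prediction model maps the image straight to the softmax output,
# so it needs neither the label input nor the CTC layer.
prediction_model = keras.Model(inputs=image, outputs=probs, name="prediction_model")

# Fixed random batch so the two runs are comparable.
batch = np.random.default_rng(0).random((2, 200, 50, 1)).astype("float32")
before = prediction_model.predict(batch, verbose=0)

# Save to the .keras format and load it back (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "prediction_model.keras")
prediction_model.save(path)
reloaded = keras.models.load_model(path, compile=False)
after = reloaded.predict(batch, verbose=0)

same_outputs = bool(np.allclose(before, after, atol=1e-6))
print("raw outputs identical after reload:", same_outputs)
```

Running the same comparison on the real prediction_model, with a batch saved to disk before restarting the kernel, would show whether the weights themselves survive the export. If they do, that would point away from the saved file and toward whatever is rebuilt in the new kernel.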
If you want a glance at the model definition without opening the Colab notebook linked above, here it is:
@register_keras_serializable(package='Custom', name='CTCLayer')
class CTCLayer(layers.Layer):
    def __init__(self, name=None, **kwargs):
        # Call the base constructor once, forwarding the name and any config kwargs.
        # (Calling it a second time with only **kwargs would reset the layer name.)
        super().__init__(name=name, **kwargs)
        self.loss_fn = keras.backend.ctc_batch_cost

    def get_config(self):
        # No extra constructor arguments, so the base config is sufficient.
        return super().get_config()

    def call(self, y_true, y_pred):
        # Compute the training-time loss value and add it
        # to the layer using `self.add_loss()`.
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        # At test time, just return the computed predictions
        return y_pred

def build_model():
    # Inputs to the model
    input_img = layers.Input(
        shape=(img_width, img_height, 1), name="image", dtype="float32"
    )
    labels = layers.Input(name="label", shape=(None,), dtype="float32")

    # First conv block
    x = layers.Conv2D(
        32,
        (3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="Conv1",
    )(input_img)
    x = layers.MaxPooling2D((2, 2), name="pool1")(x)

    # Second conv block
    x = layers.Conv2D(
        64,
        (3, 3),
        activation="relu",
        kernel_initializer="he_normal",
        padding="same",
        name="Conv2",
    )(x)
    x = layers.MaxPooling2D((2, 2), name="pool2")(x)

    # We have used two max pool with pool size and strides 2.
    # Hence, downsampled feature maps are 4x smaller. The number of
    # filters in the last layer is 64. Reshape accordingly before
    # passing the output to the RNN part of the model
    new_shape = ((img_width // 4), (img_height // 4) * 64)
    x = layers.Reshape(target_shape=new_shape, name="reshape")(x)
    x = layers.Dense(64, activation="relu", name="dense1")(x)
    x = layers.Dropout(0.2)(x)

    # RNNs
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True, dropout=0.25))(x)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True, dropout=0.25))(x)

    # Output layer
    x = layers.Dense(
        len(char_to_num.get_vocabulary()) + 1, activation="softmax", name="dense2"
    )(x)

    # Add CTC layer for calculating CTC loss at each step
    output = CTCLayer(name="ctc_loss")(labels, x)

    # Define the model
    model = keras.models.Model(
        inputs=[input_img, labels], outputs=output, name="ocr_model_v1"
    )
    # Optimizer
    opt = keras.optimizers.Adam()
    # Compile the model and return
    model.compile(optimizer=opt)
    return model


# Get the model
model = build_model()
model.summary()
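
The shape arithmetic from the comments in build_model, and the input_length that CTCLayer passes to the loss, can be checked with plain Python. This is a small sketch under stated assumptions: the 200x50 image size is taken from the Keras example and may differ in the notebook, and the helper names ctc_shapes and fits_ctc are hypothetical, introduced only for illustration.

```python
def ctc_shapes(img_width, img_height, filters=64, pool_steps=2):
    """Return (time_steps, features) seen by the RNN after the conv blocks."""
    factor = 2 ** pool_steps          # two 2x2 max-pools shrink each side by 4
    time_steps = img_width // factor  # this is the input_length used by the CTC loss
    features = (img_height // factor) * filters
    return time_steps, features


def fits_ctc(time_steps, label):
    """CTC needs one step per character plus a blank between adjacent repeats."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return time_steps >= len(label) + repeats


# Assumed image size (from the Keras example, not confirmed for this dataset).
time_steps, features = ctc_shapes(img_width=200, img_height=50)
print(time_steps, features)            # 50 768
print(fits_ctc(time_steps, "aab3cc"))  # True: 50 steps cover 6 characters
```

With these assumed sizes, going from 5 to 6 characters leaves plenty of time steps for the CTC loss, so the longer labels alone should not need a different architecture.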