Problem Description
I’m encountering an issue while training an LSTM model with a custom loss function in TensorFlow/Keras. The model architecture involves multiple LSTM layers followed by a Dense layer with softmax activation. I’ve implemented a custom loss function (CategoricalCustomLoss) to handle specific class weighting, but I’m getting a “ValueError: No gradients provided for any variable” during training.
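For reference, the network looks roughly like this. This is a simplified sketch, not the exact build code; the layer sizes are taken from the variables listed in the error message below (543 input features, three LSTM layers with 256 units each, and a 3-class softmax output), while the sequence length and return_sequences settings are placeholders:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Simplified sketch; sizes match the variables in the error below.
model = Sequential([
    LSTM(256, return_sequences=True, input_shape=(None, 543)),
    LSTM(256, return_sequences=True),
    LSTM(256),
    Dense(3, activation='softmax'),
])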
Code snippet
import tensorflow as tf
from tensorflow.keras.losses import Loss
from tensorflow.keras.optimizers import Adam

class CategoricalCustomLoss(Loss):
    def call(self, y_true, y_pred):
        # Convert one-hot targets and softmax outputs to class indices
        y_pred_classes = tf.argmax(y_pred, axis=-1)
        y_true_classes = tf.argmax(y_true, axis=-1)
        # Masks for the "long" (class 2) and "short" (class 0) samples
        long_mask = tf.cast(tf.equal(y_true_classes, 2), tf.float32)
        short_mask = tf.cast(tf.equal(y_true_classes, 0), tf.float32)
        correct_long_pred_mask = tf.cast(tf.equal(y_pred_classes, 2), tf.float32)
        correct_short_pred_mask = tf.cast(tf.equal(y_pred_classes, 0), tf.float32)
        # Score each class and normalise by the number of true long/short samples
        long_score = tf.reduce_sum(tf.subtract(2.0 * long_mask, correct_long_pred_mask))
        short_score = tf.reduce_sum(tf.subtract(2.0 * short_mask, correct_short_pred_mask))
        total_true = tf.reduce_sum(long_mask) + tf.reduce_sum(short_mask)
        score = long_score + short_score
        loss = 1.0 - (score / total_true)
        return loss
...
opt = Adam(learning_rate=self.params.learning_rate)
model.compile(optimizer=opt, loss=CategoricalCustomLoss(), metrics=['accuracy'])
model.fit(x=training_gen,
          steps_per_epoch=int(len(training_gen)) // self.params.batch_size,
          epochs=self.params.epochs,
          use_multiprocessing=True,
          validation_data=testing_gen,
          validation_steps=int(len(testing_gen)) // self.params.batch_size)
This code produces the following error:
ValueError: No gradients provided for any variable: (['lstm/lstm_cell/kernel:0', 'lstm/lstm_cell/recurrent_kernel:0', 'lstm/lstm_cell/bias:0', 'lstm_1/lstm_cell_1/kernel:0', 'lstm_1/lstm_cell_1/recurrent_kernel:0', 'lstm_1/lstm_cell_1/bias:0', 'lstm_2/lstm_cell_2/kernel:0', 'lstm_2/lstm_cell_2/recurrent_kernel:0', 'lstm_2/lstm_cell_2/bias:0', 'dense/kernel:0', 'dense/bias:0'],). Provided `grads_and_vars` is ((None, <tf.Variable 'lstm/lstm_cell/kernel:0' shape=(543, 1024) dtype=float32>), (None, <tf.Variable 'lstm/lstm_cell/recurrent_kernel:0' shape=(256, 1024) dtype=float32>), (None, <tf.Variable 'lstm/lstm_cell/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'lstm_1/lstm_cell_1/kernel:0' shape=(256, 1024) dtype=float32>), (None, <tf.Variable 'lstm_1/lstm_cell_1/recurrent_kernel:0' shape=(256, 1024) dtype=float32>), (None, <tf.Variable 'lstm_1/lstm_cell_1/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'lstm_2/lstm_cell_2/kernel:0' shape=(256, 1024) dtype=float32>), (None, <tf.Variable 'lstm_2/lstm_cell_2/recurrent_kernel:0' shape=(256, 1024) dtype=float32>), (None, <tf.Variable 'lstm_2/lstm_cell_2/bias:0' shape=(1024,) dtype=float32>), (None, <tf.Variable 'dense/kernel:0' shape=(256, 3) dtype=float32>), (None, <tf.Variable 'dense/bias:0' shape=(3,) dtype=float32>)).
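The same symptom shows up outside of model.fit. Here is a quick check in eager mode with dummy one-hot targets and softmax-like predictions (made up just to inspect the gradient of the loss):

import tensorflow as tf

loss_fn = CategoricalCustomLoss()

# Dummy batch: one "long" sample (class 2) and one "short" sample (class 0)
y_true = tf.constant([[0.0, 0.0, 1.0],
                      [1.0, 0.0, 0.0]])
y_pred = tf.Variable([[0.1, 0.2, 0.7],
                      [0.6, 0.3, 0.1]])

with tf.GradientTape() as tape:
    loss = loss_fn(y_true, y_pred)

print(loss)                         # the forward pass returns a value
print(tape.gradient(loss, y_pred))  # None, matching the error above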
Any insights or suggestions on how to resolve this issue would be greatly appreciated!
Thank you in advance for your help!