I have read the few existing questions on this topic, in particular how to implement custom metric in keras? and How to calculate F1 Macro in Keras?.
The information on those pages and in the Keras documentation was enough to help me implement cohen_kappa_score as a metric for my application; however, there are a few drawbacks in this implementation that I cannot overcome:
- The code only works if I set tf.config.run_functions_eagerly(True) (see the snippet after this list).
- This code runs on the CPU only while the model runs on the GPU, significantly slowing down execution.
- It also does not work if I set os.environ["KERAS_BACKEND"] = "jax".
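For clarity, here are the two settings the list above refers to, the way I set them in my own script (each tried in a separate run, not together):
import os
import tensorflow as tf
# Only with eager execution forced on does the metric below run at all.
tf.config.run_functions_eagerly(True)
# Tried in a separate run: selecting the JAX backend (set before importing keras) breaks the metric entirely.
os.environ["KERAS_BACKEND"] = "jax"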
I also implemented the other option suggested in the previous post: a callback that calculates cohen_kappa_score at the end of each epoch (a simplified sketch follows this paragraph). That approach required passing in the validation data, and the screen filled with thousands of lines of 1/1 =================== 0s progress output. Even setting verbose=0 would not eliminate that nasty output. It also required computing the metric over all of the validation data after every epoch, effectively making that approach unworkable.
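Roughly, that callback version looked like the sketch below (simplified and written from memory; the class name QwkCallback and the x_val / y_val arguments are placeholders, and in my real code I passed in my actual validation arrays):
import numpy as np
import tensorflow as tf
from sklearn.metrics import cohen_kappa_score

class QwkCallback(tf.keras.callbacks.Callback):
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        # Run predictions over the entire validation set at the end of every epoch;
        # even with verbose=0 here, the per-batch progress lines still flooded my console.
        probs = self.model.predict(self.x_val, verbose=0)
        preds = np.argmax(probs, axis=-1)
        qwk = cohen_kappa_score(self.y_val, preds, weights='quadratic')
        if logs is not None:
            logs['val_qwk'] = qwk
        print(f'epoch {epoch + 1}: val_qwk = {qwk:.4f}')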
I am hoping someone can help me implement a version of my QwkMetric that can run with normal settings (i.e., without forcing eager execution).
As an FYI, I have located a version made for models designed for Ordinal Regression applications that works extremely well. I am trying to write this version for Non-Ordinal Multi-class classification.
I am using:
tf version: 2.15.0
keras version: 2.15.0
numpy version: 1.25.2
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint, Callback
from tensorflow.keras.metrics import Metric
from sklearn.metrics import cohen_kappa_score
import keras
from tensorflow.keras import backend as K
class QwkMetric(Metric):
    def __init__(self, name='qwk', **kwargs):
        super().__init__(name=name, **kwargs)
        self.y_true = self.add_weight(name='y_true', shape=(0,), dtype=tf.int32)
        self.y_pred = self.add_weight(name='y_pred', shape=(0,), dtype=tf.int32)

    def update_state(self, y_true, y_pred, sample_weight=None):
        # Flatten and cast y_true and y_pred to integer values
        y_true = K.cast(K.reshape(y_true, [-1]), 'int32')
        y_pred = K.cast(K.argmax(y_pred, axis=-1), 'int32')
        # Concatenate the current batch's y_true and y_pred with the state variables
        if len(self.y_true) > 0:
            self.y_true = tf.concat([self.y_true, y_true], axis=0)
        else:
            self.y_true = y_true
        if len(self.y_pred) > 0:
            self.y_pred = tf.concat([self.y_pred, y_pred], axis=0)
        else:
            self.y_pred = y_pred

    def result(self):
        # Compute QWK using sklearn's cohen_kappa_score
        y_true_np = K.get_value(self.y_true)
        y_pred_np = K.get_value(self.y_pred)
        qwk_score = cohen_kappa_score(y_true_np, y_pred_np, weights='quadratic')
        return qwk_score

    def reset_state(self):
        # Clear the state
        self.y_true = tf.zeros([0], dtype=tf.int32)
        self.y_pred = tf.zeros([0], dtype=tf.int32)
qwk_metric = QwkMetric()
# Generate some synthetic data
num_samples = 1000
num_features = 20
num_classes = 5
X_train = np.random.random((num_samples, num_features)).astype(np.float32)
y_train = np.random.randint(0, num_classes, num_samples).astype(np.int32)
X_val = np.random.random((num_samples, num_features)).astype(np.float32)
y_val = np.random.randint(0, num_classes, num_samples).astype(np.int32)
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train)).cache().batch(32).prefetch(tf.data.AUTOTUNE)
val_ds = tf.data.Dataset.from_tensor_slices((X_val, y_val)).cache().batch(32).prefetch(tf.data.AUTOTUNE)
# Create a simple model for testing
def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dense(5, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=[qwk_metric])
    return model
model = create_model()
# Define the ModelCheckpoint callback
checkpoint_cb = ModelCheckpoint('best_model.h5',
monitor='val_qwk',
mode='max',
save_best_only=True,
verbose=1)
# Train the model
history = model.fit(
train_ds,
epochs=5,
validation_data=val_ds,
callbacks=[checkpoint_cb],
verbose=1
)
# Print the QWK metric values
print("Final QWK on training data:", history.history['qwk'][-1])
print("Final QWK on validation data:", history.history['val_qwk'][-1])
To summarize: I have implemented two versions of this custom metric, and both have shortcomings that make them difficult to use. I have read and implemented the prior suggestions on this topic.