I have defined a custom classifier head for TFRobertaForSequenceClassification with two labels, as shown below, so that I can fine-tune it on my downstream task of classifying sentences into a finite set of independent labels.
from transformers import TFRobertaForSequenceClassification
roberta_model = TFRobertaForSequenceClassification.from_pretrained(pretrained_model_path, from_pt=True, num_labels=2)
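For reference, here is a quick sanity check I ran in eager mode to confirm the head really has two output units (the toy inputs below are made up purely for this check):

import numpy as np
# toy batch of token ids and attention mask, only to probe the output shape
dummy_ids = np.ones((1, 8), dtype=np.int32)
dummy_mask = np.ones((1, 8), dtype=np.int32)
out = roberta_model(input_ids=dummy_ids, attention_mask=dummy_mask)
print(roberta_model.config.num_labels)  # 2
print(out.logits.shape)                 # (1, 2) -- one logit per label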
I would like to make roberta_model part of a tensorflow.keras.Model. Here is what I have now, with a final/output layer that only applies a softmax on top of the logits, since the classification head of RoBERTa already takes care of producing them.
import numpy as np
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification
MAX_SEQUENCE_LENGTH = 256
# define the input-ids, attention-masks and input-type-ids
input_word_ids = tf.keras.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int32, name='input_word_ids')
input_mask = tf.keras.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int32, name='input_mask')
input_type_ids = tf.keras.Input(shape=(MAX_SEQUENCE_LENGTH,), dtype=tf.int32, name='input_type_ids')
# instantiate the pre-trained model
roberta_model = TFRobertaForSequenceClassification.from_pretrained(pretrained_model_path, from_pt=True, num_labels=2)
x = roberta_model(input_ids=input_word_ids, attention_mask=input_mask, token_type_ids=input_type_ids, labels=np.array([0, 1]))
# `x` is a TFSequenceClassifierOutput with `loss` and `logits` as keys;
# the loss is computed internally because `labels` was passed above
# add the final layer needed for the task: a softmax over the logits
out = tf.keras.layers.Activation('softmax')(x.logits)
# construct the model
model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=out)
# compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5), metrics=['accuracy', tf.keras.metrics.F1Score()])
I need the model to be a tf.keras.Model because I have downstream code that processes the History object that tf.keras.Model.fit returns.
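For context, the downstream usage looks roughly like this (a minimal sketch; train_ids, train_mask, train_type_ids, and train_labels are placeholders for my tokenized data):

history = model.fit(
    x=[train_ids, train_mask, train_type_ids],
    y=train_labels,
    validation_split=0.1,
    epochs=3,
    batch_size=16,
)
# the downstream code consumes history.history, e.g.
val_loss_per_epoch = history.history['val_loss']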
The issue I see with my existing approach: as shown in the source code (https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_roberta.py#L1210-L1232), a loss is already computed inside the model whenever labels are passed. So compiling with sparse_categorical_crossentropy on top of that, just to make everything part of a tensorflow.keras.Model, leaves me with two loss functions, which does not look right. Additionally, can I drop the softmax and make the final layer a plain tf.keras.layers.Identity with no activation at all?
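In case it clarifies what I am weighing, the only alternative I can think of is to not pass labels at all, output the raw logits, and let Keras compute the single loss with from_logits=True, though I am not sure this is right either:

x = roberta_model(input_ids=input_word_ids, attention_mask=input_mask, token_type_ids=input_type_ids)
model = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=x.logits)
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    metrics=['accuracy'],
)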
Am I doing something wrong somewhere else?