I am training a simple sequence-to-sequence transformer. I have tried various losses and metrics, but none of them work. Currently, in model.compile() I am passing:
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'],
The input shapes and the final_layer output shape look correct; the output is (batch_size, output_seq_length, output_vocab_size):
Sample input shape (encoder input): (18, 20)
Sample input shape (decoder input): (18, 100)
Model output shape: (18, 100, 200)
tf.Tensor([ 18 100 200], shape=(3,), dtype=int32)
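For what it's worth, the loss and metric functions themselves seem fine with time-distributed shapes when called directly; this minimal sketch (random tensors with my shapes, not my real data) runs without error:

import tensorflow as tf

labels = tf.random.uniform((18, 100), maxval=200, dtype=tf.int32)  # (batch, seq_len)
logits = tf.random.normal((18, 100, 200))                          # (batch, seq_len, vocab)

# Both accept 2-D sparse labels against 3-D logits and return per-token values:
loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
acc = tf.keras.metrics.sparse_categorical_accuracy(labels, logits)
print(loss.shape, acc.shape)  # (18, 100) (18, 100)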
Regardless, I get an error from the accuracy metric; judging by the Squeeze node in the trace, it seems to expect a single label value per example rather than a sequence:
InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 100
[[{{node metrics/accuracy/Squeeze}}]]
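If I read the trace right, the metric graph squeezes the labels on axis 1, which can only succeed when they have shape (batch, 1); with sequence labels of length 100 it has to fail. Just that op, reproduced in isolation (my reconstruction, not code from the model):

import tensorflow as tf

y_true = tf.zeros((18, 100), dtype=tf.int32)  # sequence labels, as in my data
tf.squeeze(y_true, axis=1)  # InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 100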
All code is available in a relatively simple playground:
https://colab.research.google.com/drive/1tfk518PwmrJEapxIQlOuiLmiTSfxREk3?usp=sharing
But here is the core code (I have tried all of the standard metrics and losses):
# Encoder: token embedding followed by a position-wise feed-forward block
self.encoder_embedding = tf.keras.layers.Embedding(args.problem_vocab_size, args.dim)
self.encoder_ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(args.dim_ff, activation='relu'),
    tf.keras.layers.Dense(args.dim)
])

# Decoder: embedding, cross-attention over the encoder output, then feed-forward
self.decoder_embedding = tf.keras.layers.Embedding(args.solution_vocab_size, args.dim)
self.cross_attention = tf.keras.layers.MultiHeadAttention(
    num_heads=args.num_heads, key_dim=args.dim // args.num_heads)
self.decoder_ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(args.dim_ff, activation='relu'),
    tf.keras.layers.Dense(args.dim)
])

# Per-token logits over the solution vocabulary
self.final_layer = tf.keras.layers.Dense(args.solution_vocab_size)
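The forward pass wires these layers together roughly like this (a simplified sketch, not the exact Colab code):

def call(self, inputs):
    enc_in, dec_in = inputs                                    # (batch, 20), (batch, 100)
    enc = self.encoder_ffn(self.encoder_embedding(enc_in))     # (batch, 20, dim)
    dec = self.decoder_embedding(dec_in)                       # (batch, 100, dim)
    dec = self.cross_attention(query=dec, key=enc, value=enc)  # (batch, 100, dim)
    dec = self.decoder_ffn(dec)                                # (batch, 100, dim)
    return self.final_layer(dec)                               # (batch, 100, solution_vocab_size)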
model.compile(optimizer=tf.keras.optimizers.legacy.Adam(args.learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.sparse_categorical_accuracy],
              run_eagerly=False)

model.fit(dataset, epochs=args.epochs, steps_per_epoch=args.num_batches)
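Since the metric gets its labels straight from dataset, I also checked the shapes it yields (variable names here are illustrative, assuming the ((encoder_input, decoder_input), targets) layout the fit() call implies):

for (enc_in, dec_in), targets in dataset.take(1):
    print(enc_in.shape, dec_in.shape, targets.shape)
    # expect (18, 20) (18, 100) (18, 100): the targets are full sequences, not single values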