I am building a Transformer translation model from scratch for the first time. I am running into problems when training the decoder on variable-length sequences with look-ahead and padding masks. I don't understand how to combine variable-length inputs into batches and feed them to a custom training loop. How can I solve this?
import tensorflow as tf

def create_padding_mask(seq):
    # Mark padding tokens (id 0) with 1.0 so attention can mask them out.
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # Add extra dims for broadcasting over (batch, heads, seq_len_q, seq_len_k).
    return seq[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    # Upper-triangular mask that hides future positions from the decoder.
    mask = 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)
    return mask

def CreateMask(encoderInput, decoderInput):
    encPadMask = create_padding_mask(encoderInput)
    decPadMask = create_padding_mask(decoderInput)
    # Look-ahead mask sized to the (padded) decoder length of this batch.
    LAMask = create_look_ahead_mask(tf.shape(decoderInput)[1])
    # decTargetPadMask = create_padding_mask(decoderInput)
    # Combine decoder padding and look-ahead masks for the first attention block.
    LAMask = tf.maximum(decPadMask, LAMask)
    return encPadMask, LAMask, decPadMask
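As a sketch of what I mean by combining variable-length inputs into batches and feeding them to a custom training loop: something like the following, using tf.data's padded_batch so each batch is padded only up to its own longest sequence. Names such as encSeqs, decSeqs, transformer, loss_object, and optimizer are placeholders for things defined elsewhere in my code, not actual working parts of it.

import tensorflow as tf

def gen():
    # encSeqs / decSeqs: Python lists of tokenized integer sequences (placeholders).
    for enc, dec in zip(encSeqs, decSeqs):
        yield enc, dec

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
        tf.TensorSpec(shape=(None,), dtype=tf.int64),
    ),
)

# padded_batch pads every sequence in a batch to the longest sequence in that
# batch; the default padding value 0 matches create_padding_mask above.
dataset = dataset.shuffle(10000).padded_batch(64, padded_shapes=([None], [None]))

@tf.function
def train_step(encoderInput, decoderTarget):
    decoderInput = decoderTarget[:, :-1]   # teacher forcing: shift right
    realOutput = decoderTarget[:, 1:]
    encPadMask, LAMask, decPadMask = CreateMask(encoderInput, decoderInput)
    with tf.GradientTape() as tape:
        predictions = transformer(encoderInput, decoderInput,
                                  encPadMask, LAMask, decPadMask)
        # (a padding-aware loss mask would normally be applied here as well)
        loss = loss_object(realOutput, predictions)
    grads = tape.gradient(loss, transformer.trainable_variables)
    optimizer.apply_gradients(zip(grads, transformer.trainable_variables))
    return loss

for epoch in range(20):
    for encoderInput, decoderTarget in dataset:
        loss = train_step(encoderInput, decoderTarget)

Is this the right way to handle batches whose padded length changes from batch to batch, given that create_look_ahead_mask is rebuilt per batch from tf.shape(decoderInput)[1]?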