I'm looking into the CTC loss in PyTorch, and it seems to be returning an incorrect value.
Let's look at an example with 2 timesteps and 3 letters in the alphabet: 'a', 'b', and blank ('-').
The table (of shape 2×3), giving the probability of each character at each timestep, is as follows:
table = np.array([[0.3, 0.1, 0.6], [0.3, 0.2, 0.5]])
According to the theory, the probability of a path is the product of the per-timestep probabilities, for example:
p('a-') = p('a' at t=0) * p('-' at t=1) = 0.3 * 0.5 = 0.15
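This can be checked directly from the table (a quick sanity check, assuming the column order 'a', 'b', blank, as in the table above):

import numpy as np

table = np.array([[0.3, 0.1, 0.6],
                  [0.3, 0.2, 0.5]])
alphabet = 'ab-'  # column order: 'a', 'b', blank

# p('a-') = p('a' at t=0) * p('-' at t=1)
p_a_blank = table[0, alphabet.index('a')] * table[1, alphabet.index('-')]
print(p_a_blank)  # 0.15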
We also have the concept of “collapse” in which different paths collapse into the same label, by replacing repetitions by a single character and removing blank labels, so ‘aaaaa-b’ would collapse to ‘ab’.
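A minimal sketch of that collapse rule (merge consecutive repeats first, then drop blanks), just to make the convention explicit:

import itertools

def collapse(path, blank='-'):
    # Merge consecutive repeats, then remove blank symbols.
    merged = [ch for ch, _ in itertools.groupby(path)]
    return ''.join(ch for ch in merged if ch != blank)

print(collapse('aaaaa-b'))  # ab
print(collapse('a-'), collapse('-a'), collapse('aa'))  # a a a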
The probability of a certain label would be the sum of all paths that collapse into that label, for example:
P(label='a') = p('aa') + p('a-') + p('-a') = 0.3 * 0.3 + 0.3 * 0.5 + 0.6 * 0.3 = 0.42
P(label='b') = p('bb') + p('b-') + p('-b') = 0.1 * 0.2 + 0.1 * 0.5 + 0.6 * 0.2 = 0.19
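These sums can also be checked by brute force: enumerate every length-2 path over the alphabet, collapse each one, and accumulate the path probabilities per label (a small sketch, reusing a collapse helper like the one above):

import itertools
from collections import defaultdict

import numpy as np

table = np.array([[0.3, 0.1, 0.6],
                  [0.3, 0.2, 0.5]])
alphabet = 'ab-'  # column order: 'a', 'b', blank

def collapse(path, blank='-'):
    merged = [ch for ch, _ in itertools.groupby(path)]
    return ''.join(ch for ch in merged if ch != blank)

label_prob = defaultdict(float)
for path in itertools.product(alphabet, repeat=table.shape[0]):
    # Path probability is the product of per-timestep character probabilities.
    p = np.prod([table[t, alphabet.index(ch)] for t, ch in enumerate(path)])
    label_prob[collapse(path)] += p

print(label_prob['a'])   # ~0.42
print(label_prob['b'])   # ~0.19
print(label_prob['ab'])  # 0.06: with T=2, only the path 'ab' collapses to 'ab'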
Since:
ctc_loss(label) = -ln(pr(label))
Then:
pr(label) = exp(-ctc_loss(label))
I wrote a small wrapper around the PyTorch CTC loss to get the probability of a label:
import torch
import numpy as np

alphabet = 'ab-'

def get_label_score(log_softmax_ctc, label):
    # Encode the label as alphabet indices, shape (N=1, S).
    targets = torch.tensor(np.array([[alphabet.index(ch) for ch in label]]))
    input_l = torch.tensor(np.array([log_softmax_ctc.shape[0]]))  # input length T
    target_l = torch.tensor(np.array([len(label)]))               # target length S
    # Blank is the last symbol of the alphabet.
    ctc = torch.nn.CTCLoss(blank=log_softmax_ctc.shape[1] - 1)
    # CTCLoss expects log-probabilities of shape (T, N, C), so add a batch dim.
    ctc_table_tn = torch.tensor(np.expand_dims(log_softmax_ctc, axis=1))
    return -ctc(ctc_table_tn, targets, input_l, target_l).item()
Given the table above, I get the expected results for the labels 'a' and 'b', but for 'ab' I get:
a 0.42
b 0.19000000000000006
ab 0.2449489742783178 # I expected to get 0.06, which is 0.3 * 0.2!
What am I doing wrong?
Full code:
import torch
import numpy as np

alphabet = 'ab-'

def get_label_score(log_softmax_ctc, label):
    targets = torch.tensor(np.array([[alphabet.index(ch) for ch in label]]))
    input_l = torch.tensor(np.array([log_softmax_ctc.shape[0]]))
    target_l = torch.tensor(np.array([len(label)]))
    ctc = torch.nn.CTCLoss(blank=log_softmax_ctc.shape[1] - 1)
    ctc_table_tn = torch.tensor(np.expand_dims(log_softmax_ctc, axis=1))
    return -ctc(ctc_table_tn, targets, input_l, target_l).item()

if __name__ == '__main__':
    ctc_table = np.log(np.array([[0.3, 0.1, 0.6], [0.3, 0.2, 0.5]]))
    labels = ['a', 'b', 'ab']
    for l in labels:
        res = np.exp(get_label_score(ctc_table, l))
        print(l, res)
Thanks 🙂
I expected the result for P('ab') to be 0.06, but I am getting a strange 0.24 from PyTorch.