I'd like to map tensor indices back to entity names using a list of indices.
I'm trying to do link prediction using PyTorch.
In this process, I need to convert each index to a name by looking it up in a dictionary.
To do this, I built an inverse dictionary and indexed (masked) the tensor with the nonzero indices, but the lookup returned names for the wrong indices.
<code>inv_entity_dict = {v: k for k, v in entity_dict.items()}
inv_entity_dict
#{0: 'TMEM35A',
# 1: 'FHL5',
# 2: 'Sirolimus',
# 3: 'TMCO2',
# 4: 'RNF123',
# 5: 'SMURF2',
# 6: 'SSH3',
# 7: 'PSMA4',
# 8: 'SOD3',
# 9: 'SCOC',
# 10: 'Cysteamine',
# 11: 'TOX',
#...}
nonzero[0:10]
#array([ 0, 1, 3, 4, 5, 6, 7, 8, 9, 11])
</code>
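The lookup itself is one dictionary access per id. The following minimal sketch uses a toy stand-in for entity_dict (only the 12 names printed above; the rest of the real dictionary is assumed). It shows that looking up the values of nonzero directly gives the expected names, because nonzero holds global entity ids:

```python
import numpy as np

# Toy stand-in for entity_dict (name -> id); assumed to match the printed entries.
entity_dict = {
    'TMEM35A': 0, 'FHL5': 1, 'Sirolimus': 2, 'TMCO2': 3, 'RNF123': 4,
    'SMURF2': 5, 'SSH3': 6, 'PSMA4': 7, 'SOD3': 8, 'SCOC': 9,
    'Cysteamine': 10, 'TOX': 11,
}
inv_entity_dict = {v: k for k, v in entity_dict.items()}  # id -> name

# Global entity ids kept by the mask (the first 10 values printed above).
nonzero = np.array([0, 1, 3, 4, 5, 6, 7, 8, 9, 11])

# Every element of nonzero is a global id, so it can be looked up directly.
names = [inv_entity_dict[int(i)] for i in nonzero[0:10]]
print(names)
```

Sirolimus (id 2) does not appear here, so the dictionary itself is not the problem.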
After running the code below, I got unexpected results: Sirolimus (idx == 2) is not in the nonzero array, so it should never be matched as a name.
<code>import pandas as pd

# z (node embeddings), nonzero and map_id2gene are defined elsewhere in my code
for i in range(1):
    raw_probs = (z[i][nonzero[0:10]] @ z[i][nonzero[0:10]].t()).sigmoid()
    filtered_probs = pd.DataFrame((raw_probs > 0.9).nonzero(as_tuple=False).cpu().numpy(), columns=['Gene1', 'Gene2'])
    filtered_probs['prob'] = raw_probs[raw_probs > 0.9].cpu().detach().numpy()
    filtered_probs_name = map_id2gene(filtered_probs, inv_entity_dict)  # my helper: converts ids to names
# Expected result
#     Gene1    Gene2    prob
# 67  TOX      TOX      1.0
# 0   TMEM35A  TMEM35A  1.0
# 1   TMEM35A  FHL5     1.0
# 2   TMEM35A  RNF123   1.0
# 52  SCOC     TMEM35A  1.0
# Actual (wrong) result
#     Gene1    Gene2      prob
# 67  SCOC     SCOC       1.0
# 0   TMEM35A  TMEM35A    1.0
# 1   TMEM35A  FHL5       1.0
# 2   TMEM35A  Sirolimus  1.0
# 52  SOD3     TMEM35A    1.0
</code>
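The call (raw_probs > 0.9).nonzero(as_tuple=False) returns row/column positions inside the 10×10 sub-matrix, not entity ids. The sketch below reproduces that shift in NumPy, where np.argwhere plays the role of torch.nonzero(as_tuple=False); both return an (N, 2) array of index pairs. The 4-entity subset and the scores are made up for illustration:

```python
import numpy as np

inv_entity_dict = {0: 'TMEM35A', 1: 'FHL5', 2: 'Sirolimus', 3: 'TMCO2', 4: 'RNF123'}
subset = np.array([0, 1, 3, 4])  # hypothetical global ids kept by the mask (id 2 dropped)

# Hypothetical scores for the 4x4 sub-matrix built from `subset`.
scores = np.array([
    [0.95, 0.20, 0.93, 0.10],
    [0.20, 0.30, 0.10, 0.10],
    [0.93, 0.10, 0.40, 0.10],
    [0.10, 0.10, 0.10, 0.97],
])

# Positions are local: 0..3 index into `subset`, not into inv_entity_dict.
local_pairs = np.argwhere(scores > 0.9)
print(local_pairs.tolist())  # [[0, 0], [0, 2], [2, 0], [3, 3]]

# Looking up the local positions directly yields names for the wrong ids.
wrong = [(inv_entity_dict[int(r)], inv_entity_dict[int(c)]) for r, c in local_pairs]
print(wrong)
```

Local position 2 is looked up as Sirolimus even though id 2 was never in the subset, which matches the symptom above.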
I suspect the positional indices of raw_probs (0 to 9) went into the conversion directly, instead of the original entity indices stored in nonzero.
<code>raw_probs
# tensor([[1.0000e+00, ..., 1.0000e+00],   # real index: 0
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 1
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 3, but treated as 2
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 4, but treated as 3, ...
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 5
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 6
#         [0.0000e+00, ..., 0.0000e+00],   # real index: 7
#         [0.0000e+00, ..., 4.4097e-36],   # real index: 8
#         [1.0000e+00, ..., 1.0000e+00],   # real index: 9
#         [1.0000e+00, ..., 1.0000e+00]],  # real index: 11, but treated as 9
#        device='cuda:0')
</code>
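If this guess is right, one possible way to recover the real ids is to index the same id subset that was used to slice z with the local positions, before any dictionary lookup. The sketch below continues the made-up 4-entity example in NumPy. Assuming nonzero is the NumPy array shown earlier and the pairs have been moved to NumPy with .cpu().numpy() as in the code above, the same fancy indexing would look like nonzero[0:10][pairs]:

```python
import numpy as np

inv_entity_dict = {0: 'TMEM35A', 1: 'FHL5', 2: 'Sirolimus', 3: 'TMCO2', 4: 'RNF123'}
subset = np.array([0, 1, 3, 4])  # hypothetical: the ids used to slice z, e.g. nonzero[0:10]

# Local (row, col) positions, as returned by argwhere / torch.nonzero.
local_pairs = np.array([[0, 0], [0, 2], [2, 0], [3, 3]])

# Fancy indexing replaces every local position with its global entity id.
global_pairs = subset[local_pairs]
print(global_pairs.tolist())  # [[0, 0], [0, 3], [3, 0], [4, 4]]

# Only now look the global ids up in the inverse dictionary.
names = [(inv_entity_dict[int(a)], inv_entity_dict[int(b)]) for a, b in global_pairs]
print(names)
```

Translating positions to ids first keeps the dictionary lookup unchanged, so a helper like map_id2gene could presumably be reused as-is on the remapped columns.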
In this case, how can I match the correct IDs to names based on inv_entity_dict and the nonzero list?