Within Python, I am passing the state:
tensor([[ -7.8010,  -5.2399,  -6.2746,   2.3048,  -2.9423,  11.1721,   1.0000],
        [-12.9662,  -4.4320,   1.5688,  11.9084,  -4.5796,  13.6317,   1.0000],
        [ -7.7946,  -4.4305,  -0.3919,   4.9900,   1.3524,   6.0343,   1.0000],
        [  3.2841,   4.0301,  -7.8403, -11.8998,  12.6486,  14.3685,   1.0000],
        [ 12.3128,   1.6109,   3.1347,   0.0000,  -0.9280,  14.3685,   1.0000]])
through
self.actor = nn.Sequential(
    nn.Linear(*input_dims, fc1_dims),
    nn.ReLU(),
    nn.Linear(fc1_dims, fc2_dims),
    nn.ReLU(),
    nn.Linear(fc2_dims, n_actions),
    nn.Softmax(dim=-1),
)
and it is returning:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan]], grad_fn=<SoftmaxBackward0>)
The network is built with PyTorch, with fc1_dims = 256 and fc2_dims = 256. The dataset has about 300k samples, and training had iterated through roughly 275k of them before this started happening. I'm fairly new to neural networks and machine learning, so I'm hoping to use this as a learning opportunity as well.
I have already normalized the data using QPR with a threshold of 1.5. I also tried increasing the batch_size to 128. I searched Stack Overflow for similar questions, which pointed to gradient explosion or gradient vanishing, but I'm not sure how to identify or fix those issues.
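In case it helps, this is roughly the kind of check I'm planning to add to find where the NaNs first appear (the state variable name here is just a placeholder for whatever batch I pass in, not my exact training code):

import torch

# sanity-check the batch before the forward pass; NaN/Inf inputs will
# propagate straight through to the softmax output
if torch.isnan(state).any() or torch.isinf(state).any():
    print("bad values in state batch:", state)

# make autograd raise an error at the first backward op that produces NaN
torch.autograd.set_detect_anomaly(True)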
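From what I've read, gradient clipping is the usual suggestion for exploding gradients. Is something along these lines what should go in the training step? (The optimizer, loss, and the 0.5 max norm are my guesses, not taken from my actual code.)

optimizer.zero_grad()
loss.backward()
# clip the overall gradient norm before the update so one bad batch
# can't push the weights to inf/nan
torch.nn.utils.clip_grad_norm_(self.actor.parameters(), max_norm=0.5)
optimizer.step()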