I keep getting an error and I'm not sure how to fix it. I'm working on a PPO implementation and it's my first time writing one. The error comes from the mu = self.actor(state) line in forward and from the choose_action function. I asked ChatGPT, but it just sent me in a loop of new errors. I can't change the environment code, only the PPO agent. The forward function:
# Define the actor network
def forward(self, state):
    print('!!!!!!! state shape is:', state.shape)  # debugging print
    mu = self.actor(state)
    mu = sanitize_tensor(mu)  # sanitize the output of the actor network
    dist = MultivariateNormal(mu, T.diag_embed(self.log_std.exp().expand_as(mu)))
    return dist
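In case it helps, here is a minimal self-contained sketch of how I understand the actor and the distribution are supposed to fit together (the real network is larger, sanitize_tensor is my own helper that replaces NaN/inf values, and I'm not sure whether the diagonal should hold the std or the variance, so this sketch uses the variance):

import torch as T
import torch.nn as nn
from torch.distributions import MultivariateNormal

class ActorNetwork(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.device = T.device('cuda' if T.cuda.is_available() else 'cpu')
        # Simple MLP that outputs the mean of the action distribution
        self.actor = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )
        # One learnable log standard deviation per action dimension
        self.log_std = nn.Parameter(T.zeros(action_dim))
        self.to(self.device)

    def forward(self, state):
        mu = self.actor(state)                   # mean, shape (batch, action_dim)
        std = self.log_std.exp().expand_as(mu)   # per-dimension standard deviation
        cov = T.diag_embed(std ** 2)             # diagonal covariance from variances
        return MultivariateNormal(mu, cov)

# Quick shape check
net = ActorNetwork(state_dim=8, action_dim=4)
dist = net(T.randn(1, 8).to(net.device))
print(dist.sample().shape)  # torch.Size([1, 4])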
The choose_action function:
def choose_action(self, observation):
    state = T.tensor(observation, dtype=T.float).unsqueeze(0).to(self.actor.device)
    print(f"State shape: {state.shape}")  # debugging print statement

    dist = self.actor(state)
    value = self.critic(state)
    action = dist.sample()
    action = action.cpu().detach().numpy().flatten()

    expected_size = 5 * (env.No_TX_UAVs + env.No_Jam_UAVs)
    if action.size != expected_size:
        raise ValueError(f'Action array has incorrect size. Expected {expected_size}, got {action.size}')

    probs = dist.log_prob(T.tensor(action).to(self.actor.device)).cpu().detach().numpy().flatten()
    value = T.squeeze(value).item()
    return action, probs, value
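One thing I'm unsure about is whether the log-prob should be computed before converting the action to numpy. This is the variant I tried (same structure, with the size check left out for brevity, keeping everything as tensors until the end):

def choose_action(self, observation):
    state = T.tensor(observation, dtype=T.float).unsqueeze(0).to(self.actor.device)

    dist = self.actor(state)
    value = self.critic(state)

    action = dist.sample()            # still a tensor, shape (1, action_dim)
    log_prob = dist.log_prob(action)  # computed before leaving torch

    action = action.cpu().detach().numpy().flatten()
    log_prob = log_prob.cpu().detach().numpy().flatten()
    value = T.squeeze(value).item()
    return action, log_prob, value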
Calling the function in main:
action, prob, val = agent.choose_action(observation)
Is there something I'm missing? I have also been running into NaN values a lot, so my agent is not learning. How can I fix that on the agent side, without touching the environment?
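From what I've read, the usual agent-side fixes are sanitizing the observation before building the state tensor, clamping the log-std before exponentiating, and clipping gradients in the update step. Is something like this the right direction? (Just a rough sketch of what I mean; the clamp range and clip norm are guesses.)

import numpy as np
import torch as T

# 1) Replace NaN/inf in the observation on the agent side, before torch sees it
observation = np.array([1.0, float('nan'), float('inf')])
observation = np.nan_to_num(observation, nan=0.0, posinf=1e6, neginf=-1e6)
state = T.tensor(observation, dtype=T.float).unsqueeze(0)

# 2) Clamp the learnable log-std so exp() can neither explode nor collapse to zero
log_std = T.zeros(4, requires_grad=True)
std = log_std.clamp(-20, 2).exp()

# 3) In the learn() step, clip gradients so one bad batch can't blow up the weights
#    (total_loss and the optimizers come from my agent, so this part is only a comment):
# total_loss.backward()
# T.nn.utils.clip_grad_norm_(self.actor.parameters(), max_norm=0.5)
# T.nn.utils.clip_grad_norm_(self.critic.parameters(), max_norm=0.5)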