Reinforcement Learning Agent not taking realistic actions
I am training a PPO agent in a Simulink environment with the RL Toolbox, but the continuous actions produced by the agent look effectively discrete: the agent only ever outputs either the upper limit or the lower limit of the action range. Any ideas why this could be happening?
Here are some details about my setup:
I am using a variable-step Simulink model with the ode23t solver.
My Simulink model uses Simscape thermal fluid blocks and simulates a simplified district heating network (DHN) with two branches, NORTH (NORD) and SOUTH (SUD).
I am trying to use an RL agent to optimize the control, initially focusing on minimizing energy costs by adjusting the mass flow rates in the two branches.
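For context, this is roughly how I set up the environment (a simplified sketch; the model name, block path, observation size, and mass flow limits below are placeholders, not my exact values):

% Sketch of the environment setup (placeholder names and limits)
mdl = 'DHN_model';                        % hypothetical model name
agentBlk = [mdl '/RL Agent'];             % hypothetical agent block path

obsInfo = rlNumericSpec([4 1]);           % e.g. temperatures / demands
actInfo = rlNumericSpec([2 1], ...        % mass flow rates for NORD and SUD
    'LowerLimit', [0.5; 0.5], ...         % placeholder lower bounds [kg/s]
    'UpperLimit', [10; 10]);              % placeholder upper bounds [kg/s]

env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);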
Regarding the agent, I am using the RL Toolbox with the following hyperparameters (a rough sketch of how I create the agent follows the list):
Sample time: 3600 s
Discount factor: 0.99
Training device: GPU
Mini-batch size: 512
Learning rate: 1e-3 (for both actor and critic)
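For reference, this is roughly how I create the agent in code (a simplified sketch; the exact option names follow the current RL Toolbox API as I understand it, and the GPU setting, which is applied on the actor/critic approximators via UseDevice, is left out here):

% Sketch of the agent configuration with the hyperparameters listed above
actorOpts  = rlOptimizerOptions('LearnRate', 1e-3);
criticOpts = rlOptimizerOptions('LearnRate', 1e-3);

agentOpts = rlPPOAgentOptions( ...
    'SampleTime',             3600, ...
    'DiscountFactor',         0.99, ...
    'MiniBatchSize',          512, ...
    'ActorOptimizerOptions',  actorOpts, ...
    'CriticOptimizerOptions', criticOpts);

agent = rlPPOAgent(obsInfo, actInfo, agentOpts);   % default (Gaussian) actor and critic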
I suspect there might be an issue with either my model or the agent. I will attach the Simulink model (the properties table should be loaded beforehand). I hope the problem is clear and that someone can help!
Thank you in advance!