I’m training a deep RL model with TensorFlow, but there is no single correct action per state: the network outputs a vector [x1, x2], and both components are actions that need to be optimized.
def train_step(self, state, reward, action):
    with tf.GradientTape() as tape:
        state = tf.convert_to_tensor(state, dtype=tf.float32)
        action = tf.convert_to_tensor(action, dtype=tf.float32)
        reward = tf.convert_to_tensor(reward, dtype=tf.float32)
        predicted_action = self.actor(state, training=True)
        # Here we compute the loss directly from the actual action and the reward
        loss = -tf.reduce_mean(predicted_action * reward)
    print("LOSS:", loss)
    grads = tape.gradient(loss, self.actor.trainable_variables)
    self.optimizer.apply_gradients(zip(grads, self.actor.trainable_variables))
    self.epsilon = self.epsilon * self.epsilon_dec
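Note that a loss of the form `-tf.reduce_mean(predicted_action * reward)` is linear in the predicted actions, so its gradient with respect to each output is a constant `-reward / N` that does not depend on the state or the action values at all; the optimizer just pushes both outputs toward their saturation limits. A standalone sketch (illustrative tensors only, not the actual network) shows this:

```python
import tensorflow as tf

# Stand-in for the network output: two actions, arbitrary values.
predicted_action = tf.Variable([[0.3, 0.7]])
reward = tf.constant(2.0)

with tf.GradientTape() as tape:
    loss = -tf.reduce_mean(predicted_action * reward)

grad = tape.gradient(loss, predicted_action)
# d(loss)/d(action_i) = -reward / num_elements, independent of the action values
print(grad.numpy())  # [[-1. -1.]]
```

Changing the action values changes nothing here; the gradient stays `[[-1., -1.]]`, which is why this loss cannot learn a state-dependent policy.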
I have my reward function defined as follows:
def transform_actions_to_reward(actions, state, agent, tipo, iteration):
    actions = actions[0]
    energy = state[0]
    price = state[1]
    battery_charge = state[2]
    percentage_stored = actions[0]
    percentage_empty = actions[1]
    energy_stored = energy * percentage_stored
    energy_iny = energy * (1 - percentage_stored)
    battery_iny = agent.battery * percentage_empty
    agent.fill_bat(energy_stored)
    if price == 1:
        reward = (energy_iny + battery_iny)
    elif price == 0:
        reward = -(energy_iny + battery_iny)
    return reward
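In short, the reward is the total energy injected (from production plus battery) when the price is high, and its negative when the price is low. A standalone sketch of that logic, with a stub `Agent` class and hypothetical numeric values purely for illustration:

```python
# Standalone sketch of the reward logic; the Agent stub and all numeric
# values are hypothetical, for illustration only.
class Agent:
    def __init__(self, battery):
        self.battery = battery
        self.stored = 0.0

    def fill_bat(self, amount):
        # Accumulate energy routed into the battery.
        self.stored += amount

def reward_sketch(actions, state, agent):
    percentage_stored, percentage_empty = actions[0]
    energy, price, battery_charge = state
    energy_iny = energy * (1 - percentage_stored)    # energy injected directly
    battery_iny = agent.battery * percentage_empty   # energy drawn from battery
    agent.fill_bat(energy * percentage_stored)
    injected = energy_iny + battery_iny
    # Injecting is rewarded at high price (1), penalized at low price (0).
    return injected if price == 1 else -injected

agent = Agent(battery=10.0)
r_high = reward_sketch([[0.2, 0.5]], [4.0, 1, 0.0], agent)
r_low = reward_sketch([[0.2, 0.5]], [4.0, 0, 0.0], agent)
print(r_high, r_low)  # r_high ≈ 8.2, r_low ≈ -8.2
```

The sign flip with `price` is the only state-dependence of the reward; the reward is undefined if `price` takes any value other than 0 or 1.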
The problem is my training function, which I adapted from a forum post; I haven’t been able to get it to optimize the actions correctly. It computes the loss as the reward multiplied by the currently predicted actions, but the reward actually corresponds to the previous actions. However, if I try to use the previous actions to compute the loss, I get an error:
ValueError: No gradients provided for any variable: (['lstm_14/lstm_cell/kernel:0', 'lstm_14/lstm_cell/recurrent_kernel:0', 'lstm_14/lstm_cell/bias:0', 'lstm_15/lstm_cell/kernel:0', 'lstm_15/lstm_cell/recurrent_kernel:0', 'lstm_15/lstm_cell/bias:0', 'dense_7/kernel:0', 'dense_7/bias:0'],). Provided `grads_and_vars`
What should I do?
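For context, the ValueError happens because the stored previous actions are plain constants: nothing in `loss` then depends on the network's trainable variables, so the tape has no gradients to return. One common fix (a REINFORCE-style sketch under assumptions, not the asker's exact setup: 3 state features, 2 actions in [0, 1], a fixed Gaussian exploration noise, and a hypothetical stand-in actor) is to treat the network output as the mean of a stochastic policy and maximize the reward-weighted log-probability of the action that was actually taken, with the network recomputed inside the tape:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in actor; shapes are assumptions (3 state features, 2 actions).
actor = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),  # action means in [0, 1]
])
optimizer = tf.keras.optimizers.Adam(1e-3)
STD = 0.1  # fixed exploration noise; tuned or learned in practice

def train_step(state, action, reward):
    """One REINFORCE-style update. `action` is the action actually taken
    earlier (sampled around the policy mean); `reward` is its scalar return."""
    state = tf.convert_to_tensor(state, dtype=tf.float32)
    action = tf.convert_to_tensor(action, dtype=tf.float32)
    reward = tf.convert_to_tensor(reward, dtype=tf.float32)
    with tf.GradientTape() as tape:
        mean = actor(state, training=True)  # recomputed INSIDE the tape
        # Gaussian log-probability of the stored action under the current
        # policy (additive constants dropped; they have zero gradient).
        log_prob = -0.5 * tf.reduce_sum(((action - mean) / STD) ** 2, axis=-1)
        loss = -tf.reduce_mean(log_prob * reward)  # maximize reward-weighted log-prob
    grads = tape.gradient(loss, actor.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor.trainable_variables))
    return loss
```

At acting time you would sample `action = mean + STD * noise`, store it alongside the state, and feed it back here once the reward is known; the gradient now flows because `mean` is produced by the network inside the tape, while the stored action only enters as data.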