I’m working on a Q-learning implementation to help a robot navigate within a double gyre flow field. The objective is to find an optimal path between two points in the flow while minimizing energy consumption.
Current Setup:
- States: the agent's position within the flow field
- Actions: {up, down, right, left}
- Algorithm: Q-learning with a UCB exploration policy
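For reference, here is a minimal sketch of UCB action selection for a tabular Q-table. I'm assuming the "UCB policy" is roughly the usual UCB1 rule applied per state. `ucb_action` and the visit-count array `N` are hypothetical names, not taken from the code below. The bonus uses the number of visits to the current state; some versions use the global step count instead.

```python
import numpy as np

def ucb_action(Q, N, state, c=2.0):
    """Choose an action for grid state (i, j) with a UCB1-style exploration bonus.

    Q: action values, shape (nx, ny, n_actions).
    N: visit counts with the same shape; the caller increments N[i, j, a]
       after taking action a (hypothetical bookkeeping, not in the post).
    Bonus: c * sqrt(ln(n_s) / n_sa), where n_s is the number of visits to
    this state and n_sa the number of times action a was tried there.
    """
    i, j = state
    counts = N[i, j]
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        # The bonus is undefined for untried actions, so try each one once first
        return int(untried[0])
    bonus = c * np.sqrt(np.log(counts.sum()) / counts)
    return int(np.argmax(Q[i, j] + bonus))
```

The bonus shrinks as an action is tried more often, so exploration fades on its own. The reward design below does not change this rule. It only changes which Q-values end up largest.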
Problem: The current reward function prioritizes speed (fewer time steps) to determine the optimal path. However, I also need to factor in energy consumption, where:
- Moving with the flow should use less energy (positive reward).
- Moving against the flow requires more energy (small negative reward).
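One way to quantify the energy of a single move is to look at the velocity the robot must hold *relative to the water*. This is a sketch under stated assumptions. The robot is assumed to want a fixed ground speed `robot_speed` (a hypothetical parameter) in the action's direction. Power is assumed to scale with the squared relative speed, which corresponds to a linear-drag model; quadratic drag would give a cubic exponent. The action-to-direction mapping is also an assumption: states are taken as (x index, y index), with y increasing upward.

```python
import numpy as np

# Assumed unit directions in physical (x, y) coordinates, in the order
# {up, down, right, left} listed above; reorder to match your action indices.
ACTION_DIRS = np.array([[0.0, 1.0],    # up    (+y)
                        [0.0, -1.0],   # down  (-y)
                        [1.0, 0.0],    # right (+x)
                        [-1.0, 0.0]])  # left  (-x)

def move_energy(action, u, v, robot_speed=0.3, dt=1.0):
    """Energy proxy for one move through a local flow (u, v).

    The robot wants ground velocity robot_speed * direction, so relative to the
    water it must hold v_rel = v_ground - v_flow. Energy is taken as
    |v_rel|**2 * dt (linear-drag assumption).
    """
    v_rel = robot_speed * ACTION_DIRS[action] - np.array([u, v])
    return float(np.dot(v_rel, v_rel) * dt)
```

Using `-w * move_energy(...)` as the energy term penalises every move, but it penalises moves against the flow most and moves with the flow least, which captures both bullets. A strictly positive per-step reward for riding the current is riskier. It can make circling inside a gyre worth more than reaching the goal.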
Double gyre flow code:
import numpy as np

def DoubleGyre(t_incr, num_x_grid, num_y_grid):
    x = np.linspace(0, 2, num_x_grid)
    y = np.linspace(0, 1, num_y_grid)
    t = np.linspace(0, 10, t_incr)
    A = 0.1              # velocity amplitude
    epsn = 0.25          # amplitude of the gyre-boundary oscillation
    omega = 0.2 * np.pi  # oscillation frequency (period of 10 time units)
    VEC = []
    for k in range(len(t)):
        # Rows run from y = 1 (top) down to y = 0 because y is flipped
        X, Y = np.meshgrid(x, np.flipud(y))
        a = epsn * np.sin(omega * t[k])
        b = 1 - 2 * a
        f = a * X**2 + b * X
        U = -A * np.pi * np.sin(np.pi * f) * np.cos(np.pi * Y)
        V = A * np.pi * (2 * a * X + b) * np.cos(np.pi * f) * np.sin(np.pi * Y)
        C = np.logical_or(U, V)  # True wherever U or V is nonzero
        VEC.append({
            'X': X,
            'Y': Y,
            'U': U,
            'V': V,
            'C': C
        })
    return x, y, t, VEC
t_incr = 100
num_x_grid = 50
num_y_grid = 50
x, y, t, VEC = DoubleGyre(t_incr, num_x_grid, num_y_grid)
# Initialise the algorithm:
alpha = 0.46 # Learning rate
gamma = 0.99 # Discount factor
c = 2.0 # Exploration parameter for UCB
num_episodes = 100000 # Number of training episodes
max_steps = 4000 # Max steps per episode
start_point = (5, 5)  # Grid indices, approx. (0.2, 0.1)
end_point = (45, 45)  # Grid indices, approx. (1.8, 0.9)
Q_table = np.zeros((num_x_grid, num_y_grid, 4)) # 4 actions: up, down, left, right
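The reward only affects learning through the Q-learning update. A minimal sketch of one tabular update with the `alpha` and `gamma` above follows. `q_update` is a hypothetical helper, and treating the goal as terminal (no bootstrapping past it) is an assumption.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, done, alpha=0.46, gamma=0.99):
    """Apply one tabular Q-learning update to Q in place and return Q."""
    i, j = state
    ni, nj = next_state
    # Off-policy target: best next action value, unless the episode has ended
    target = reward if done else reward + gamma * np.max(Q[ni, nj])
    Q[i, j, action] += alpha * (target - Q[i, j, action])
    return Q
```

Because energy enters only through `reward`, the reward function can be reshaped without touching the update or the UCB exploration.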
Current reward structure:
def get_reward(state):
    # +100 for reaching the goal, -1 per step otherwise (favours short paths)
    return 100 if state == end_point else -1
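A common way to balance the two objectives is a weighted sum: keep the per-step time penalty and the goal bonus, then subtract a weighted energy cost. The sketch below extends `get_reward` this way. The weights `w_time` and `w_energy`, `robot_speed`, the helper `flow_at`, and the direction mapping are all hypothetical values to tune, not established choices. The energy proxy, the squared speed relative to the water, is an assumption. The lookup accounts for `DoubleGyre` building its grid with `np.meshgrid(x, np.flipud(y))`, so array row `r` holds `y[ny - 1 - r]`.

```python
import numpy as np

# Assumed unit directions in (x, y), order {up, down, right, left}
ACTION_DIRS = np.array([[0.0, 1.0], [0.0, -1.0], [1.0, 0.0], [-1.0, 0.0]])

def flow_at(vec_k, state):
    """Return (u, v) at grid state (i, j) = (x index, y index) from one VEC entry."""
    i, j = state
    r = vec_k['U'].shape[0] - 1 - j  # undo the flipped y axis
    return np.array([vec_k['U'][r, i], vec_k['V'][r, i]])

def get_reward(state, action, next_state, vec_k, end_point,
               w_time=1.0, w_energy=10.0, goal_bonus=100.0, robot_speed=0.3):
    """Reward = goal bonus (if reached) - time penalty - weighted move energy."""
    v_rel = robot_speed * ACTION_DIRS[action] - flow_at(vec_k, state)
    energy = float(np.dot(v_rel, v_rel))  # assumed energy proxy
    reward = -w_time - w_energy * energy
    if next_state == end_point:
        reward += goal_bonus
    return reward
```

Sweeping `w_energy` (with `w_time` fixed) and comparing path length against total energy is one way to pick the trade-off. The double gyre is time-dependent, so a reward that uses `VEC[k]` also makes values depend on time. Adding the time index to the state, or training on a single frozen slice, keeps the problem Markov.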
I’m unsure how to quantify energy consumption from the flow characteristics, and how to integrate it into the reward function effectively. Could anyone guide me on adjusting the reward function to balance minimizing energy consumption against keeping the path efficient?