I have a custom PettingZoo parallel env that worked fine until I added an action mask.
I followed the action_masking tutorial and tried to implement action masking this way (in a class separate from the Env):
def compute_action_mask(self) -> np.array:
    """Return the action mask"""
    current_state = self.get_current_state()
    if current_state == "g":
        if self.consecutive_durations["g"] < self.env.min_green_time:
            action_mask = np.array([1, 0, 0], dtype=np.int8)
        else:
            action_mask = np.array([1, 1, 0], dtype=np.int8)
    elif current_state == "y":
        if self.consecutive_durations["y"] < self.env.yellow_time:
            action_mask = np.array([0, 1, 0], dtype=np.int8)
        elif self.consecutive_durations["y"] == self.env.yellow_time:
            action_mask = np.array([0, 0, 1], dtype=np.int8)
    else:
        if self.consecutive_durations["r"] >= self.env.max_red_time:
            action_mask = np.array([1, 0, 0], dtype=np.int8)
        else:
            action_mask = np.array([1, 0, 1], dtype=np.int8)
    return action_mask
My observations dict now has the following structure (in the Env class):
def _compute_observations(self) -> dict:
    """Compute and return the observations.

    Returns:
        observations (dict): dict of the form {id_agent: {'observation': [], 'action_mask': []}}
    """
    observations = {}
    for id_agent, agent in self.dict_feux.items():
        observations[id_agent] = {
            "observation": agent.compute_observation(),
            "action_mask": agent.compute_action_mask(),
        }
    return observations
But when I test this env (first converted to an AEC env) with PettingZoo's api_test
(because parallel_api_test does not seem to check action masks),
I get the following error:
AssertionError: Out of bounds observation: {'observation': array([ 0. , 13.9], dtype=float32), 'action_mask': array([1, 0, 1], dtype=int8)}
My current observation space:
def observation_space(self) -> spaces.Box:
    """Return the observation space."""
    return spaces.Box(
        low=0,
        high=100,
        shape=(2,),
        dtype=np.float32
    )
If I understand this error correctly, I need to change my observation space. But how should I change it to account for the added action mask?
Thank you in advance