I am implementing the spatial smoothness framework in my RL algorithm (paper). In this framework, I add noise to the current state (observation) and then compute the distribution from the noisy state. I want to compute the Jeffreys divergence (symmetric KL divergence) between the original and the perturbed distributions. However, I am seeing KL(P||Q) = KL(Q||P), with both greater than zero. Is this possible? My code is as follows:
import numpy as np
from torch.distributions import Normal, Independent, kl

def calculate_spatial_loss(self, minibatch, dist_current):
    # `actor` and `noise_scale` are defined elsewhere
    P = dist_current
    # current observation
    obs = minibatch.obs
    # current observation with Gaussian noise
    obs_perturbed = obs + np.random.normal(0, noise_scale, size=obs.shape)
    (mean, std), _ = actor(obs_perturbed)
    dist_perturbed = Normal(mean, std)
    Q = Independent(dist_perturbed, 1)
    # KL divergence in both directions, summed over dim 1
    kl_P = kl.kl_divergence(P.base_dist, Q.base_dist).sum(dim=1)  # KL(P||Q)
    kl_Q = kl.kl_divergence(Q.base_dist, P.base_dist).sum(dim=1)  # KL(Q||P)
    # Jeffreys divergence (symmetrized KL)
    divergence = 0.5 * (kl_P + kl_Q)
    return divergence.mean()
I then printed both terms:
print(kl_P.mean(), kl_Q.mean())
The output is as follows:
tensor(7.8187, grad_fn=<MeanBackward0>) tensor(7.8187, grad_fn=<MeanBackward0>)
tensor(8.3213, grad_fn=<MeanBackward0>) tensor(8.3213, grad_fn=<MeanBackward0>)
tensor(8.8823, grad_fn=<MeanBackward0>) tensor(8.8823, grad_fn=<MeanBackward0>)
tensor(9.2913, grad_fn=<MeanBackward0>) tensor(9.2913, grad_fn=<MeanBackward0>)
tensor(9.8640, grad_fn=<MeanBackward0>) tensor(9.8640, grad_fn=<MeanBackward0>)
tensor(1.0413, grad_fn=<MeanBackward0>) tensor(1.0413, grad_fn=<MeanBackward0>)
tensor(1.0928, grad_fn=<MeanBackward0>) tensor(1.0928, grad_fn=<MeanBackward0>)
tensor(1.1453, grad_fn=<MeanBackward0>) tensor(1.1453, grad_fn=<MeanBackward0>)
tensor(1.2143, grad_fn=<MeanBackward0>) tensor(1.2143, grad_fn=<MeanBackward0>)
tensor(1.2762, grad_fn=<MeanBackward0>) tensor(1.2762, grad_fn=<MeanBackward0>)
This shows that the two values are nonzero but equal to each other.
Is this possible? If so, what does it mean? Or am I doing something wrong in the code?
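For reference, here is a minimal sketch of one case where the two directions coincide. It uses the closed-form KL divergence between two univariate Gaussians and assumes that both distributions share the same standard deviation (for example, if the actor's std does not depend on the observation). That is an assumption and is not confirmed by the code above. The helper `gaussian_kl` and all numeric values are hypothetical and only for illustration.

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P||Q) for univariate Gaussians P=N(mu_p, sigma_p^2), Q=N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )

# Hypothetical means: the perturbed observation shifts the mean slightly.
mu_p, mu_q = 0.0, 0.3

# Case 1 (assumed): both distributions have the same std.
# The log term vanishes and KL reduces to (mu_p - mu_q)^2 / (2 sigma^2),
# which is symmetric in P and Q.
kl_pq_shared = gaussian_kl(mu_p, 0.5, mu_q, 0.5)
kl_qp_shared = gaussian_kl(mu_q, 0.5, mu_p, 0.5)

# Case 2: the stds differ, so the two directions are no longer equal.
kl_pq_diff = gaussian_kl(mu_p, 0.5, mu_q, 1.0)
kl_qp_diff = gaussian_kl(mu_q, 1.0, mu_p, 0.5)

print(math.isclose(kl_pq_shared, kl_qp_shared))  # same std: directions agree
print(math.isclose(kl_pq_diff, kl_qp_diff))      # different std: directions differ
```

If this is what is happening, comparing `P.base_dist.scale` with `Q.base_dist.scale` inside `calculate_spatial_loss` should show whether the two stds are identical.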