If Action is Multidimension, how can we update policy by gradient method in reinforcement learning?
If Action is multi-dimension, how can I implement policy updating ?
$J$ in $nabla_theta J(theta)$ is muti-dimensional ?
If Action is multi-dimension, how can I implement policy updating ?
$J$ in $nabla_theta J(theta)$ is muti-dimensional ?