If the action is multi-dimensional, how can I implement the policy update?
Is $J$ in $\nabla_\theta J(\theta)$ multi-dimensional?
I would also like to see concrete Python code for a neural-network policy.
I am very interested in portfolio optimization via reinforcement learning.
I ran into this problem while dealing with a multi-dimensional action that corresponds to the portfolio weights.
I am learning the Monte Carlo policy gradient theorem (REINFORCE) shown below:
$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t^i \mid s_t^i) \, R(\tau^i)$$
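To make the question concrete, here is a minimal NumPy sketch of what I have in mind. It assumes a linear-Gaussian policy rather than a neural network (the same log-probability gradient applies per dimension), and all dimensions, the softmax mapping to weights, and the toy reward are my own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, sigma = 4, 3, 0.5
# One extra row acts as a bias term; theta has one column per action dimension.
theta = np.zeros((state_dim + 1, action_dim))

def features(raw_state):
    """Append a constant 1 so the linear policy has a bias term."""
    return np.append(raw_state, 1.0)

def sample_action(s, theta):
    """Sample a ~ N(theta^T s, sigma^2 I): each action component gets its own mean."""
    return s @ theta + sigma * rng.standard_normal(action_dim)

def grad_log_pi(s, a, theta):
    """Gaussian policy: grad_theta log pi(a|s) = outer(s, (a - mean) / sigma^2).
    The log-probability of a vector action is the SUM of per-dimension log-probs,
    so J stays scalar and its gradient simply has the same shape as theta."""
    return np.outer(s, (a - s @ theta) / sigma**2)

def portfolio_weights(a):
    """Softmax maps the unconstrained action to positive weights summing to 1."""
    z = np.exp(a - a.max())
    return z / z.sum()

# Monte Carlo REINFORCE update, matching the formula above:
# grad J ~= (1/N) sum_i [ sum_t grad log pi(a_t^i | s_t^i) ] * R(tau^i)
N, T, alpha = 16, 5, 0.01
for iteration in range(100):
    grad = np.zeros_like(theta)
    for _ in range(N):
        g_traj, R_tau = np.zeros_like(theta), 0.0
        for t in range(T):
            s = features(rng.standard_normal(state_dim))
            a = sample_action(s, theta)
            # toy reward: favour weight on the first asset (illustrative only)
            R_tau += portfolio_weights(a)[0]
            g_traj += grad_log_pi(s, a, theta)
        grad += g_traj * R_tau
    theta += alpha * grad / N  # ascend the estimated gradient of J
```

The point I am unsure about is whether summing the per-dimension log-probabilities (so the gradient keeps the shape of `theta`) is the right way to handle the vector action.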