R_max algorithm doesn’t converge to the right policy
I have a task where I need to implement an R_max algorithm with modified policy iteration over the frozen lake problem.
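For reference, here is a minimal sketch of R_max combined with modified policy iteration. It is not the assignment's required implementation, and it assumes a few things. It uses a hypothetical deterministic (non-slippery) 4x4 FrozenLake-style map written inline instead of gym's environment. The helper names (`step`, `build_optimistic_model`, `run_rmax`, `greedy_rollout`) and the parameters (knownness threshold `m`, discount `gamma`, evaluation sweeps `k`) are illustrative choices.

A state-action pair visited fewer than `m` times is treated as "unknown" and sends the agent to a fictitious absorbing state that pays `r_max` forever. The planner then alternates a greedy improvement step with `k` truncated evaluation sweeps. The sketch also marks states that ended an episode as terminal with value 0. The agent never acts from those states, so otherwise they would stay "unknown" and look optimistically attractive.

```python
import numpy as np

# Hypothetical 4x4 FrozenLake-style map (S=start, F=frozen, H=hole, G=goal).
# Assumption: the lake is NOT slippery, so every move is deterministic.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = 4, 4
N_STATES, N_ACTIONS = N_ROWS * N_COLS, 4  # actions: 0=left, 1=down, 2=right, 3=up
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(s, a):
    """Deterministic transition: returns (next_state, reward, done)."""
    r, c = divmod(s, N_COLS)
    dr, dc = MOVES[a]
    r = min(max(r + dr, 0), N_ROWS - 1)  # bumping into a wall leaves you in place
    c = min(max(c + dc, 0), N_COLS - 1)
    tile = MAP[r][c]
    return r * N_COLS + c, float(tile == "G"), tile in "HG"


def modified_policy_iteration(P, R, gamma, k=20, tol=1e-8, max_iter=1000):
    """Greedy improvement followed by k sweeps of the policy's Bellman operator."""
    idx = np.arange(R.shape[0])
    V = np.zeros(R.shape[0])  # V0 = 0 is a safe start because all rewards are >= 0
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)  # shape (n_states, n_actions)
        pi = Q.argmax(axis=1)
        V_greedy = Q.max(axis=1)
        if np.max(np.abs(V_greedy - V)) < tol:  # small Bellman residual: stop
            return pi, V_greedy
        V = V_greedy
        for _ in range(k):  # partial (truncated) evaluation of pi
            V = R[idx, pi] + gamma * (P[idx, pi] @ V)
    return pi, V


def build_optimistic_model(counts, trans, rew_sum, terminal, m, r_max):
    """R_max model: pairs seen fewer than m times jump to a fictitious max-reward state."""
    fict = N_STATES  # extra absorbing state that pays r_max forever
    P = np.zeros((N_STATES + 1, N_ACTIONS, N_STATES + 1))
    R = np.zeros((N_STATES + 1, N_ACTIONS))
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            if terminal[s]:
                P[s, a, s] = 1.0  # terminal: absorbing with zero reward, so V = 0
            elif counts[s, a] >= m:  # known pair: empirical model
                P[s, a, :N_STATES] = trans[s, a] / counts[s, a]
                R[s, a] = rew_sum[s, a] / counts[s, a]
            else:  # unknown pair: optimistic jump to the fictitious state
                P[s, a, fict] = 1.0
                R[s, a] = r_max
    P[fict, :, fict] = 1.0
    R[fict, :] = r_max
    return P, R


def run_rmax(episodes=100, m=1, gamma=0.95, r_max=1.0, max_steps=100):
    counts = np.zeros((N_STATES, N_ACTIONS))
    trans = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    rew_sum = np.zeros((N_STATES, N_ACTIONS))
    terminal = np.zeros(N_STATES, dtype=bool)  # learned from `done`, never assumed

    def plan():
        P, R = build_optimistic_model(counts, trans, rew_sum, terminal, m, r_max)
        return modified_policy_iteration(P, R, gamma)[0]

    pi = plan()
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = pi[s]
            s2, r, done = step(s, a)
            counts[s, a] += 1
            trans[s, a, s2] += 1
            rew_sum[s, a] += r
            new_terminal = done and not terminal[s2]
            if done:
                terminal[s2] = True
            if counts[s, a] == m or new_terminal:  # model changed: replan
                pi = plan()
            s = s2
            if done:
                break
    return pi


def greedy_rollout(pi, max_steps=20):
    """Follow pi from the start state; returns (total_reward, steps_taken)."""
    s, total = 0, 0.0
    for t in range(1, max_steps + 1):
        s, r, done = step(s, pi[s])
        total += r
        if done:
            return total, t
    return total, max_steps


if __name__ == "__main__":
    print(greedy_rollout(run_rmax()))  # expected: (1.0, 6) on this map
```

With `m=1`, one visit fully determines a pair's outcome on this deterministic map. On the real slippery lake, `m` would have to be larger so that the empirical transition probabilities are meaningful.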