Hi everyone.
As described above, AlphaZero is a general algorithm that learns to play games such as Go through self-play. But when we design a game to be trained with the AlphaZero framework whose action space is enormous, as in AlphaTensor with roughly 5^12 actions, we have to use Sampled AlphaZero.
My questions are:
1. When we use Sampled AlphaZero, does the policy output of the network contain probabilities for all actions? If so, isn't that a lot?
2. How do we sample actions when the action space is so large that we can't enumerate it? A rough sketch of what I have in mind is below.
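Here is a minimal sketch of the idea as I understand it, assuming a factorized policy head (AlphaTensor's paper describes an autoregressive head, so treating the 12 action components as independent is a simplification). Instead of a softmax over all 5^12 joint actions, the network outputs one small categorical distribution per component, and the search only considers K sampled candidates. All names, shapes, and sizes here are my own assumptions for illustration:

```python
import torch

NUM_SLOTS = 12   # assumed: an action is a tuple of 12 components
NUM_VALUES = 5   # assumed: each component takes one of 5 values, e.g. {-2,-1,0,1,2}

class FactorizedPolicyHead(torch.nn.Module):
    """Hypothetical policy head: 12 * 5 = 60 logits instead of 5**12 ≈ 2.4e8."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        # One logit per (slot, value) pair.
        self.linear = torch.nn.Linear(hidden_dim, NUM_SLOTS * NUM_VALUES)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> logits: (batch, NUM_SLOTS, NUM_VALUES)
        return self.linear(h).view(-1, NUM_SLOTS, NUM_VALUES)

def sample_actions(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Draw k candidate actions without ever enumerating the full space."""
    # Categorical over the last dim: batch shape (batch, NUM_SLOTS).
    dist = torch.distributions.Categorical(logits=logits)
    # Each sample picks a value index per slot: result is (k, batch, NUM_SLOTS).
    return dist.sample((k,))

head = FactorizedPolicyHead()
h = torch.randn(1, 256)                    # dummy latent state from the network trunk
logits = head(h)                           # (1, 12, 5)
candidates = sample_actions(logits, k=32)  # (32, 1, 12); subtract 2 to map to {-2..2}
```

If this sketch is on the right track, MCTS would then expand only these K sampled actions at each node, and the policy target would be corrected for the sampling (as in the Sampled MuZero paper). Is that roughly how it works?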
Anyone interested in AlphaZero is welcome to join the discussion, even if we don't solve the problem.