I have created a model that I adapted from this video:
The environment contains a controllable ball (the player) with tank-like controls, a second ball that can be pushed, and a square box that is the goal.
[Image: game environment]
The model gains a reward for touching the ball with the player, and a further reward based on how close the ball is to the goal at the end of the iteration.
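To make the reward scheme concrete, here is a minimal sketch of it. This is an illustration, not the code from the repo: the function names, the bonus and scale values, and the idea of normalising by a maximum distance `max_dist` are all assumptions.

```python
import math

def touch_reward(touched_ball, touch_bonus=1.0):
    """Per-step reward: a fixed bonus when the player touches the ball.

    touch_bonus is a placeholder value, not taken from the repo.
    """
    return touch_bonus if touched_ball else 0.0

def end_of_iteration_reward(ball_pos, goal_pos, max_dist, scale=10.0):
    """Final reward: larger the closer the ball ends up to the goal.

    ball_pos and goal_pos are (x, y) tuples. max_dist is an assumed
    normalising constant (for example the arena diagonal), so the
    result always stays within [0, scale].
    """
    dist = math.dist(ball_pos, goal_pos)
    closeness = 1.0 - min(dist / max_dist, 1.0)
    return scale * closeness
```

With this shaping the final reward is bounded, so it is easy to compare against the touch bonus when adjusting the balance between the two.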
The model's loss seems to be very erratic and can get absurdly high.
[Image: loss graph showing average and current loss]
Additionally, the model doesn't seem to be learning anything.
I've tried discretizing the data,
I've tried adjusting the learning rate,
and I've adjusted the rewards (although I may not have adjusted them enough).
Still, it doesn't seem to learn anything.
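For anyone wondering what discretizing the data could look like here, this is a rough sketch of one way to bin a continuous observation. It is not the exact approach in the repo: the choice of features (angle and distance from the player to the ball), the bin counts, and all function names are assumptions made for illustration.

```python
import math

def discretize(value, low, high, n_bins):
    """Map a continuous value in [low, high] to an integer bin in [0, n_bins - 1]."""
    clipped = max(low, min(value, high))
    fraction = (clipped - low) / (high - low)
    # The upper edge (fraction == 1.0) is folded into the last bin.
    return min(int(fraction * n_bins), n_bins - 1)

def encode_state(player_pos, player_heading, ball_pos, max_dist,
                 n_angle_bins=8, n_dist_bins=4):
    """Return (angle_bin, distance_bin) for the ball relative to the player.

    player_heading is in radians. The angle is measured relative to the
    heading, which suits tank-like controls (turn, then drive forward).
    """
    dx = ball_pos[0] - player_pos[0]
    dy = ball_pos[1] - player_pos[1]
    angle = math.atan2(dy, dx) - player_heading
    # Wrap the relative angle into [-pi, pi).
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    dist = math.hypot(dx, dy)
    return (discretize(angle, -math.pi, math.pi, n_angle_bins),
            discretize(dist, 0.0, max_dist, n_dist_bins))
```

The same helper could be applied to the ball-to-goal offset to get a second pair of bins.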
I'm not sure whether there's a deep flaw in my model, or whether the model just isn't well suited to this space: it was originally built for Snake in a grid-like space, and this game is not built on a grid.
Link to codebase:
https://github.com/jamiemitch121/RLNN-2D