I am trying to develop a deep Q-learning model that plays a game like Flappy Bird. Initially, the agent takes actions at random, and then it learns how to play from the rewards and penalties it receives. For Flappy Bird, there are only two actions: jump or do not jump. With uniformly random selection, each action has a 50% probability. But jumping that frequently (like tapping the phone screen every second or third frame) always makes the bird go too high. It almost never gets through the gap, so it never experiences a positive reward.
Can someone explain how the agent should start the game? What strategy should it use to choose its first few actions?
I tried choosing the initial actions at random, but then another jump command is issued before the bird even starts falling, which ultimately makes it collide with the ceiling or a pipe.
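To make the problem concrete, here is a minimal sketch of the kind of random action selection I mean. It assumes epsilon-greedy exploration with uniform random actions; the names `choose_action`, `NO_JUMP`, `JUMP`, and the `q_values` list are hypothetical placeholders for illustration, not code from a specific library.

```python
import random

# Hypothetical action encoding, for illustration only.
NO_JUMP, JUMP = 0, 1


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over two actions (illustrative sketch).

    q_values: estimated Q-values, indexed by action.
    epsilon:  probability of taking a uniformly random action.
    rng:      the random module or a random.Random instance.
    """
    if rng.random() < epsilon:
        # Uniform exploration: jump and no-jump are equally likely,
        # so the bird jumps on roughly half of all frames.
        return rng.choice([NO_JUMP, JUMP])
    # Exploitation: pick the action with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    # At the start of training epsilon is 1.0, so every action is random.
    actions = [choose_action([0.0, 0.0], 1.0, rng) for _ in range(1000)]
    print("fraction of frames with a jump:", sum(actions) / len(actions))
```

With `epsilon` at 1.0, about half of the frames end up being jumps, which is exactly the behaviour that sends the bird into the ceiling.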