Number of agent actions recorded does not match number of rewards recorded
I am trying to batch-train an AI to play Connect Four, but for some reason I end up with more reward records than action records. It does not happen in every episode, and when it does, there is always exactly one fewer action than rewards. There should be one action for each reward recorded. I believe the rewards are correct, but some actions are getting skipped. The number of recorded states is also sometimes odd, but it should always be even, since each action has a state before it and a state after it. I trimmed down the code as best I can.
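To make the expected bookkeeping concrete, here is a minimal sketch. It is not the code from this post, and `EpisodeRecorder` and its methods are hypothetical names. It appends the state before, the action, the reward, and the state after in a single call, so every reward has exactly one action and the state count is always even. `check()` raises on the first episode where that invariant breaks.

```python
class EpisodeRecorder:
    """Hypothetical helper: stores one transition per agent move."""

    def __init__(self):
        self.states = []
        self.actions = []
        self.rewards = []

    def record(self, state_before, action, reward, state_after):
        # Append everything for one move together so the lists cannot drift apart.
        self.states.append(state_before)
        self.states.append(state_after)
        self.actions.append(action)
        self.rewards.append(reward)

    def check(self):
        # Invariant from the question: one action per reward,
        # and two states (before and after) per action.
        if len(self.actions) != len(self.rewards):
            raise ValueError(
                f"{len(self.actions)} actions vs {len(self.rewards)} rewards"
            )
        if len(self.states) != 2 * len(self.actions):
            raise ValueError(
                f"{len(self.states)} states vs {len(self.actions)} actions"
            )


rec = EpisodeRecorder()
rec.record(state_before=(0,), action=3, reward=0.0, state_after=(1,))
rec.record(state_before=(1,), action=4, reward=1.0, state_after=(2,))
rec.check()  # passes: 2 actions, 2 rewards, 4 states
```

If any reward is appended somewhere other than `record()`, for example a separate end-of-game reward, the lists can fall out of step by exactly one entry. Calling `check()` at the end of each episode shows which episode that happens in.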