I'm trying to train an agent on my Bitcoin trading environment. I tried the FinRL library, but it doesn't have a good environment for crypto.
So I wrote a two-sided market environment for trading Bitcoin that can be trained with stable_baselines3 algorithms like A2C and PPO. The action is a single number between -1 and 1: below zero is sell, zero is hold, and above zero is buy.
The problem is that my environment has some logic: if a buy position is open you can't open a sell position, and if more than 5 buy or sell positions are already open you can't open any more. This logic seems to make the agent learn that it's better to just hold.
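Roughly, the action handling looks like this (a very simplified sketch, not my real code; names like `BtcEnvSketch` and `MAX_OPEN` are just placeholders for this question):

```python
import gymnasium as gym
import numpy as np


class BtcEnvSketch(gym.Env):
    """Very simplified version of my environment, just to show the action logic."""

    MAX_OPEN = 5  # at most 5 positions of one side can be open

    def __init__(self, prices):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        # continuous action in [-1, 1]: < 0 sell, == 0 hold, > 0 buy
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32)
        self.positions = []  # list of (side, entry_price)
        self.t = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.positions, self.t = [], 0
        return self._obs(), {}

    def step(self, action):
        act = float(action[0])
        side = "hold" if act == 0 else ("buy" if act > 0 else "sell")
        if side != "hold":
            opposite_open = any(p_side != side for p_side, _ in self.positions)
            # my rules: no opposite-side position, and at most MAX_OPEN open positions
            if not opposite_open and len(self.positions) < self.MAX_OPEN:
                self.positions.append((side, self.prices[self.t]))
            # otherwise the action is simply ignored
        reward = 0.0  # in the real env the reward is the profit when a position closes
        self.t += 1
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self.prices[self.t]], dtype=np.float32)
```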
The training process looks fine, I mean I can see the agent taking different actions, but in testing it's just 0.
I checked the policy that SB3 uses for PPO and A2C and it's MlpPolicy (ActorCriticPolicy), and I want to know if I need to change the policy or not. Not gonna lie, I looked at the policy code and didn't understand a thing 🙂.
Sorry for my bad English.
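For reference, this is roughly how I create and test the model (MlpPolicy is just the default; the prices and hyperparameters here are dummy values for the sketch):

```python
import numpy as np
from stable_baselines3 import PPO

prices = np.cumsum(np.random.randn(1000)).astype(np.float32) + 30_000  # dummy data
env = BtcEnvSketch(prices)  # the simplified env above, standing in for my real one

model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=0)
model.learn(total_timesteps=10_000)

# during testing the predicted action always comes out as ~0, i.e. hold
obs, _ = env.reset()
action, _states = model.predict(obs, deterministic=True)
print(action)
```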
I tried different algorithms: A2C, PPO, DDPG, SAC, and TD3.
I tried adjusting the reward function. The reward is based on the profit we make when each position closes (see the sketch after this list).
I tried different kwargs for the agents.
I tried different timeframe datasets (1 min, 5 min, 1 h, …).
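The reward is basically the realized profit at the end of each position, something like this (simplified; `entry_price`, `exit_price` and `size` are just names for this sketch):

```python
def position_reward(side, entry_price, exit_price, size=1.0):
    """Simplified version of my reward: realized profit when a position closes."""
    if side == "buy":
        return (exit_price - entry_price) * size
    else:  # "sell"
        return (entry_price - exit_price) * size
```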