I am trying to compare the per-layer outputs of two ML models trained with different libraries (TensorFlow and a custom, lighter one) that reach the same validation performance, as a kind of unit test. I see much larger layer-wise differences when the models are trained in float16 than in float32, even though their predictive performance is very similar.
When the models are trained with float32 weights, I compare the layer outputs with np.isclose at atol=1e-08 and get only minimal errors. When they are trained with float16 weights, the errors are much larger, even after increasing atol to 1e-04, although the two models still have the same performance.
Is simply raising atol to 1e-04 the right approach here, given that the np.isclose comparison of the layer outputs reports far more mismatches in float16 than in float32, even though the two models predict equally well?
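For context, here is a minimal sketch of what I mean (the arrays are stand-ins for real layer activations, not my actual models): machine epsilon for float16 is roughly four orders of magnitude larger than for float32, so I wonder whether tolerances should be scaled to the dtype's precision rather than fixed.

```python
import numpy as np

# Machine epsilon differs by ~4 orders of magnitude between the two dtypes,
# so a fixed atol=1e-08 cannot be expected to hold for float16 outputs.
eps16 = np.finfo(np.float16).eps  # ~9.77e-04
eps32 = np.finfo(np.float32).eps  # ~1.19e-07

# Hypothetical layer outputs (stand-ins for real activations).
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000).astype(np.float32)

out16 = ref.astype(np.float16).astype(np.float32)  # simulate float16 rounding
out32 = ref.copy()                                 # float32 path matches here

# Scale the tolerances to the dtype's precision instead of using one fixed atol.
assert np.allclose(out32, ref, rtol=10 * eps32, atol=10 * eps32)
assert np.allclose(out16, ref, rtol=10 * eps16, atol=10 * eps16)
```

With a dtype-scaled tolerance like this, the float16 comparison passes for pure rounding error, which makes me think the larger mismatches I see are expected precision loss rather than a real implementation difference.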