I am trying to set up a Bayesian Neural Network built around a probabilistic layer, tfp.layers.DenseVariational.
I set out to test various activation functions; based on my data, tanh or relu should work best.
However, I have noticed that most Bayesian Neural Nets use sigmoid as the activation function. Does anyone know why?
Moreover, my Bayesian network is not able to train at all with relu activation.
Is there a theoretical reason for this that I am overlooking?
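For reference, here is a minimal sketch of the kind of model I am training. The posterior and prior helpers follow the mean-field pattern from the TFP regression examples; the layer width, num_train, and the compile settings are placeholders rather than my exact values:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    # Mean-field Gaussian posterior over the layer's kernel and bias.
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.0))  # so softplus(c) == 1, i.e. initial scale ~ 1
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

def prior_trainable(kernel_size, bias_size=0, dtype=None):
    # Gaussian prior with trainable mean and fixed unit scale.
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1.0),
            reinterpreted_batch_ndims=1)),
    ])

num_train = 1000  # placeholder for my training-set size

model = tf.keras.Sequential([
    tfp.layers.DenseVariational(
        units=16,
        make_posterior_fn=posterior_mean_field,
        make_prior_fn=prior_trainable,
        kl_weight=1.0 / num_train,
        activation='relu',  # swapping this in is where training breaks down
    ),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
```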
Any help is appreciated!