I was wondering what the output_gradients argument of the gradient method of a tf.GradientTape object in TensorFlow does. According to https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient, this argument should contain “a list of gradients, one for each differentiable element of target.”
Its default value is None.
It isn’t very clear to me what this does exactly. I hope somebody knows; thanks in advance!
When I omit the argument, the function calculates the gradient of the target with respect to the sources; for example, for f(x) = z I get dz/dx. I figured that passing output_gradients = dL/dz (the gradient of some loss L with respect to z) would apply the chain rule and return the product dL/dx = dz/dx * dL/dz, but testing this out I get a different result. What is actually done with output_gradients? There is no real information on this in the documentation.
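To make my expectation concrete, here is a small self-contained toy example, separate from my actual code below. The expected outputs in the comments reflect my understanding of output_gradients as the upstream gradient in a vector-Jacobian product:

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
with tf.GradientTape(persistent=True) as tape:
    z = x ** 2  # elementwise square, so the Jacobian dz/dx is diag(2*x)

# Without output_gradients, TF effectively sums over the elements of z,
# i.e. it returns d(sum(z))/dx = 2*x.
dz_dx = tape.gradient(z, x)
print(dz_dx.numpy())  # [2. 4. 6.]

# With output_gradients=u I expect the vector-Jacobian product
# u @ (dz/dx), which for a diagonal Jacobian is just u * 2*x.
u = tf.constant([0.5, 1.0, 2.0])
vjp = tape.gradient(z, x, output_gradients=u)
print(vjp.numpy())  # [1. 4. 12.]

del tape  # release the resources held by the persistent tape

In this simple elementwise case the chain-rule product works out as I expect, which makes the result of the GPflow code below all the more confusing to me.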
Here’s some dummy code:
from typing import Optional, Tuple
import tensorflow as tf
import numpy as np
import gpflow
def natgrad_apply_gradients(
    q_mu_grad: tf.Tensor,
    q_sqrt_grad: tf.Tensor,
    q_mu: gpflow.Parameter,
    q_sqrt: gpflow.Parameter,
    xi_transform: Optional[gpflow.optimizers.natgrad.XiTransform] = None,
) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
    gamma = 1  # step size; unused in this stripped-down snippet
    xi_transform = gpflow.optimizers.natgrad.XiNat()  # overrides the argument for this test
    # Map the incoming gradients to the constrained space.
    dL_dmean = gpflow.base._to_constrained(q_mu_grad, q_mu.transform)
    dL_dvarsqrt = gpflow.base._to_constrained(q_sqrt_grad, q_sqrt.transform)
    with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:
        tape.watch([q_mu.unconstrained_variable, q_sqrt.unconstrained_variable])
        # Round-trip through the expectation parameters so eta1/eta2 are
        # intermediate tensors recorded on the tape.
        eta1, eta2 = gpflow.optimizers.natgrad.meanvarsqrt_to_expectation(q_mu, q_sqrt)
        meanvarsqrt = gpflow.optimizers.natgrad.expectation_to_meanvarsqrt(eta1, eta2)
    # With output_gradients: what I expected to be the chain-rule product.
    dL_deta1, dL_deta2 = tape.gradient(
        meanvarsqrt, [eta1, eta2], output_gradients=[dL_dmean, dL_dvarsqrt]
    )
    # Without output_gradients: the plain gradients dtheta/deta.
    dtheta_deta1, dtheta_deta2 = tape.gradient(
        meanvarsqrt, [eta1, eta2], output_gradients=None
    )
    return dL_deta1, dL_deta2, dtheta_deta1, dtheta_deta2
X_data = tf.ones(5)
num_latent_gps = 1
static_num_data = X_data.shape[0]
q_sqrt_unconstrained_shape = (num_latent_gps, gpflow.utilities.triangular_size(static_num_data))
num_data = gpflow.Parameter(tf.shape(X_data)[0], shape=[], dtype=tf.int32, trainable=False)
dynamic_num_data = tf.convert_to_tensor(num_data)
mu = np.array([[0.93350756], [0.15833747], [0.23830378], [0.28742445], [0.14999759]])
q_mu = gpflow.Parameter(mu, shape=(static_num_data, num_latent_gps))
q_sqrt = tf.eye(dynamic_num_data, batch_shape=[num_latent_gps])
q_sqrt = gpflow.Parameter(
    q_sqrt,
    transform=gpflow.utilities.triangular(),
    unconstrained_shape=q_sqrt_unconstrained_shape,
    constrained_shape=(num_latent_gps, static_num_data, static_num_data),
)
# Dummy stand-ins for the gradients of some loss w.r.t. the unconstrained parameters.
q_mu_grad = q_mu.unconstrained_variable * 0.33
q_sqrt_grad = q_sqrt.unconstrained_variable
dL_deta1, dL_deta2, dtheta_deta1, dtheta_deta2 = natgrad_apply_gradients(q_mu_grad, q_sqrt_grad, q_mu, q_sqrt)
# What I observe: dL_deta1 != dtheta_deta1 * q_mu_grad, although the chain
# rule made me expect exactly that product. What is output_gradients doing?