I have a question about when I can use an intermediate variable from which the function value, Jacobian, and (projected) Hessian are all derived directly during an optimization. For example, I have a function f that depends on q, which in turn depends on x. A nice property of f is that its Jacobian g and projected Hessian hv both depend on q directly. Therefore, I can compute q from x once and obtain f, g and hv from it at the same time.
In python/scipy, I can make q a global variable. In each optimization iteration, the evaluation of f in myfunc automatically updates q based on the current x; the evaluations of g and hv in myjac and myhessp, instead of working from x, simply read the global q and return the corresponding derivatives.
Something like this:

import scipy.optimize

q = 0  # global holding the intermediate variable

def myfunc(x):
    global q
    q = foo(x)      # expensive: update q from the current x
    return bar0(q)  # cheap: function value from q

def myjac(x):
    # do nothing with x, just read the global q
    return bar1(q)

def myhessp(x, p):
    # do nothing with x, just read the global q
    return bar2(q, p)  # Hessian at q times the vector p

scipy.optimize.minimize(myfunc, initial_guess_x0, method="blablabla",
                        jac=myjac, hessp=myhessp)
In each iteration, if myfunc, myjac and myhessp are each evaluated only once, all at the same x, this strategy will definitely work. However, some optimizers evaluate these quantities at more than one x within a single iteration. When the optimizer tries to evaluate g or hv at some x_prime other than the current x used to evaluate f, the x_prime actually makes no difference: the q derived from the old x is read by mistake.
Of course, multiple evaluations of f are tolerable, since in myfunc, f is computed only after q has been obtained from the current x.
One example is the use of a finite-difference numerical Hessian. To evaluate it, the optimizer needs to call myjac at many different values of x. Since myjac only reads the unchanged q instead of x, all the resulting Jacobians are exactly the same.
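For completeness, the same idea can be made independent of the evaluation order by caching q keyed on the last x it was computed for, and recomputing it only when x actually changes. A minimal sketch, reusing the placeholder names foo, bar0, bar1 and bar2 from above (the get_q helper is just illustrative):

import numpy as np

_last_x = None
_last_q = None

def get_q(x):
    # Recompute the expensive intermediate q only when x has changed.
    global _last_x, _last_q
    if _last_x is None or not np.array_equal(x, _last_x):
        _last_x = np.array(x, copy=True)
        _last_q = foo(x)
    return _last_q

def myfunc(x):
    return bar0(get_q(x))

def myjac(x):
    return bar1(get_q(x))

def myhessp(x, p):
    return bar2(get_q(x), p)

This still evaluates foo only once per distinct x, but it never reads a stale q when the optimizer probes g or hv at an x_prime different from the last x used for f.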
I have tried letting myjac and myhessp compute g and hv from x instead of q, but in my case the evaluation of q is very time-consuming, whereas obtaining g and hv from q is quite fast.
Therefore, I want to stick to this strategy, and I wonder whether any optimizer works with it. Preferably the optimizer would be a second-order one, so that convergence is easier, and ideally based on a level-shifted algorithm with a trust radius, because I have a good preconditioner at hand.
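As far as I can tell, only a few of the scipy.optimize.minimize methods consume the hessp callback at all (Newton-CG, trust-ncg, trust-krylov and trust-constr), so a call would look roughly like the sketch below; whether any of them is actually safe with the global-q pattern is exactly what I am asking.

import scipy.optimize

result = scipy.optimize.minimize(
    myfunc,
    initial_guess_x0,
    method="trust-krylov",  # e.g. a trust-region method that accepts hessp
    jac=myjac,
    hessp=myhessp,
)
print(result.x, result.fun)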
Many thanks in advance!