I recently read a statistics paper that considers the unconstrained problem:
$$\min_\theta F(\theta) + \lambda \|\theta\|_1,$$ where $$F(\theta) = L(\theta) + \frac{\rho}{2} |h(W(\theta))|^2 + \alpha\, h(W(\theta)),$$
$\rho$ is a penalty parameter, and $\alpha$ is a dual variable.
The paper uses the L-BFGS-B algorithm, saying it can be applied directly by casting the above problem into the box-constrained form:
$$\min_\theta F(\theta) + \lambda \|\theta\|_1 \iff \min_{\theta^+ \geq 0,\, \theta^- \geq 0} F(\theta^+ - \theta^-) + \lambda\, \mathbf{1}^T (\theta^+ + \theta^-),$$ where $\mathbf{1}$ is a vector of all ones.
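For concreteness, here is a minimal sketch (my own, not from the paper) of how the right-hand problem can be handed to R's `optim()` with `method = "L-BFGS-B"`, using a toy quadratic in place of $F$:

```r
# Toy stand-in for the paper's F(theta): a smooth quadratic (illustration only).
F_toy  <- function(theta) sum((theta - c(1, -2))^2)
lambda <- 0.5

# Objective in the stacked variable w = (theta^+, theta^-).
g <- function(w) {
  n     <- length(w) / 2
  theta <- w[1:n] - w[(n + 1):(2 * n)]      # theta = theta^+ - theta^-
  F_toy(theta) + lambda * sum(w)            # lambda * 1^T (theta^+ + theta^-)
}

# L-BFGS-B with the box constraint theta^+ >= 0, theta^- >= 0.
fit       <- optim(par = rep(0.1, 4), fn = g, method = "L-BFGS-B", lower = 0)
theta_hat <- fit$par[1:2] - fit$par[3:4]    # recover theta
```

The box constraint $\theta^+, \theta^- \geq 0$ is expressed through `lower = 0`, and the non-smooth penalty $\lambda \|\theta\|_1$ has become the smooth linear term `lambda * sum(w)`.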
My question is: why are these two problems equivalent?
My first observation is that any real number $x$ can be written as the difference of two non-negative numbers, $x = x^+ - x^-$, where $x^+ = \max(x, 0)$ and $x^- = \max(-x, 0)$, and then $|x| = x^+ + x^-$. This looks like the core of a proof of the equivalence.
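A quick numeric check of this splitting in R (my own illustration):

```r
x     <- c(-2, 0, 3.5)
x.pos <- pmax(x, 0)                 # x^+ = max(x, 0)
x.neg <- pmax(-x, 0)                # x^- = max(-x, 0)
all.equal(x, x.pos - x.neg)         # TRUE:  x  = x^+ - x^-
all.equal(abs(x), x.pos + x.neg)    # TRUE: |x| = x^+ + x^-
```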
The code from that paper is:
```r
adj <- function(w) {
  # Rebuild the d x d matrix W = theta^+ - theta^- from the stacked
  # parameter vector w = (theta^+, theta^-); note that d is not an
  # argument, so it must exist in the enclosing environment.
  w <- as.matrix(w)
  w.pos <- w[1:(length(w) / 2), ]                  # first half:  theta^+
  w.neg <- w[((length(w) / 2) + 1):length(w), ]    # second half: theta^-
  dim(w.pos) <- c(d, d)
  dim(w.neg) <- c(d, d)
  W <- w.pos - w.neg
  return(W)
}
```
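For example, here is a hypothetical usage sketch (mine, not from the paper) with $d = 3$; since `adj()` does not take `d` as an argument, it is read from the calling environment:

```r
d      <- 3
W.true <- matrix(rnorm(d * d), d, d)
w      <- c(pmax(W.true, 0), pmax(-W.true, 0))  # stack (theta^+, theta^-)
all.equal(adj(w), W.true)                       # TRUE: adj() recovers W
```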
But I notice that in the proof, one of $x^+$ and $x^-$ must be $0$ (by construction, $x^+ x^- = 0$), whereas in the code above there is no such constraint on $\theta^+$ and $\theta^-$. This is what confuses me: is the condition that one of $x^+$ and $x^-$ is zero a constraint that we need to enforce explicitly in the optimization problem?