Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
L. Vandenberghe
EE236C (Spring 2016)
7. Conjugate functions
• closed functions
• conjugate function
• duality
7-1
Closed set
a set C is closed if it contains its boundary:
xk ∈ C,
xk → x̄
=⇒
x̄ ∈ C
Operations that preserve closedness
• the intersection of (finitely or infinitely many) closed sets is closed
• the union of a finite number of closed sets is closed
• inverse under linear mapping: {x | Ax ∈ C} is closed if C is closed
Conjugate functions
7-2
Image under linear mapping
the image of a closed set under a linear mapping is not necessarily closed
Example
C = {(x1, x2) ∈
R2+
| x1x2 ≥ 1},
A=
1 0
,
AC = R++
Sufficient condition: AC is closed if
• C is closed and convex
• and C does not have a recession direction in the nullspace of A, i.e.,
Ay = 0,
x̂ ∈ C,
x̂ + αy ∈ C for all α ≥ 0
=⇒
y=0
in particular, this holds for any matrix A if C is bounded
Conjugate functions
7-3
Closed function
Definition: a function is closed if its epigraph is a closed set
Examples
• f (x) = − log(1 − x2) with dom f = {x | |x| < 1}
• f (x) = x log x with dom f = R+ and f (0) = 0
• indicator function of a closed set C :
δC (x) =
0
x∈C
+∞ otherwise
Not closed
• f (x) = x log x with dom f = R++, or with dom f = R+ and f (0) = 1
• indicator function of a set C if C is not closed
Conjugate functions
7-4
Properties
Sublevel sets: f is closed if and only if all its sublevel sets are closed
Minimum: if f is closed with bounded sublevel sets then it has a minimizer
Common operations on convex functions that preserve closedness
• sum: f = f1 + f2 is closed if f1 and f2 are closed
• composition with affine mapping: f = g(Ax + b) is closed if g is closed
• supremum: f (x) = supα fα(x) is closed if each function fα is closed
in each case, we assume dom f 6= ∅
Conjugate functions
7-5
Outline
• closed functions
• conjugate function
• duality
Conjugate function
the conjugate of a function f is
f ∗(y) =
sup (y T x − f (x))
x∈dom f
f ∗ is closed and convex (even when f is not)
Fenchel’s inequality: the definition implies that
f (x) + f ∗(y) ≥ xT y for all x, y
this is an extension to non-quadratic convex f of the inequality
1 T
1 T
x x + y y ≥ xT y
2
2
Conjugate functions
7-6
Quadratic function
1 T
f (x) = x Ax + bT x + c
2
Strictly convex case (A 0)
1
f ∗(y) = (y − b)T A−1(y − b) − c
2
General convex case (A 0)
1
f (y) = (y − b)T A†(y − b) − c,
2
∗
Conjugate functions
dom f ∗ = range(A) + b
7-7
Negative entropy and negative logarithm
Negative entropy
f (x) =
n
X
f ∗(y) =
xi log xi
i=1
n
X
eyi−1
i=1
Negative logarithm
f (x) = −
n
X
log xi
f ∗(y) = −
i=1
n
X
log(−yi) − n
i=1
Matrix logarithm
f (X) = − log det X (dom f = Sn++)
Conjugate functions
f ∗(Y ) = − log det(−Y ) − n
7-8
Indicator function and norm
Indicator of convex set C : conjugate is the support function of C
δC (x) =
0
x∈C
+∞ x ∈
6 C
∗
δC
(y) = sup y T x
x∈C
Indicator of convex cone C : conjugate is indicator of polar (negative dual) cone
∗
δC
(y)
= δ−C ∗ (y) = δC ∗ (−y) =
0
y T x ≤ 0 ∀x ∈ C
+∞ otherwise
Norm: conjugate is indicator of unit ball for dual norm
f (x) = kxk
∗
f (y) =
0
kyk∗ ≤ 1
+∞ kyk∗ > 1
(see next page)
Conjugate functions
7-9
Proof. recall the definition of dual norm:
kyk∗ = sup xT y
kxk≤1
to evaluate f ∗(y) = supx (y T x − kxk) we distinguish two cases
• if kyk∗ ≤ 1, then (by definition of dual norm)
y T x ≤ kxk for all x
and equality holds if x = 0; therefore supx (y T x − kxk) = 0
• if kyk∗ > 1, there exists an x with kxk ≤ 1, xT y > 1; then
f ∗(y) ≥ y T (tx) − ktxk = t(y T x − kxk)
and right-hand side goes to infinity if t → ∞
Conjugate functions
7-10
Calculus rules
Separable sum
f (x1, x2) = g(x1) + h(x2)
f ∗(y1, y2) = g ∗(y1) + h∗(y2)
Scalar multiplication (α > 0)
f (x) = αg(x)
f ∗(y) = αg ∗(y/α)
f (x) = αg(x/α)
f ∗(y) = αg ∗(y)
• the operation f (x) = αg(x/α) is sometimes called ‘right scalar multiplication’
• a convenient notation is f = gα for the function (gα)(x) = αg(x/α)
• conjugates can be written concisely as (gα)∗ = αg ∗ and (αg)∗ = g ∗α
Conjugate functions
7-11
Calculus rules
Addition to affine function
f (x) = g(x) + aT x + b
f ∗(y) = g ∗(y − a) − b
Translation of argument
f (x) = g(x − b)
f ∗(y) = bT y + g ∗(y)
Composition with invertible linear mapping (A square and nonsingular)
f (x) = g(Ax)
f ∗(y) = g ∗(A−T y)
Infimal convolution
f (x) = inf (g(u) + h(v))
u+v=x
Conjugate functions
f ∗(y) = g ∗(y) + h∗(y)
7-12
The second conjugate
f ∗∗(x) =
sup
(xT y − f ∗(y))
y∈dom f ∗
• f ∗∗ is closed and convex
• from Fenchel’s inequality, xT y − f ∗(y) ≤ f (x) for all y and x; therefore
f ∗∗(x) ≤ f (x) for all x
equivalently, epi f ⊆ epi f ∗∗ (for any f )
• if f is closed and convex, then
f ∗∗(x) = f (x) for all x
equivalently, epi f = epi f ∗∗ (if f is closed convex); proof on next page
Conjugate functions
7-13
Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ 6= epi f
suppose (x, f ∗∗(x)) 6∈ epi f ; then there is a strict separating hyperplane:
a
b
T z−x
s − f ∗∗(x)
≤c<0
∀(z, s) ∈ epi f
for some a, b, c with b ≤ 0 (b > 0 gives a contradiction as s → ∞)
• if b < 0, define y = a/(−b) and maximize left-hand side over (z, s) ∈ epi f :
f ∗(y) − y T x + f ∗∗(x) ≤ c/(−b) < 0
this contradicts Fenchel’s inequality
• if b = 0, choose ŷ ∈ dom f ∗ and add small multiple of (ŷ, −1) to (a, b):
a + ŷ
−
T z−x
s − f ∗∗(x)
∗
T
∗∗
≤ c + f (ŷ) − x ŷ + f (x) < 0
now apply the argument for b < 0
Conjugate functions
7-14
Conjugates and subgradients
if f is closed and convex, then
y ∈ ∂f (x)
⇐⇒
x ∈ ∂f ∗(y)
⇐⇒
xT y = f (x) + f ∗(y)
Proof. if y ∈ ∂f (x), then f ∗(y) = supu (y T u − f (u)) = y T x − f (x); hence
f ∗(v) = sup (v T u − f (u))
u
≥ v T x − f (x)
= xT (v − y) − f (x) + y T x
= f ∗(y) + xT (v − y)
this holds for all v ; therefore, x ∈ ∂f ∗(y)
reverse implication x ∈ ∂f ∗(y) =⇒ y ∈ ∂f (x) follows from f ∗∗ = f
Conjugate functions
7-15
Conjugate of strongly convex function
assume f is closed and strongly convex with parameter µ > 0
• f ∗ is defined for all y (i.e., dom f ∗ = Rn)
• f ∗ is differentiable everywhere, with gradient
∇f ∗(y) = argmax (y T x − f (x))
x
• ∇f ∗ is Lipschitz continuous with constant 1/µ
1
k∇f (y) − ∇f (y )k2 ≤ ky − y 0k2 for all y and y 0
µ
∗
Conjugate functions
∗
0
7-16
Proof: if f is strongly convex and closed
• y T x − f (x) has a unique maximizer x for every y
• x maximizes y T x − f (x) if and only if y ∈ ∂f (x); from page 7-15
y ∈ ∂f (x)
⇐⇒
x ∈ ∂f ∗(y) = {∇f ∗(y)}
hence ∇f ∗(y) = argmaxx (y T x − f (x))
• from convexity of f (x) − (µ/2)xT x:
(y − y 0)T (x − x0) ≥ µkx − x0k22
if y ∈ ∂f (x), y 0 ∈ ∂f (x0)
• this is co-coercivity of ∇f ∗ (which implies Lipschitz continuity)
(y − y 0)T (∇f ∗(y) − ∇f ∗(y 0)) ≥ µk∇f ∗(y) − ∇f ∗(y 0)k22
Conjugate functions
7-17
Outline
• closed functions
• conjugate function
• duality
Duality
primal:
minimize f (x) + g(Ax)
dual:
maximize −g ∗(z) − f ∗(−AT z)
• follows from Lagrange duality applied to reformulated primal
minimize
subject to
f (x) + g(y)
Ax = y
dual function for the formulated problem is:
inf (f (x) + z T Ax + g(y) − z T y) = −f ∗(−AT z) − g ∗(z)
x,y
• Slater’s condition (for convex f , g ): strong duality holds if there exists an x̂ with
x̂ ∈ int dom f,
Ax̂ ∈ int dom g
this also guarantees that the dual optimum is attained, if optimal value is finite
Conjugate functions
7-18
Set constraint
minimize
subject to
f (x)
Ax − b ∈ C
Primal and dual problem
primal:
minimize f (x) + δC (Ax − b)
dual:
∗
maximize −bT z − δC
(z) − f ∗(−AT z)
Examples
constraint
set C
∗
support function δC
(z)
Ax = b
{0}
0
norm inequality
kAx − bk ≤ 1
unit k · k-ball
kzk∗
conic inequality
Ax K b
−K
δK ∗ (z)
equality
Conjugate functions
7-19
Norm regularization
minimize
f (x) + kAx − bk
• take g(y) = ky − bk in general problem
minimize
f (x) + g(Ax)
• conjugate of k · k is indicator of unit ball for dual norm
g ∗(z) = bT z + δB (z) where B = {z | kzk∗ ≤ 1}
• hence, dual problem can be written as
maximize
subject to
Conjugate functions
−bT z − f ∗(−AT z)
kzk∗ ≤ 1
7-20
Optimality conditions
minimize
subject to
f (x) + g(y)
Ax = y
assume f , g are convex and Slater’s condition holds
Optimality conditions: x is optimal if and only if there exists a z such that
1. primal feasibility: x ∈ dom f and y = Ax ∈ dom g
2. x and y = Ax are minimizers of the Lagrangian f (x) + z T Ax + g(y) − z T y :
−AT z ∈ ∂f (x),
z ∈ ∂g(Ax)
if g is closed, this can be written symmetrically as
−AT z ∈ ∂f (x),
Conjugate functions
Ax ∈ ∂g ∗(z)
7-21
References
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization
Algoritms (1993), chapter X.
• D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization
(2003), chapter 7.
• R. T. Rockafellar, Convex Analysis (1970).
Conjugate functions
7-22