Hoàng Nam Dũng
Last time: subgradient method

Consider the problem
$$\min_x f(x)$$
with $f$ convex and $\mathrm{dom}(f) = \mathbb{R}^n$.

Subgradient method: choose an initial $x^{(0)} \in \mathbb{R}^n$, and repeat:
$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots$$
where $g^{(k-1)} \in \partial f(x^{(k-1)})$. We use pre-set rules for the step sizes (e.g., a diminishing step size rule).

If $f$ is Lipschitz, then the subgradient method has convergence rate $O(1/\varepsilon^2)$.
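As a quick illustration of the recap above, here is a minimal sketch of the subgradient method in plain Python; the function names and the toy objective $f(x) = \|x - a\|_1$ are my own choices, not from the slides:

```python
def sign(v):
    # Scalar sign; a valid choice of subgradient of |.| (picking 0 at the kink).
    return (v > 0) - (v < 0)

def subgradient_method(subgrad, x0, steps):
    # x^{(k)} = x^{(k-1)} - t_k * g^{(k-1)}, with g^{(k-1)} a subgradient.
    x = list(x0)
    for t in steps:
        x = [xi - t * gi for xi, gi in zip(x, subgrad(x))]
    return x

# Toy objective f(x) = sum_i |x_i - a_i|; sign(x - a) is one subgradient.
a = [1.0, -2.0, 0.5]
subgrad = lambda x: [sign(xi - ai) for xi, ai in zip(x, a)]

# Pre-set diminishing step sizes t_k = 1/k.
steps = [1.0 / k for k in range(1, 2001)]
x_star = subgradient_method(subgrad, [0.0, 0.0, 0.0], steps)
# x_star ends up close to a (not exact: the method is not a descent method,
# it oscillates around the minimizer with amplitude on the order of t_k)
```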
Outline

Today
▶ Proximal gradient descent
▶ Convergence analysis
▶ ISTA, matrix completion
▶ Special cases
Decomposable functions

Suppose
$$f(x) = g(x) + h(x)$$
where
▶ $g$ is convex, differentiable, $\mathrm{dom}(g) = \mathbb{R}^n$
▶ $h$ is convex, not necessarily differentiable.

If $f$ were differentiable, then the gradient descent update would be
$$x^+ = x - t \cdot \nabla f(x).$$
Recall the motivation: minimize the quadratic approximation to $f$ around $x$, replacing $\nabla^2 f(x)$ by $\frac{1}{t} I$:
$$x^+ = \operatorname*{argmin}_z \; f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t}\|z - x\|_2^2,$$
which is minimized at $z = x - t \nabla f(x)$.
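The claim that minimizing the quadratic model recovers the gradient step can be checked numerically; the toy function and all names below are my own illustration:

```python
# Quadratic model of f around x with the Hessian replaced by (1/t) I:
#   q(z) = f(x) + f'(x)(z - x) + (1/(2t))(z - x)^2,
# which should be minimized at z = x - t f'(x).
f = lambda z: (z - 3.0) ** 2           # toy smooth function (my choice)
fprime = lambda z: 2.0 * (z - 3.0)

x, t = 0.0, 0.1
q = lambda z: f(x) + fprime(x) * (z - x) + (z - x) ** 2 / (2.0 * t)

# Brute-force the minimizer of q on a fine grid and compare with x - t f'(x).
z_grid = [i * 1e-4 for i in range(-20000, 20001)]
z_best = min(z_grid, key=q)
gradient_step = x - t * fprime(x)      # = 0.6 (up to float rounding)
```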
Decomposable functions

In our case $f$ is not differentiable, but $f = g + h$ with $g$ differentiable. Why don't we make a quadratic approximation to $g$ and leave $h$ alone? I.e., update
$$\begin{aligned}
x^+ &= \operatorname*{argmin}_z \; \tilde g_t(z) + h(z) \\
    &= \operatorname*{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t}\|z - x\|_2^2 + h(z) \\
    &= \operatorname*{argmin}_z \; \frac{1}{2t}\|z - (x - t \nabla g(x))\|_2^2 + h(z).
\end{aligned}$$
The term $\frac{1}{2t}\|z - (x - t \nabla g(x))\|_2^2$ keeps $z$ close to the gradient update for $g$.
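The update above suggests a generic step once the inner argmin over the $h$ term (the proximal mapping, defined on the next slide) is available as a subroutine. A minimal sketch; the toy problem and all names are my own (here $h$ is the indicator of the nonnegative orthant, for which the inner argmin is coordinate-wise projection):

```python
def prox_grad_step(x, grad_g, prox_h, t):
    # One proximal gradient update:
    #   x+ = argmin_z (1/2t)||z - (x - t*grad g(x))||_2^2 + h(z),
    # computed by applying the prox of h to the gradient step on g.
    y = [xi - t * gi for xi, gi in zip(x, grad_g(x))]
    return prox_h(y, t)

# Toy problem (my choice): g(x) = (1/2)||x - b||_2^2 and h the indicator
# of {z : z >= 0}; the inner argmin is then coordinate-wise max(., 0).
b = [2.0, -3.0, 1.0]
grad_g = lambda x: [xi - bi for xi, bi in zip(x, b)]
prox_h = lambda z, t: [max(zi, 0.0) for zi in z]

x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = prox_grad_step(x, grad_g, prox_h, t=0.5)
# x approaches the constrained minimizer [2.0, 0.0, 1.0]
```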
Proximal mapping

The proximal mapping (or prox-operator) of a convex function $h$ is defined as
$$\operatorname{prox}_h(x) = \operatorname*{argmin}_z \; \frac{1}{2}\|x - z\|_2^2 + h(z).$$

Examples:
▶ $h(x) = 0$: $\operatorname{prox}_h(x) = x$.
▶ $h(x)$ is the indicator function of a closed convex set $C$: $\operatorname{prox}_h$ is the projection onto $C$,
$$\operatorname{prox}_h(x) = \operatorname*{argmin}_{z \in C} \; \frac{1}{2}\|x - z\|_2^2 = P_C(x).$$
▶ $h(x) = \|x\|_1$: $\operatorname{prox}_h$ is the 'soft-threshold' (shrinkage) operation,
$$\operatorname{prox}_h(x)_i = \begin{cases} x_i - 1 & x_i \ge 1 \\ 0 & |x_i| \le 1 \\ x_i + 1 & x_i \le -1. \end{cases}$$
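A minimal code sketch of the soft-threshold operation above (the function name is mine):

```python
def soft_threshold(x):
    # Prox of h(x) = ||x||_1: shrink each coordinate toward 0 by 1,
    # zeroing out coordinates with |x_i| <= 1.
    return [xi - 1.0 if xi >= 1.0 else (xi + 1.0 if xi <= -1.0 else 0.0)
            for xi in x]

print(soft_threshold([2.5, 0.3, -1.5, -0.9]))  # [1.5, 0.0, -0.5, 0.0]
```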
Proximal mapping

Theorem
If $h$ is convex and closed (has a closed epigraph), then
$$\operatorname{prox}_h(x) = \operatorname*{argmin}_z \; \frac{1}{2}\|x - z\|_2^2 + h(z)$$
exists and is unique for all $x$.

Proof.
See .../proxop.pdf. Uniqueness follows since the objective function is strictly convex.
Optimality condition
$$z = \operatorname{prox}_h(x) \iff x - z \in \partial h(z),$$
since $z$ minimizes $\frac{1}{2}\|x - z\|_2^2 + h(z)$ exactly when $0 \in z - x + \partial h(z)$.
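The condition can be verified numerically for $h = \|\cdot\|_1$, whose prox is the soft-threshold operation from the earlier slide; all function names below are my own:

```python
def soft_threshold(x):
    # Prox of h = ||.||_1 (the soft-threshold/shrinkage from the earlier slide).
    return [xi - 1.0 if xi >= 1.0 else (xi + 1.0 if xi <= -1.0 else 0.0)
            for xi in x]

def in_l1_subdifferential(g, z, tol=1e-12):
    # g lies in the subdifferential of ||.||_1 at z iff, coordinate-wise,
    # g_i = sign(z_i) when z_i != 0, and g_i in [-1, 1] when z_i == 0.
    return all(
        abs(gi - (1.0 if zi > 0 else -1.0)) <= tol if zi != 0.0
        else -1.0 - tol <= gi <= 1.0 + tol
        for gi, zi in zip(g, z)
    )

x = [2.5, 0.3, -1.5]
z = soft_threshold(x)
residual = [xi - zi for xi, zi in zip(x, z)]       # the vector x - z
verified = in_l1_subdifferential(residual, z)      # True: x - z is in dh(z)
```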