Advanced Optimization Lecture Notes - Chapter 8: Proximal Gradient Descent (and Acceleration)



Proximal Gradient Descent (and Acceleration)



Hoàng Nam Dũng



Last time: subgradient method
Consider the problem


min_x f(x)

with f convex and dom(f) = R^n.


Subgradient method: choose an initial x^(0) ∈ R^n, and repeat:


x^(k) = x^(k−1) − t_k · g^(k−1),   k = 1, 2, 3, ...


where g^(k−1) ∈ ∂f(x^(k−1)). We use pre-set rules for the step sizes
(e.g., the diminishing step size rule).


If f is Lipschitz, then the subgradient method has a convergence rate of
O(1/ε^2), i.e., it needs O(1/ε^2) iterations to reach ε-suboptimality.
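The update above can be sketched in a few lines. As an illustration (the example objective and step-size rule are assumptions, not from the slides), we minimize f(x) = ‖x‖_1, whose coordinate-wise subgradients are given by sign(x), with diminishing step sizes t_k = 1/k:

```python
import numpy as np

def subgradient_method(subgrad_f, x0, num_iters=500):
    """Subgradient method: x^(k) = x^(k-1) - t_k * g^(k-1),
    with the diminishing step size rule t_k = 1/k."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, num_iters + 1):
        g = subgrad_f(x)       # any subgradient at the current iterate
        x = x - (1.0 / k) * g  # diminishing step size t_k = 1/k
    return x

# Illustrative run: f(x) = ||x||_1, with coordinate-wise subgradient sign(x)
x_star = subgradient_method(np.sign, x0=[3.0, -2.0])
```

Since sum_k 1/k diverges, the iterates reach a neighborhood of the minimizer 0 and then oscillate with ever-smaller amplitude, which is exactly the slow O(1/ε^2) behavior quoted above.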



Outline


Today

- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases



Decomposable functions
Suppose


f(x) = g(x) + h(x)


where


- g is convex, differentiable, dom(g) = R^n
- h is convex, not necessarily differentiable.


If f were differentiable, then the gradient descent update would be
x^+ = x − t · ∇f(x).


Recall the motivation: minimize the quadratic approximation to f around
x, replacing ∇²f(x) by (1/t)I:


x^+ = argmin_z f̃_t(z), where f̃_t(z) = f(x) + ∇f(x)^T (z − x) + (1/(2t)) ‖z − x‖_2^2.
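Setting the z-gradient of this approximation to zero gives ∇f(x) + (1/t)(z − x) = 0, i.e., the minimizer is exactly the gradient step z = x − t∇f(x). A quick numerical sanity check (the example f, expansion point, and grid are assumptions):

```python
import numpy as np

# Quadratic approximation around x (the constant f(x) does not affect the argmin):
#   q(z) = grad_f(x) * (z - x) + (z - x)**2 / (2*t)
# Its exact minimizer is the gradient step z = x - t * grad_f(x).

def quad_approx_argmin(grad_f, x, t, grid):
    """Minimize the quadratic approximation over a grid (1-D illustration)."""
    q = grad_f(x) * (grid - x) + (grid - x) ** 2 / (2 * t)
    return grid[np.argmin(q)]

# Assumed example: f(z) = z^4, so grad_f(z) = 4 z^3; expand around x = 1, t = 0.1
grad_f = lambda z: 4 * z ** 3
grid = np.linspace(-2.0, 2.0, 40001)  # spacing 1e-4
z_star = quad_approx_argmin(grad_f, 1.0, 0.1, grid)
# Gradient step: 1 - 0.1 * 4 = 0.6
```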




Decomposable functions


In our case f is not differentiable, but f = g + h with g differentiable.
Why don't we make a quadratic approximation to g and leave h alone?
I.e., update


x^+ = argmin_z g̃_t(z) + h(z)
    = argmin_z g(x) + ∇g(x)^T (z − x) + (1/(2t)) ‖z − x‖_2^2 + h(z)
    = argmin_z (1/(2t)) ‖z − (x − t∇g(x))‖_2^2 + h(z).

The term (1/(2t)) ‖z − (x − t∇g(x))‖_2^2 keeps z close to the gradient update for g.
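The last line is a prox evaluation at the gradient step, which yields the proximal gradient iteration. A minimal sketch (the lasso-type example, step size, and λ are assumptions), taking h = λ‖·‖_1 whose prox is coordinate-wise soft-thresholding:

```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau * ||.||_1: shrink each coordinate toward 0 by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient(grad_g, prox_h, x0, t, num_iters=200):
    """Repeat x^+ = prox(x - t * grad_g(x)) with a fixed step size t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

# Assumed lasso-type example: g(x) = 1/2 ||Ax - b||_2^2, h(x) = lam * ||x||_1
A = np.eye(2)
b = np.array([2.0, 0.1])
lam = 0.5
grad_g = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, t: soft_threshold(v, t * lam)

x_hat = proximal_gradient(grad_g, prox_h, x0=[0.0, 0.0], t=0.5)
# With A = I the minimizer is soft_threshold(b, lam) = [1.5, 0.0]
```

Note that only ∇g is ever evaluated; the non-smooth h enters solely through its prox, which is the whole point of the decomposition.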



Proximal mapping


The proximal mapping (or prox-operator) of a convex function h is
defined as


prox_h(x) = argmin_z (1/2) ‖x − z‖_2^2 + h(z).


Examples:


- h(x) = 0: prox_h(x) = x.


- h(x) is the indicator function of a closed convex set C: prox_h is the projection onto C,

  prox_h(x) = argmin_{z ∈ C} (1/2) ‖x − z‖_2^2 = P_C(x).
- h(x) = ‖x‖_1: prox_h is the 'soft-threshold' (shrinkage) operation,

  [prox_h(x)]_i = x_i − 1 if x_i ≥ 1,  0 if |x_i| ≤ 1,  x_i + 1 if x_i ≤ −1.
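All three examples can be checked numerically by minimizing (1/2)(x − z)^2 + h(z) by brute force over a fine grid (a one-dimensional sketch; the grid and the test point 2.3 are assumptions):

```python
import numpy as np

def prox_1d(h, x, grid):
    """prox_h(x) = argmin_z 1/2 (x - z)^2 + h(z), by brute force on a grid."""
    vals = 0.5 * (x - grid) ** 2 + h(grid)
    return grid[np.argmin(vals)]

grid = np.linspace(-5.0, 5.0, 100001)  # spacing 1e-4

# h = 0: prox is the identity, so prox_h(2.3) = 2.3
p_id = prox_1d(lambda z: np.zeros_like(z), 2.3, grid)

# h = indicator of C = [-1, 1]: prox is the projection P_C, so prox_h(2.3) = 1.0
indicator = lambda z: np.where(np.abs(z) <= 1.0, 0.0, np.inf)
p_proj = prox_1d(indicator, 2.3, grid)

# h = |.|: prox is soft-thresholding, so prox_h(2.3) = 2.3 - 1 = 1.3
p_soft = prox_1d(np.abs, 2.3, grid)
```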


Proximal mapping
Theorem


If h is convex and closed (has closed epigraph), then


prox_h(x) = argmin_z (1/2) ‖x − z‖_2^2 + h(z)

exists and is unique for all x.
Proof.


See .../proxop.pdf


Uniqueness follows since the objective function is strictly convex.



Optimality condition:

z = prox_h(x) ⇔ x − z ∈ ∂h(z).
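For h(z) = |z| in one dimension, ∂h(z) = {sign(z)} for z ≠ 0 and [−1, 1] at z = 0, so the condition can be verified against the soft-threshold formula (an illustrative sketch; the test points are assumptions):

```python
import numpy as np

def soft_threshold(x):
    """prox of |.| in 1-D (threshold 1)."""
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def in_subdiff_abs(z, g):
    """Is g a subgradient of |.| at z? {sign(z)} if z != 0, else [-1, 1]."""
    if z != 0:
        return bool(np.isclose(g, np.sign(z)))
    return bool(-1.0 <= g <= 1.0)

# Check z = prox_h(x)  =>  x - z in subdiff h(z), for a few test points
checks = []
for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    z = soft_threshold(x)
    checks.append(in_subdiff_abs(z, x - z))
```

The residual x − z is exactly the "subgradient of h absorbed by the prox", which is what makes the proximal gradient update a fixed-point iteration for the composite problem.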


