Advanced Optimization Lecture Notes - Chapter 8: Proximal Gradient Descent (and Acceleration)



Proximal Gradient Descent (and Acceleration)



Hoàng Nam Dũng



Last time: subgradient method
Consider the problem


min_x f(x)

with f convex and dom(f) = R^n.


Subgradient method: choose an initial x^(0) ∈ R^n, and repeat:


x^(k) = x^(k−1) − t_k · g^(k−1),   k = 1, 2, 3, ...


where g^(k−1) ∈ ∂f(x^(k−1)). We use pre-set rules for the step sizes
(e.g., the diminishing step size rule).


If f is Lipschitz, then the subgradient method has a convergence rate of
O(1/ε^2), i.e., it needs O(1/ε^2) iterations to reach ε-suboptimality.
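The update above can be sketched in a few lines. As an illustration (the example objective and step-size rule are assumptions, not from the slides), we minimize f(x) = ‖x‖_1, whose coordinate-wise subgradients are given by sign(x), with diminishing step sizes t_k = 1/k:

```python
import numpy as np

def subgradient_method(subgrad_f, x0, num_iters=500):
    """Subgradient method: x^(k) = x^(k-1) - t_k * g^(k-1),
    with the diminishing step size rule t_k = 1/k."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, num_iters + 1):
        g = subgrad_f(x)       # any subgradient at the current iterate
        x = x - (1.0 / k) * g  # diminishing step size t_k = 1/k
    return x

# Illustrative run: f(x) = ||x||_1, with coordinate-wise subgradient sign(x)
x_star = subgradient_method(np.sign, x0=[3.0, -2.0])
```

Since sum_k 1/k diverges, the iterates reach a neighborhood of the minimizer 0 and then oscillate with ever-smaller amplitude, which is exactly the slow O(1/ε^2) behavior quoted above.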



Outline


Today

- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases



Decomposable functions
Suppose


f(x) = g(x) + h(x)


where


- g is convex, differentiable, dom(g) = R^n
- h is convex, not necessarily differentiable.


If f were differentiable, then the gradient descent update would be
x^+ = x − t · ∇f(x).


Recall the motivation: minimize the quadratic approximation to f around
x, replacing ∇²f(x) by (1/t)I:


x^+ = argmin_z f̃_t(z), where f̃_t(z) = f(x) + ∇f(x)^T (z − x) + (1/(2t)) ‖z − x‖_2^2.
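Setting the z-gradient of this approximation to zero gives ∇f(x) + (1/t)(z − x) = 0, i.e., the minimizer is exactly the gradient step z = x − t∇f(x). A quick numerical sanity check (the example f, expansion point, and grid are assumptions):

```python
import numpy as np

# Quadratic approximation around x (the constant f(x) does not affect the argmin):
#   q(z) = grad_f(x) * (z - x) + (z - x)**2 / (2*t)
# Its exact minimizer is the gradient step z = x - t * grad_f(x).

def quad_approx_argmin(grad_f, x, t, grid):
    """Minimize the quadratic approximation over a grid (1-D illustration)."""
    q = grad_f(x) * (grid - x) + (grid - x) ** 2 / (2 * t)
    return grid[np.argmin(q)]

# Assumed example: f(z) = z^4, so grad_f(z) = 4 z^3; expand around x = 1, t = 0.1
grad_f = lambda z: 4 * z ** 3
grid = np.linspace(-2.0, 2.0, 40001)  # spacing 1e-4
z_star = quad_approx_argmin(grad_f, 1.0, 0.1, grid)
# Gradient step: 1 - 0.1 * 4 = 0.6
```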




Decomposable functions


In our case f is not differentiable, but f = g + h with g differentiable.
Why don't we make a quadratic approximation to g and leave h alone?
I.e., update


x^+ = argmin_z g̃_t(z) + h(z)
    = argmin_z g(x) + ∇g(x)^T (z − x) + (1/(2t)) ‖z − x‖_2^2 + h(z)
    = argmin_z (1/(2t)) ‖z − (x − t∇g(x))‖_2^2 + h(z).

The term (1/(2t)) ‖z − (x − t∇g(x))‖_2^2 keeps z close to the gradient update for g.
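The last line is a prox evaluation at the gradient step, which yields the proximal gradient iteration. A minimal sketch (the lasso-type example, step size, and λ are assumptions), taking h = λ‖·‖_1 whose prox is coordinate-wise soft-thresholding:

```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau * ||.||_1: shrink each coordinate toward 0 by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient(grad_g, prox_h, x0, t, num_iters=200):
    """Repeat x^+ = prox(x - t * grad_g(x)) with a fixed step size t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

# Assumed lasso-type example: g(x) = 1/2 ||Ax - b||_2^2, h(x) = lam * ||x||_1
A = np.eye(2)
b = np.array([2.0, 0.1])
lam = 0.5
grad_g = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, t: soft_threshold(v, t * lam)

x_hat = proximal_gradient(grad_g, prox_h, x0=[0.0, 0.0], t=0.5)
# With A = I the minimizer is soft_threshold(b, lam) = [1.5, 0.0]
```

Note that only ∇g is ever evaluated; the non-smooth h enters solely through its prox, which is the whole point of the decomposition.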



Proximal mapping


The proximal mapping (or prox-operator) of a convex function h is
defined as


prox_h(x) = argmin_z (1/2) ‖x − z‖_2^2 + h(z).


Examples:


- h(x) = 0: prox_h(x) = x.


- h(x) is the indicator function of a closed convex set C: prox_h is the projection onto C,

  prox_h(x) = argmin_{z ∈ C} (1/2) ‖x − z‖_2^2 = P_C(x).
- h(x) = ‖x‖_1: prox_h is the 'soft-threshold' (shrinkage) operation,

  [prox_h(x)]_i = x_i − 1 if x_i ≥ 1,  0 if |x_i| ≤ 1,  x_i + 1 if x_i ≤ −1.
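All three examples can be checked numerically by minimizing (1/2)(x − z)^2 + h(z) by brute force over a fine grid (a one-dimensional sketch; the grid and the test point 2.3 are assumptions):

```python
import numpy as np

def prox_1d(h, x, grid):
    """prox_h(x) = argmin_z 1/2 (x - z)^2 + h(z), by brute force on a grid."""
    vals = 0.5 * (x - grid) ** 2 + h(grid)
    return grid[np.argmin(vals)]

grid = np.linspace(-5.0, 5.0, 100001)  # spacing 1e-4

# h = 0: prox is the identity, so prox_h(2.3) = 2.3
p_id = prox_1d(lambda z: np.zeros_like(z), 2.3, grid)

# h = indicator of C = [-1, 1]: prox is the projection P_C, so prox_h(2.3) = 1.0
indicator = lambda z: np.where(np.abs(z) <= 1.0, 0.0, np.inf)
p_proj = prox_1d(indicator, 2.3, grid)

# h = |.|: prox is soft-thresholding, so prox_h(2.3) = 2.3 - 1 = 1.3
p_soft = prox_1d(np.abs, 2.3, grid)
```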


Proximal mapping
Theorem


If h is convex and closed (has closed epigraph), then


prox_h(x) = argmin_z (1/2) ‖x − z‖_2^2 + h(z)

exists and is unique for all x.
Proof.


See .../proxop.pdf


Uniqueness follows since the objective function is strictly convex.



Optimality condition:

z = prox_h(x) ⇔ x − z ∈ ∂h(z).
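For h(z) = |z| in one dimension, ∂h(z) = {sign(z)} for z ≠ 0 and [−1, 1] at z = 0, so the condition can be verified against the soft-threshold formula (an illustrative sketch; the test points are assumptions):

```python
import numpy as np

def soft_threshold(x):
    """prox of |.| in 1-D (threshold 1)."""
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def in_subdiff_abs(z, g):
    """Is g a subgradient of |.| at z? {sign(z)} if z != 0, else [-1, 1]."""
    if z != 0:
        return bool(np.isclose(g, np.sign(z)))
    return bool(-1.0 <= g <= 1.0)

# Check z = prox_h(x)  =>  x - z in subdiff h(z), for a few test points
checks = []
for x in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    z = soft_threshold(x)
    checks.append(in_subdiff_abs(z, x - z))
```

The residual x − z is exactly the "subgradient of h absorbed by the prox", which is what makes the proximal gradient update a fixed-point iteration for the composite problem.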


