
Advanced Optimization Lecture Notes - Chapter 5: Gradient Descent



Gradient Descent



Hoàng Nam Dũng



Gradient descent



Consider unconstrained, smooth convex optimization
\[
\min_x f(x)
\]
with convex and differentiable function \(f : \mathbb{R}^n \to \mathbb{R}\). Denote the optimal value by \(f^\ast = \min_x f(x)\) and a solution by \(x^\ast\).

Gradient descent: choose initial point \(x^{(0)} \in \mathbb{R}^n\), repeat:
\[
x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f(x^{(k-1)}), \qquad k = 1, 2, 3, \dots
\]
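The update rule above translates directly into a short loop. A minimal sketch follows; the test function \(f(x) = x_1^2 + 2x_2^2\), its gradient, and the step size are illustrative choices, not from the slides:

```python
import numpy as np

def gradient_descent(grad_f, x0, t, num_iters=100):
    """Gradient descent with a fixed step size t."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        # x^(k) = x^(k-1) - t * grad f(x^(k-1))
        x = x - t * grad_f(x)
    return x

# Illustrative example: f(x) = x1^2 + 2*x2^2, so grad f(x) = (2*x1, 4*x2)
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])
x_star = gradient_descent(grad_f, x0=[3.0, -2.0], t=0.1, num_iters=200)
# the iterates approach the minimizer (0, 0)
```

A fixed \(t_k = t\) is the simplest schedule; the later slides discuss when it converges and how to choose it.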












Gradient descent interpretation



At each iteration, consider the expansion
\[
f(y) \approx f(x) + \nabla f(x)^T (y - x) + \frac{1}{2t} \|y - x\|_2^2.
\]
Quadratic approximation, replacing the usual Hessian \(\nabla^2 f(x)\) by \(\frac{1}{t} I\):

- \(f(x) + \nabla f(x)^T (y - x)\): linear approximation to \(f\)
- \(\frac{1}{2t} \|y - x\|_2^2\): proximity term to \(x\), with weight \(1/(2t)\)

Choose next point \(y = x^+\) to minimize the quadratic approximation; setting its gradient \(\nabla f(x) + \frac{1}{t}(y - x)\) to zero gives
\[
x^+ = x - t \nabla f(x).
\]
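This interpretation can be checked numerically: the gradient step should be the minimizer of the quadratic model. A small sketch using the slides' later example \(f(x) = (10x_1^2 + x_2^2)/2\) (the point \(x\), step size \(t\), and random probes are illustrative choices):

```python
import numpy as np

# f(x) = (10 x1^2 + x2^2)/2 and its gradient
f = lambda x: 5 * x[0]**2 + 0.5 * x[1]**2
grad_f = lambda x: np.array([10 * x[0], x[1]])

x = np.array([1.0, 2.0])   # illustrative current iterate
t = 0.05                   # illustrative step size
g = grad_f(x)

def model(y):
    """Quadratic model m(y) = f(x) + grad_f(x)^T (y-x) + ||y-x||^2 / (2t)."""
    d = y - x
    return f(x) + g @ d + (1.0 / (2 * t)) * (d @ d)

x_plus = x - t * g  # claimed minimizer: gradient step from x

# m is strictly convex and its gradient g + (y - x)/t vanishes at x_plus,
# so m(x_plus) <= m(y) for any other y:
rng = np.random.default_rng(0)
for _ in range(100):
    y = x_plus + 0.1 * rng.standard_normal(2)
    assert model(x_plus) <= model(y)
```

The closed-form value \(m(x^+) = f(x) - \frac{t}{2}\|\nabla f(x)\|_2^2\) also falls out of the same computation, which is the quantity used later in convergence proofs.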




Outline



- How to choose step sizes
- Convergence analysis
- Nonconvex functions



Fixed step size



Simply take \(t_k = t\) for all \(k = 1, 2, 3, \dots\); gradient descent can diverge if \(t\) is too big.

Consider \(f(x) = (10x_1^2 + x_2^2)/2\), gradient descent after 8 steps:
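The divergence claim can be verified directly on this example. For \(f(x) = (10x_1^2 + x_2^2)/2\) the Hessian's largest eigenvalue is \(L = 10\), so any fixed \(t > 2/L = 0.2\) makes the \(x_1\) update factor \(|1 - 10t|\) exceed 1. A sketch with the illustrative choices \(t = 0.25\) and starting point \((1, 1)\):

```python
import numpy as np

# Slides' example: f(x) = (10 x1^2 + x2^2)/2, grad f(x) = (10 x1, x2)
grad_f = lambda x: np.array([10 * x[0], x[1]])

def run(t, steps, x0=(1.0, 1.0)):
    """Fixed-step gradient descent on the example function."""
    x = np.array(x0)
    for _ in range(steps):
        x = x - t * grad_f(x)
    return x

# With t = 0.25 > 0.2, the x1 coordinate is multiplied by |1 - 10t| = 1.5
# each step, so 8 steps scale it by 1.5^8 (about 25.6): divergence.
x_big = run(t=0.25, steps=8)
```

The \(x_2\) coordinate still shrinks (its factor is \(|1 - t| = 0.75\)); divergence is driven entirely by the steep direction, which is why the condition involves the largest curvature.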




Fixed step size



Can be slow if \(t\) is too small. Same example, gradient descent after 100 steps:





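The slowdown is easy to quantify on the same example by counting how many iterations each step size needs to reach a fixed accuracy. A sketch; the tolerance, starting point, and the two step sizes compared are illustrative choices:

```python
import numpy as np

# Slides' example: f(x) = (10 x1^2 + x2^2)/2, grad f(x) = (10 x1, x2)
grad_f = lambda x: np.array([10 * x[0], x[1]])

def iterations_needed(t, x0=(1.0, 1.0), tol=1e-3, max_iter=100_000):
    """Iterations of fixed-step gradient descent until ||x|| < tol."""
    x = np.array(x0)
    for k in range(1, max_iter + 1):
        x = x - t * grad_f(x)
        if np.linalg.norm(x) < tol:
            return k
    return max_iter

fast = iterations_needed(t=0.1)    # a step near the stable range
slow = iterations_needed(t=0.001)  # a needlessly small step
# slow is roughly two orders of magnitude larger than fast
```

With the small step, the shallow \(x_2\) direction shrinks only by the factor \(1 - t = 0.999\) per iteration, which is what the creeping iterates in the 100-step picture illustrate.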

