Hoàng Nam Dũng
Consider unconstrained, smooth convex optimization

    min_x f(x)

with convex and differentiable function f : R^n → R. Denote the optimal value by f* = min_x f(x) and a solution by x*.

Gradient descent: choose initial point x^(0) ∈ R^n, repeat:

    x^(k) = x^(k−1) − t_k · ∇f(x^(k−1)),   k = 1, 2, 3, . . .
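As a minimal sketch, this update in Python (the test function, step size, and stopping tolerance below are illustrative assumptions, not part of the slides):

    import numpy as np

    def gradient_descent(grad_f, x0, t=0.1, max_iter=1000, tol=1e-8):
        """Iterate x(k) = x(k-1) - t * grad_f(x(k-1)) until the gradient is small."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            if np.linalg.norm(g) < tol:   # near-stationary point: stop
                break
            x = x - t * g
        return x

    # Example: f(x) = ||x - b||^2 / 2 has gradient x - b and minimizer x* = b.
    b = np.array([1.0, -2.0])
    print(gradient_descent(lambda x: x - b, x0=np.zeros(2)))   # ≈ [1, -2]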
At each iteration, consider the expansion

    f(y) ≈ f(x) + ∇f(x)^T (y − x) + (1/(2t)) ‖y − x‖₂².

This is a quadratic approximation, replacing the usual Hessian ∇²f(x) by (1/t)·I, where f(x) + ∇f(x)^T (y − x) is the linear approximation to f, and (1/(2t)) ‖y − x‖₂² is a proximity term to x, with weight 1/(2t).
Choose the next point y = x⁺ to minimize this quadratic approximation:

    x⁺ = x − t · ∇f(x).
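To see why, set the gradient in y of the quadratic approximation to zero; a one-line derivation in LaTeX:

    \nabla_y \left[ f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2t}\,\|y - x\|_2^2 \right]
      = \nabla f(x) + \tfrac{1}{t}\,(y - x) = 0
      \quad\Longrightarrow\quad
      x^+ = x - t\,\nabla f(x).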
Outline:
▶ How to choose step sizes
▶ Convergence analysis
▶ Nonconvex functions
Fixed step size: simply take t_k = t for all k = 1, 2, 3, . . . This can diverge if t is too big. Consider f(x) = (10x₁² + x₂²)/2; with too large a t, gradient descent diverges within 8 steps.
It can also be slow if t is too small: on the same example, gradient descent is still far from the optimum after 100 steps.
[Figure: contour plot of f with axes from −20 to 20, showing the 100 gradient descent iterates]
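A small sketch reproducing both failure modes in Python (the starting point and the concrete step sizes are assumptions chosen to show the effect; for this f, any fixed t > 0.2 diverges along the first coordinate):

    import numpy as np

    def grad_f(x):
        # Gradient of f(x) = (10*x1^2 + x2^2)/2.
        return np.array([10.0 * x[0], x[1]])

    def run(t, steps, x0):
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            x = x - t * grad_f(x)          # x(k) = x(k-1) - t * grad f(x(k-1))
        return x

    x0 = [10.0, 10.0]                       # assumed starting point
    # Too big: along x1 each step multiplies by (1 - 10t); |1 - 10t| > 1 diverges.
    print(run(t=0.25, steps=8, x0=x0))      # blows up within 8 steps
    # Too small: x2 shrinks by 0.99 per step, so ~37% of it remains after 100 steps.
    print(run(t=0.01, steps=100, x0=x0))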