HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
MASTER’S GRADUATION THESIS
Improving Evolutionary Algorithm for
Document Extractive Summarization
NGUYEN THI HOAI
Major: Data Science and Artificial Intelligence
Thesis advisor:
Dr. Nguyen Thi Thu Trang
Institute:
School of Information and Communication Technology
HA NOI, 2021
SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
************
CONFIRMATION OF MASTER'S THESIS REVISION
Full name of the thesis author: Nguyen Thi Hoai
Thesis title: Improving Evolutionary Algorithm for Document Extractive Summarization (Vietnamese title: Cải tiến thuật toán tiến hóa trong tóm tắt trích rút văn bản)
Major: Data Science and Artificial Intelligence
Student ID: 20202195M
The author, the scientific supervisor, and the Thesis Examination Committee confirm that the author has revised and supplemented the thesis according to the minutes of the Committee meeting on 24/12/2021, with the following contents:
1. Chapter 2: The Related works part was moved from Chapter 3 to Chapter 2 (the chapter is now titled Theoretical Background and Related works).
2. Chapter 4:
• The experimental results (page 38) were supplemented with an explanation of how the sentence features for the fitness function and the parameter set were chosen, and of why only the Recall measure is reported for the DUC2001 and DUC2002 datasets and F1 for CNN/Daily Mail.
• TextRank results were added for the DUC2001 and DUC2002 datasets.
3. Appendix: The GA-Features and PSO-Harmony Search studies are described in more detail.
- After revision: the thesis consists of five main chapters:
• Chapter 1: Introduction,
• Chapter 2: Theoretical Background and Related works,
• Chapter 3: Proposed hybrid PSO-GA model,
• Chapter 4: Experimentation,
• Chapter 5: Conclusion and future works.
Date .................... 2021
Thesis author: Nguyen Thi Hoai
Supervisor: Dr. Nguyen Thi Thu Trang
CHAIR OF THE EXAMINATION COMMITTEE
Graduation Thesis Assignment
Name: Nguyen Thi Hoai
Phone: +84 384 830 357
Email:
Class: 20BKHDL-E
Affiliation: Hanoi University of Science and Technology
I, Nguyen Thi Hoai, hereby warrant that the work and presentation in this thesis
were performed by myself under the supervision of Dr. Nguyen Thi Thu Trang. All
results presented in this thesis are truthful and are not copied from any other work.
All references in this thesis, including images, tables, figures, and quotes, are
clearly and fully documented in the bibliography. I take full responsibility for
any copied content that violates university regulations.
Hanoi, 25th November 2021
Author
Nguyen Thi Hoai
Acknowledgement
It is a genuine pleasure to express my deep sense of thanks and gratitude to
my advisor and guide, Dr. Nguyen Thi Thu Trang. I am highly grateful because
she guided and supported me in completing my thesis. Her unwavering
enthusiasm for science kept me constantly engaged with my research, and her
personal generosity helped make my time at HUST enjoyable. Furthermore, she
has taught me the methodology to carry out the study and present the research
works as clearly as possible. It was a great privilege and honor to work and study
under her guidance.
Also, I would like to thank Dr. Bui Thi Mai Anh for her keen interest in my
research at every stage. Her prompt inspiration, timely suggestions, kindness,
motivation, and dynamism have enabled me to accomplish this task throughout my
study period.
Most importantly, my sincere thanks and appreciation also go to my family
for being a constant source of love, care, and inspiration. They are such a vital part
of my life that I cannot imagine a life without them.
Last but not least, I gratefully acknowledge my university and the people who
have willingly helped me out with their abilities. I greatly appreciate the support
received through the collaborative work undertaken with them.
Abstract
The task of text summarization is to generate the main ideas that cover the content of
the whole text, with the aim of shortening reading time. There are two approaches to
summarization: extractive and abstractive. Numerous methods have been studied in this
domain, among which heuristic algorithms offer a more effective and straightforward
approach than machine learning or deep learning. A genetic algorithm (GA) is a search
heuristic inspired by Charles Darwin's theory of natural evolution. This algorithm reflects
the process of natural selection, where the fittest individuals are selected for reproduction
in order to produce the offspring of the next generation. However, a traditional GA alone
may suffer from weak local search capability and slow convergence. Another algorithm,
Particle Swarm Optimization (PSO), is a population-based optimization technique inspired
by the motion of bird flocks and schooling fish. The premature convergence of PSO is
prevented by applying GA on a small population. In addition, the local optimum
phenomenon of PSO can also be avoided with GA.
The goal of this thesis is to investigate the
effectiveness of a hybrid method combining GA and PSO for sentence selection in the
automatic extractive text summarization task. To assess the effectiveness of the proposed
algorithm, I conducted experiments on three common datasets: DUC2001 and DUC2002,
which are typically used for extractive summarization, and CNN/Daily Mail. The
experimental results show that the hybrid PSO-GA outperforms the state-of-the-art works
on all three ROUGE metrics for these datasets.
The solution presented in this thesis was accepted at The 35th Pacific Asia
Conference on Language, Information and Computation (PACLIC 35) in 2021.
Content
Abstract ............................................................................................................................. i
Content .............................................................................................................................ii
List of Figures .................................................................................................................. v
List of Tables................................................................................................................... vi
List of Equations............................................................................................................vii
Acronyms ..................................................................................................................... viii
Chapter 1 Introduction ............................................................................................... 1
1.1 Motivation ..................................................................................................... 1
1.2 Objective and scope ...................................................................................... 2
1.3 Structure of thesis ......................................................................................... 2
Chapter 2 Theoretical Background and Related works .......................................... 4
2.1 Text summarization ..................................................................................... 4
2.1.1 Abstractive summarization ................................................................. 5
2.1.2 Extractive summarization ................................................................... 6
2.2 Genetic Algorithm ........................................................................................ 6
2.2.1 Initialization ........................................................................................ 9
2.2.2 Crossover ............................................................................................ 9
a) N-Point Crossover ....................................................................................... 9
b) Uniform Crossover: ................................................................................... 10
2.2.3 Mutation ............................................................................................ 11
2.3 Particle swarm optimization ...................................................................... 11
2.3.1 Particle and swarm ........................................................................... 12
2.3.2 Objective function ............................................................................. 12
2.4 Term Frequency– Inverse Document Frequency .................................... 13
2.4.1 Term Frequency ................................................................................ 14
2.4.2 Inverse Document Frequency ........................................................... 14
2.5 Cosine Similarity ........................................................................................ 15
2.6 Related works ............................................................................................. 16
Chapter 3 Proposed hybrid PSO-GA ...................................................................... 18
3.1 Overview ..................................................................................................... 18
3.2 General solution ......................................................................................... 18
3.3 Problem Representation ............................................................................ 20
3.3.1 Document representation ................................................................. 20
3.3.2 Summary presentation ...................................................................... 20
3.4 Proposed of fitness function ...................................................................... 21
3.4.1 Sentence features .............................................................................. 21
a) Similarity to the topic sentence ................................................................. 21
b) Sentence length ......................................................................................... 22
c) Sentence position ...................................................................................... 22
d) Number of proper nouns ........................................................................... 23
e) Coverage ................................................................................................... 23
3.4.2 Fitness function ................................................................................ 24
3.5 Proposed strategies for operators ............................................................. 25
3.5.1 Initialization ..................................................................................... 25
3.5.2 Selection ........................................................................................... 26
3.5.3 Crossover ......................................................................................... 27
3.5.4 Mutation ........................................................................................... 28
3.5.5 Evaluation Convergence .................................................................. 28
3.5.6 Adaptive PSO strategy ..................................................................... 29
Chapter 4 Experimentation ...................................................................................... 32
4.1 Evaluation metrics ..................................................................................... 32
4.1.1 Precision, recall and f-score ............................................................ 32
a) Recall ........................................................................................................ 32
b) Precision ................................................................................................... 32
c) F-score ...................................................................................................... 33
4.1.2 Rouge measure ................................................................................. 33
a) Rouge 1 ..................................................................................................... 33
b) Rouge 2 ..................................................................................................... 34
c) Rouge L ..................................................................................................... 35
4.2 Preparation for experiment ...................................................................... 35
4.2.1 Datasets ............................................................................................35
4.2.2 Environment setup ............................................................................37
4.3 Experimental result .................................................................................... 38
Chapter 5 Conclusion and future works ................................................................. 42
5.1 Conclusion ................................................................................................... 42
5.2 Future works ............................................................................................... 42
References ...................................................................................................................... 43
Appendix ........................................................................................................................ 48
List of Figures
Figure 2.1 Two approaches to summarization. ................................................................. 5
Figure 2.2 The basic structure of Genetic Algorithm [8]. ................................................ 7
Figure 3.1 The steps of the proposed hybrid PSO-GA for single documents. ............... 19
Figure 3.2 Velocity and Position Updating of adaptive PSO. ........................................ 30
Figure 3.3 Flow chart of the adaptive PSO. .................................................................... 31
Figure 4.1 Rouge Scores (Recall) (%) on DUC2001...................................................... 39
Figure 4.2 Rouge Scores (Recall) (%) on DUC2002...................................................... 40
Figure 4.3 Rouge Scores (F1) (%) on CNN/Daily Mail. ................................................ 41
List of Tables
Table 2.1 One-Point crossover example ......................................................................... 10
Table 2.2 Two-Point crossover example ......................................................................... 10
Table 2.3 Uniform crossover example ............................................................................ 10
Table 2.4 Bit-flip mutation example ............................................................................... 11
Table 4.1 Dataset statistics .............................................................................................. 36
Table 4.2 CNN/Daily Mail description ........................................................................... 36
Table 4.3 Parameter list ................................................................................................... 37
Table 0.1 Rouge Scores (Recall) (%) of the Hybrid PSO-GA compared with other
research. The experiment was carried out on 309 test documents of the DUC2001 corpus
......................................................................................................................................... 48
Table 0.2 Rouge Scores (Recall) (%) of the Hybrid PSO-GA compared with other
research. The experiment was carried out on 567 test documents of the DUC2002 corpus
......................................................................................................................................... 49
Table 0.3 Rouge Scores (F1) (%) of the Hybrid PSO-GA compared with other research.
The experiment was carried out on 11490 test documents of CNN/Daily Mail corpus . 50
List of Equations
Equation 2.1 Term frequency- inverse document frequency. ......................................... 15
Equation 2.2 Cosine Similarity. ...................................................................................... 15
Equation 3.1 The constraint on summary’s length. ........................................................ 19
Equation 3.2 The similarity to the topic sentence feature............................................... 21
Equation 3.3 The sentence length feature. ...................................................................... 22
Equation 3.4 The sentence position feature. ................................................................... 22
Equation 3.5 The evaluation of the proper noun feature. ............................................... 23
Equation 3.6 The coverage feature. ................................................................................ 24
Equation 3.7 Fitness function. ........................................................................................ 25
Equation 3.8 Sum of weighting to each feature. ............................................................. 25
Equation 3.9 The Rank selection strategy. ..................................................................... 26
Equation 3.10 The Roulette wheel selection strategy. .................................................... 27
Equation 3.11 Uniform crossover strategy ..................................................................... 27
Equation 3.12 Random mutation strategy....................................................................... 28
Equation 3.13 Evaluation Convergence strategy ............................................................ 28
Acronyms
GA       Genetic Algorithm
PSO      Particle Swarm Optimization
EA       Evolutionary Algorithm
GA-PSO   Genetic Algorithm – Particle Swarm Optimization
TF-IDF   Term Frequency - Inverse Document Frequency
Chapter 1 Introduction
1.1 Motivation
Today, with the development of all aspects of life, from culture and education to
technology, the amount of information available to people is becoming increasingly
diverse and rich. One question is how people can quickly and efficiently access such a huge
amount of information. The goal of an automatic text summarization system is to produce a
concise and fluent summary that still conveys the main content and overall meaning of the
original text, allowing the user to grasp the information available in that text with a much
shorter reading time. There are two basic approaches to document summarization
(Hahn & Mani, 2000) [1], namely extractive summarization and abstractive summarization.
Extractive summarization selects sentences from the text that best express the meaning of the
content without using external words. This technique ranks sentences and phrases by
importance and selects the most important elements of the document to construct
a summary. There is a huge diversity of techniques for automatically generating extractive
summaries of a single document, which can be grouped into two directions: supervised and
unsupervised. The former methods are based on machine learning models [2] or deep
learning models [3]. These require a huge training dataset, including human-generated
summaries, and are therefore very costly and time-consuming. The latter approaches, on the
other hand, do not require any training corpus but aim to rank all the sentences of the original
document by exploring their relationships; the sentences with the highest ranking scores are
then selected to build the summary. In this research direction, many techniques have been
proposed for extractive summarization, such as the Hidden Markov Model [4], graph-based
models [5], algebraic reduction [6], and evolutionary algorithms, for instance the Genetic
Algorithm (GA) [7][8] and Particle Swarm Optimization (PSO) [9]. Abstractive summarization,
by contrast, tends to "interpret" the text to make it short and concise. It involves creating
entirely new phrases and sentences to capture the meaning of the text, requiring advanced
techniques such as interpretation, generalization, the combination of real-world knowledge,
and so on. Due to this difficulty, most research on text summarization is based on extractive
summarization.
The thesis focuses on improving evolutionary algorithms (EA) for extractive
summarization. Many previous works have proposed evolutionary algorithms, especially
the genetic algorithm (GA), for selecting the best sentences to create a summary. However,
using GA alone may suffer from weak local search capability and slow convergence. On
the one hand, the fast convergence of PSO can be leveraged to overcome the slow
convergence of GA; on the other hand, the local optimum phenomenon of PSO can
be avoided by using GA. Therefore, I propose a hybrid method, namely hybrid PSO-GA,
which exploits the advantages of both the PSO and GA approaches. More specifically, the
study improves the evolutionary algorithm for extractive text summarization to increase the
accuracy of the summary and the performance of the proposed model.
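To make the idea concrete before the formal model in Chapter 3, the listing below is a minimal, illustrative sketch of how such a hybrid loop can be organized for extractive summarization: each candidate summary is a binary vector over the document's sentences, a GA step (tournament selection, uniform crossover, bit-flip mutation) keeps the population diverse, and a PSO-style step pulls individuals toward their personal and global bests to speed up convergence. It is not the algorithm or fitness function proposed in this thesis; the toy_fitness function, the parameter values, and the binary PSO-style update are assumptions made only for illustration.

    import random

    def toy_fitness(chromosome, sentence_scores, max_sentences=3):
        """Hypothetical fitness: total importance of the selected sentences,
        penalized when the summary exceeds a length limit."""
        selected = [s for bit, s in zip(chromosome, sentence_scores) if bit]
        return sum(selected) - max(0, len(selected) - max_sentences)

    def ga_step(population, fitnesses, p_mut=0.1):
        """One GA generation: tournament selection, uniform crossover, bit-flip mutation."""
        def tournament():
            a, b = random.sample(range(len(population)), 2)
            return population[a] if fitnesses[a] >= fitnesses[b] else population[b]
        children = []
        while len(children) < len(population):
            p1, p2 = tournament(), tournament()
            child = [random.choice(pair) for pair in zip(p1, p2)]             # uniform crossover
            child = [1 - g if random.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        return children

    def pso_step(population, personal_best, global_best, c1=0.3, c2=0.3):
        """PSO-inspired update for binary vectors: each gene is probabilistically
        pulled toward the personal best and the global best."""
        updated = []
        for ind, pbest in zip(population, personal_best):
            new = []
            for g, pb, gb in zip(ind, pbest, global_best):
                if random.random() < c1:
                    g = pb
                if random.random() < c2:
                    g = gb
                new.append(g)
            updated.append(new)
        return updated

    def hybrid_pso_ga(sentence_scores, pop_size=20, generations=50):
        n = len(sentence_scores)
        population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
        personal_best = [ind[:] for ind in population]
        global_best = personal_best[0]
        for _ in range(generations):
            fitnesses = [toy_fitness(ind, sentence_scores) for ind in population]
            for i, ind in enumerate(population):                 # track personal bests
                if fitnesses[i] > toy_fitness(personal_best[i], sentence_scores):
                    personal_best[i] = ind[:]
            global_best = max(personal_best, key=lambda c: toy_fitness(c, sentence_scores))
            population = ga_step(population, fitnesses)          # GA generation for diversity
            population = pso_step(population, personal_best, global_best)  # pull toward bests
        return global_best  # binary mask: which sentences go into the summary

    if __name__ == "__main__":
        random.seed(0)
        scores = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]  # hypothetical per-sentence importance scores
        print(hybrid_pso_ga(scores))

In the actual model described in Chapter 3, the fitness function instead combines several sentence features (similarity to the topic sentence, sentence length, sentence position, number of proper nouns, and coverage) rather than a single importance score per sentence.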
1.2 Objective and scope
In this thesis, I investigate how to improve the evolutionary algorithm for extractive
summarization. Evolutionary algorithms are a heuristic-based approach to tackling problems
that cannot be easily solved in polynomial time. GA is a well-known algorithm inspired by
the biological evolution process. PSO and Ant Colony Optimization (ACO) are swarm
intelligence algorithms that share many similarities with evolutionary computation
techniques such as GA. Hybrid algorithms that integrate GA with another algorithm, such as
PSO, have been proposed previously. However, there has been no previous study in which
PSO and GA were combined to increase the quality of the summary. My goal is to take
advantage of these two algorithms so that the generated population is more diverse and
converges faster.
1.3 Structure of thesis
The rest of this thesis is organized as follows.
Chapter 2 provides a concise review of the basic knowledge necessary to understand the
solutions and methods discussed in the thesis. First, I discuss text summarization, including
extractive and abstractive summarization. Following that, I discuss the genetic algorithm,
the particle swarm optimization algorithm, and their components. Finally, I discuss the
equations that I use to build my model.
Chapter 3 describes the fundamental procedures involved in developing the proposed
hybrid PSO-GA for automatically generating summaries of a single document. I begin
by discussing the general model of evolutionary algorithms for text summarization. Next, I
discuss some state-of-the-art research related to the text summarization domain.
Chapter 4 discusses the assessment metrics, the working environment, and the data that I
employ. Additionally, I describe the parameter fine-tuning method for medium-sized datasets
that do not have training or validation parts (DUC2001, DUC2002), and then provide the
parameters of my model. Finally, I compare the results with those of related studies on the
same unsupervised and supervised datasets.
Chapter 5 concludes the discussion of the proposed approach and outlines some directions
for future work.
Chapter 2 Theoretical Background and Related works
This chapter contains a broader overview of text summarization. The application is
based on existing algorithms; therefore, a detailed description of the implemented
algorithms is given here. First, I explain text summarization and its classification.
Following that, I give a quick overview of Genetic Algorithms and Particle Swarm
Optimization. I then discuss a few equations commonly used in my model and its
improvements, which will be described in Chapter 3. Finally, some previous studies on
text summarization are presented to give an overview of the current research landscape.
2.1 Text summarization
Based on the output type of the summary, text summarization techniques can be
broadly grouped into extractive and abstractive approaches. Extractive methods produce
summaries by identifying essential sections of the text and combining extracted parts
coherently. Abstractive methods, in contrast, may generate novel words or phrases which
are not in the source text, as human-written summaries do. While the former is strongly
concerned with the most salient sentences from the input document, the latter requires
advanced language generation techniques such as paraphrasing, generalizing, incorporating
real-world knowledge, etc. The abstractive-based approach, therefore, is much more
complex than the extractive one.
A summary is typically shorter than the original document, and the greater the
compression (i.e., the shorter the summary), the greater the loss of information. As a result,
the goal of developing methods for summarizing a text is to keep the length as short as
possible while minimizing information loss as much as possible.
To assess the effectiveness of a summary, the ROUGE evaluation toolkit [10] is usually used, which has been found to correlate highly with human judgments [11]. It compares the summaries generated by the program with the human-generated (gold-standard) summaries. (This measure is described and used in Section 4.1.2.)
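As a concrete illustration, the snippet below is a minimal sketch of ROUGE scoring in Python with the open-source rouge-score package; the two strings are toy examples, not data from the experiments reported later.

```python
# Minimal ROUGE sketch with the rouge-score package (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the cat sat on the mat"        # gold-standard (human) summary
candidate = "a cat was sitting on the mat"  # system-generated summary

scores = scorer.score(reference, candidate)  # precision / recall / F1 per metric
print(scores["rouge1"].fmeasure, scores["rouge2"].fmeasure, scores["rougeL"].fmeasure)
```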
Based on the input type, summarization is divided into single-document and multi-document summarization. In single-document summarization, a summary of only one document is built, while in multi-document summarization, a summary of a whole collection of documents (such as all of today's news or all search results for a query) is built.
Figure 2.1 Two approaches to summarization.
2.1.1 Abstractive summarization
An abstractive summary is an arbitrary text that describes the contents of the source document. The abstractive summarization process consists of "understanding" the original text and "re-telling" it in fewer words. Namely, an abstractive summarization method uses linguistic methods to examine and interpret the text, and then finds new concepts and expressions that best describe it, generating a new, shorter text that conveys the most important information from the original document. While this may seem the best way to construct a summary (and this is how human beings do it), in real-life settings the immaturity of the corresponding linguistic technology for text analysis and generation currently renders such methods practically infeasible.
2.1.2 Extractive summarization
An extractive summary, in contrast, is composed of a selection of sentences (or phrases, paragraphs, etc.) from the original text, usually presented to the user in the same order, i.e., a copy of the source text with most sentences omitted. An extractive summarization method only decides, for each sentence, whether or not it will be included in the summary. The resulting summary may read rather awkwardly; however, the simplicity of the underlying statistical techniques makes extractive summarization an attractive, robust, language-independent alternative to more "intelligent" abstractive methods. In this thesis, I consider single-document extractive summarization.
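To illustrate this sentence-selection view, the toy sketch below (with hypothetical sentences and a hypothetical selection mask) encodes an extractive summary as a binary vector over the document's sentences, a representation that fits naturally with the binary chromosomes used by the evolutionary methods discussed next.

```python
# Toy illustration: an extractive summary as a binary selection mask.
# The sentences and the mask are hypothetical; in practice the mask is
# produced by the optimization procedure described in later chapters.
sentences = [
    "Sentence one of the document.",
    "Sentence two of the document.",
    "Sentence three of the document.",
    "Sentence four of the document.",
]
mask = [1, 0, 0, 1]  # 1 = include the sentence in the summary, 0 = omit it

summary = " ".join(s for s, keep in zip(sentences, mask) if keep)
print(summary)  # selected sentences kept in their original order
```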
2.2 Genetic Algorithm
Evolutionary algorithms (EAs) have shown promising results in solving the problem of extractive summarization. In this context, EAs are cast as a clustering method that aims to search for the most relevant sentences from the input document. The search process is guided by fitness functions that assign a value to each candidate solution (i.e., candidate summary) in the search space to assess its quality. Most studies adopted Genetic Algorithms (GAs) and focused on exploring the effectiveness of different strategies to formalize fitness functions (Meena & Gopalani, 2015; Anh et al., 2019) [12][8]. A genetic algorithm (Thede, 2004) [13] is a popular probabilistic, iterative, controlled search technique used for global optimization, inspired by Charles Darwin's theory of natural evolution. The algorithm simulates the process of natural selection, in which the fittest individuals are selected for reproduction in order to produce the offspring of the next generation. The offspring therefore inherit the characteristics of their parents and are added to the next generation. If the parents have better fitness, their offspring tend to be better than the parents and have a better chance of surviving. This process keeps iterating, and in the end a generation with the fittest individuals is found. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of characters/integers/floats/bits; this string is analogous to a chromosome. I consider a set of candidate solutions for a problem and select the best ones out of them. The working cycle of a GA has four basic steps, namely initialization, selection, crossover, and mutation, which are described in Figure 2.2.
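To make this cycle concrete, the following is a minimal, generic sketch in Python. The fitness function, population size, probabilities and termination criterion are placeholders (the concrete choices used for summarization are introduced in Chapter 3), and tournament selection of size 2 stands in for the selection schemes discussed below.

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=50, generations=100,
                      p_cross=0.9, p_mut=0.01):
    """Generic GA loop: initialization -> selection -> crossover -> mutation.
    `fitness` maps a binary chromosome (list of 0/1) to a real number."""
    # Initialization: random binary chromosomes.
    population = [[random.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: tournament of size 2 (one of many possible schemes).
        def select():
            a, b = random.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            # Crossover with probability p_cross (one-point).
            if random.random() < p_cross:
                cut = random.randint(1, n_genes - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Mutation: flip each bit with probability p_mut.
            for child in (c1, c2):
                for i in range(n_genes):
                    if random.random() < p_mut:
                        child[i] = 1 - child[i]
                offspring.append(child)
        population = offspring[:pop_size]
    return max(population, key=fitness)
```

For instance, genetic_algorithm(sum, n_genes=16) maximizes the number of 1-bits in a 16-bit chromosome.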
6
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep do an to nghiep docx 123docz
luan van hay luan van tot nghiep
The essential advantage of GAs is their capacity to maintain the diversity of candidate solutions thanks to the genetic operators (i.e., selection, crossover and mutation). However, the complexity of the evolution process often leads to a slow convergence speed. Indeed, the computation time of a GA increases non-linearly with large population sizes.
Figure 2.2 The basic structure of Genetic Algorithm [8].
Genetic algorithms have three main genetic operators: crossover, mutation, and selection. These operators play essential roles and can take very different forms.
• Crossover: This operator exchanges the genetic information of two parents in order to create children. It is conducted on randomly chosen parent couples in order to generate a child population similar in size to the parent population.
Crossover is primarily an action that occurs in a subspace. This point becomes evident in a binary system where the strings are composed of the characters 𝑎 and 𝑏. For instance, regardless of how the crossover is performed on the two strings 𝑆1 = [𝑏𝑎𝑏𝑎] and 𝑆2 = [𝑏𝑏𝑎𝑏], their offspring will always be of the form [𝑏 … ]. That is, crossover can only produce solutions in a subspace where the first component is always 𝑏. Additionally, no matter how the
crossover is administered, two identical solutions will result in two identical offspring. This implies that crossover occurs within a subspace and that converged solutions/states remain converged.
• Mutation: This operator augments the new child population with new genetic information. This is accomplished by flipping some chromosomal bits. Mutation alleviates the local-minimum problem and promotes diversification.
More specifically, mutation frequently yields a solution that is not contained in the subspace. In the preceding example, if a mutation flips the leading 𝑏 of 𝑆1 to 𝑎, the resulting solution 𝑆3 = [𝑎𝑎𝑏𝑎] no longer belongs to the prior subspace. Indeed, mutation frequently yields solutions that diverge from the present ones, hence expanding the population's diversity. This enables the population to break out of a local optimum. A critical point to consider is how individuals are selected within the population. For instance, crossover requires two parents from the population. Are they chosen randomly, or is the choice biased toward the solutions with the highest fitness? One common method is roulette-wheel selection, a form of fitness-proportional selection (a sketch of it is given after this list); other selection methods, such as linear ranking selection and tournament selection, are also used.
• Selection of the fittest, or elitism: individuals are selected for the reproduction of offspring. The selected individuals are then arranged in pairs to perform reproduction. Parent selection is crucial to the convergence rate of the GA, as good parents drive individuals toward better and fitter solutions.
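The following is a minimal sketch of roulette-wheel (fitness-proportional) selection, assuming non-negative fitness values; tournament or rank-based selection would replace only this routine.

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness.
    Assumes all fitness values are non-negative."""
    total = sum(fitnesses)
    if total == 0:                        # degenerate case: pick uniformly
        return random.choice(population)
    pick = random.uniform(0, total)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if cumulative >= pick:
            return individual
    return population[-1]                 # numerical safety fallback
```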
Both crossover and mutation operate independently of an objective or fitness landscape. Selection of the fittest, or elitism, on the other hand, does use the fitness landscape to determine what to keep, and hence affects the algorithm's search behavior. What is selected, and how, is algorithm-dependent, as are the objective function values. Elitism ensures that only the best solutions survive. However, excessive elitism may result in premature convergence. It is worth noting how basic these genetic operators are. Additional operators may take various forms, and hybrid operators may also be used. To grasp the fundamental behavior of genetic algorithms, I concentrate on these crucial operators in Sections 2.2.1, 2.2.2 and 2.2.3.
2.2.1 Initialization
The genetic algorithm process begins with population initialization. A population is a set of candidate solutions of the current generation. Traditionally, solutions are represented in binary as strings of 0s and 1s.
Additionally, a population P can be described as a collection of chromosomes. Typically, the initial population 𝑃(0), the first generation, is generated randomly. The populations 𝑃(𝑡) at generation t (t = 1, 2, ...) are then formed through an iterative procedure.
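A random binary initialization of 𝑃(0) might look as follows; the population size and chromosome length are placeholders.

```python
import random

def initialize_population(pop_size, n_genes):
    """P(0): pop_size random binary chromosomes of length n_genes."""
    return [[random.randint(0, 1) for _ in range(n_genes)]
            for _ in range(pop_size)]

# e.g. a first generation of 10 chromosomes with 16 genes each
P0 = initialize_population(pop_size=10, n_genes=16)
```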
When working with genetic algorithms, it is critical to maintain population diversity; otherwise, a phenomenon known as premature convergence may occur. In evolutionary algorithms, "premature convergence" refers to the algorithm converging before the global optimal solution is attained. Premature convergence occurs when an evolutionary algorithm becomes stuck in local minima. When this occurs, parental solutions are unable to produce offspring that are genetically superior to their parents. Premature convergence is frequently triggered by the following factors: loss of diversity, too strong a selective pressure towards the best solution, or too much exploitation of existing building blocks from the current population (e.g., by recombining them, or mutating them only slightly).
2.2.2 Crossover
The crossover operator is analogous to reproduction and biological crossover. Here, more than one parent is selected and one or more offspring are produced using the genetic material of the parents. Crossover is usually applied in a GA with a high probability, a_cross.
a) N-Point Crossover
A crossover point is randomly chosen for two individuals (parents) obtained by selection. All data beyond that point in the organism string is swapped between the two parent organisms. This crossover operates at the individual level. Assume the one-point crossover point happens at random after the seventh bit, as shown in Table 2.1. Then, after the seventh bit, all parent bits are swapped to produce the following two children. This is a particular case of the N-point crossover technique.
Table 2.1 One-point crossover example
Chromosome1: 1011011|001110110
Chromosome2: 1110001|010011011
Offspring1:  1011011|010011011
Offspring2:  1110001|001110110
Similarly, two randomly chosen points on the individual chromosomes (strings) can be used to exchange genetic material. Table 2.2 shows crossover using two crossover points.
Table 2.2 Two-point crossover example
Chromosome1: 1011|10001010|1010
Chromosome2: 1111|01010100|0011
Offspring1:  1011|01010100|1010
Offspring2:  1111|10001010|0011
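The exchanges in Tables 2.1 and 2.2 can be reproduced with the following sketch; the crossover points are the ones used in the tables.

```python
def one_point_crossover(parent1, parent2, point):
    """Swap everything after `point` (Table 2.1 uses point = 7)."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def two_point_crossover(parent1, parent2, lo, hi):
    """Swap the segment between `lo` and `hi` (Table 2.2 uses 4 and 12)."""
    child1 = parent1[:lo] + parent2[lo:hi] + parent1[hi:]
    child2 = parent2[:lo] + parent1[lo:hi] + parent2[hi:]
    return child1, child2

# Reproducing Table 2.1 with the bit strings written without separators:
print(one_point_crossover("1011011001110110", "1110001010011011", point=7))
```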
b) Uniform Crossover
Each gene (bit) is selected randomly from one of the corresponding genes of the parent chromosomes, for instance by tossing a coin for each position. Suppose the second, third, fourth, ninth, tenth and eleventh bit positions (of the original parents) are swapped. Table 2.3 shows the result of this uniform crossover.
Table 2.3 Uniform crossover example
Chromosome1: 0|000|1101|010|0011
Chromosome2: 1|101|0101|101|0101
Offspring1:  0|101|1101|101|0011
Offspring2:  1|000|0101|010|0101
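A coin-toss version of uniform crossover can be sketched as below; for the deterministic swap of bit positions shown in Table 2.3, the swap decision would be fixed per position instead of drawn at random.

```python
import random

def uniform_crossover(parent1, parent2, p_swap=0.5):
    """For every position, toss a coin and swap the bits with prob. p_swap."""
    child1, child2 = list(parent1), list(parent2)
    for i in range(len(parent1)):
        if random.random() < p_swap:
            child1[i], child2[i] = child2[i], child1[i]
    return "".join(child1), "".join(child2)
```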
2.2.3 Mutation
In simple terms, mutation is a small random change to the chromosome that results in a new solution. It is typically applied with a low probability, a_mut, to preserve and introduce variation in genetic populations. When the probability is very high, the GA is reduced to a random search.
Mutation is the component of the GA that is responsible for "exploring" the search space. It has been noted that while mutation is required for GA convergence, crossover is not.
With bit-flip mutation, one or more random bits are selected and flipped, as shown in Table 2.4. This is used for binary-encoded GAs.
Table 2.4 Bit-flip mutation example
Chromosome: 000011010100011
Offspring:  110011110100011
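Bit-flip mutation with a per-bit probability a_mut can be sketched as follows; the probability value is a placeholder.

```python
import random

def bit_flip_mutation(chromosome, a_mut=0.01):
    """Flip each bit of the string independently with probability a_mut."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < a_mut else bit
        for bit in chromosome
    )

print(bit_flip_mutation("000011010100011", a_mut=0.2))
```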
2.3 Particle swarm optimization
Eberhart and Kennedy (Eberhart & Kennedy, 1995) [14] proposed the particle swarm optimization (PSO) algorithm in 1995, a stochastic optimization technique based on swarm intelligence. The PSO algorithm models the social behavior of animals such as insects, herds, birds, and fish. These swarms follow a cooperative food-finding strategy, with each member of the swarm modifying its search pattern in response to its own and other members' learning experiences. The PSO algorithm's main design concept draws on two fields of study. One is evolutionary algorithms: like an evolutionary algorithm, PSO uses a swarm to search a vast region of the solution space of the optimized objective function at the same time. The other is artificial life, the study of artificial systems with life-like qualities. Since its introduction in 1995, PSO has undergone numerous upgrades. As researchers gained knowledge of the technique, they developed new versions to address specific requirements,
created new applications in a variety of fields, published theoretical papers examining the
impact of various factors, and proposed numerous variants of the algorithm.
2.3.1 Particle and swarm
Based on these ideas, the PSO algorithm can be summarized as follows. PSO is a swarm-based search procedure in which each individual is referred to as a particle and represents a potential solution to the optimization problem in a D-dimensional search space; each particle remembers the optimal positions found by the swarm and by itself, as well as its velocity. In each generation, the particles' information is combined to adjust the velocity along each dimension, which is then used to determine the particle's new position. Particles in the multi-dimensional search space constantly alter their states until they reach balance or the optimal state, or until they exceed the computational limits. The objective function introduces a unique relationship between the different dimensions of the problem space.
PSO uses a number of particles that constitute a swarm. The particles fly around in the multidimensional search space; during this flight, each particle adjusts its position based on its own experience and that of neighboring particles, using the best positions found by itself and by its neighbors. Particles update their positions and velocities according to changes in the environment, i.e., they meet the requirements of proximity and quality. Particles in PSO maintain a stable movement in the search space while changing their movement mode to adapt to changes in the environment. The set of surrounding particles and its own historical experience define a particle's swarm direction [15].
2.3.2 Objective function
At every iteration, each particle updates its position according to its new velocity. Considering a swarm with P particles, each particle $i$ has a position vector $X_i = \{x_i^1, x_i^2, x_i^3, \ldots, x_i^N\}$ and a velocity vector $V_i = \{v_i^1, v_i^2, v_i^3, \ldots, v_i^N\}$ at iteration $t$. These vectors are updated along each dimension $j$ according to the following equation:
$$V_i^j(t+1) = \omega \cdot V_i^j(t) + c_p \cdot \mathrm{rand1}_i \cdot \left(\mathrm{pbest}_i^j(t) - X_i^j(t)\right) + c_g \cdot \mathrm{rand2}_i \cdot \left(\mathrm{gbest}^j(t) - X_i^j(t)\right)$$
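A minimal sketch of this update is given below. The values of ω, c_p and c_g are placeholders, the random factors are drawn per dimension here, and the standard position update X_i(t+1) = X_i(t) + V_i(t+1) is included for completeness; updating pbest and gbest by comparing fitness values is left outside the sketch.

```python
import random

def pso_step(X, V, pbest, gbest, omega=0.7, c_p=1.5, c_g=1.5):
    """One PSO iteration for a swarm given as lists of position vectors X,
    velocity vectors V, personal bests pbest and the global best gbest."""
    for i in range(len(X)):                      # for each particle i
        for j in range(len(X[i])):               # for each dimension j
            r1, r2 = random.random(), random.random()
            V[i][j] = (omega * V[i][j]
                       + c_p * r1 * (pbest[i][j] - X[i][j])
                       + c_g * r2 * (gbest[j] - X[i][j]))
            X[i][j] += V[i][j]                   # standard position update
    return X, V
```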