I H C QU C GIA TP. HCM
IH
-----------------------
NGUY
NG TU N T
T
T PD
LI U CHU I TH I GIAN
: 604801
LU
TP. H
NG
2
I
IH
- TPHCM
ng d n khoa h
Ng c
ch m nh
c
ch m nh
Lu
TS. Ph
cb ov t
22
7
ih
2013
nH
m:
1.
n Anh (Ch t ch h
ng)
2. TS. Ph
3.
4.
c (PB2)
Ng
5.
n c a Ch t ch H
ng
ng Khoa qu
c s a ch a (n
CH T CH H
NG
n Anh
3
-
-
NHI M V LU
: 604801
I.
gian.
II. NHI M V
I DUNG
1.
m v d li u chu i th
th c ph bi
li u chu i th
tr
i s th
i thu t h
u tu n t ph bi n
2.
s
h c c a vi
t qu
tc
c c a lu
.
3. Chu n b d li u th c nghi m, d li
c hi n ti n x
khoa
li u th
ng ch ng
li u.
4.
ng tu n t
xu t hai gi i thu t h tr
5. Ti
i thu
i thu
III.
xu
t qu th c nghi m
xu
u.
I M V : 21/1/2013
IV.
M V : 21/6/2013
V.
NG D N:
Ng
NG D N
CH NHI M B
(H
TS.
(H
Ng
(H
O
4
L
il
ct
ng d
om
Ng
u ki n t t nh
i dung trong su t th i gian th c hi n lu n
il
ih
TP.HCM, ban ch nhi
, cung c
ng h c t p t t nh
c hi
L i cu
t nh t.
ng ki n th c qu
5
T LU
D li u chu i th
t hi n r t ph bi n trong nhi
a ch
c
ng tri th
trong d li u chu i th
m n
t quan tr
iv
nh ng
k t qu
li u s
nh ng m i quan h mang y u
t th i gian gi
ng/hi
y, lu
t trong th i gian s ng c a
xu t ra m
ng tu n t
gi i thu
cg
u t th i gian. Lu
p hai
ng tu n t ph bi
c
t t p d li u chu i th
bi
ng th
u xu
u tu n t ph
d li u tu n t ,
nhi u chu i th i
mang l
th i kho ng x y ra gi
v id
mang y u t
Hai gi i thu
c
ki n (c th
sau).
xu t s d
(brute-
ng c
-based). Tinh th
n
a c hai gi i thu t khi
u tu n t ph bi
t ng m c t
ng ti p c n l p
i thu
ng c i ti
t h
bi n gi
th
n so
x
i quan h ph
t qu th c nghi m t hai gi i thu
cm
hi u qu c a gi i thu t d
i gi i thu t
brute-force.
t qu th c nghi
i v i d li
lu
m h tr
lu t mang y u t th
tr ra quy
ct ,
nh.
d li u chu i th
ng d ng trong h
6
ABSTRACT
Nowadays, time series is present in many various application domains such
as finance, medicine, geology, meteorology, etc. Analyzing and mining time series
for useful information and hidden knowledge is very significant in those domains to
help users such as data analysts and managers get fascinating insights into important
temporal relationships of objects/phenomena along the time. Therefore, this thesis
introduced a notion of frequent temporal inter-object pattern and accordingly
proposed two frequent temporal pattern mining algorithms on a set of different time
series. As compared to frequent sequential patterns in sequential databases, frequent
temporal inter-object patterns are more informative with explicit and exact temporal
information automatically discovered from many various time series.
The two proposed algorithms which are brute-force and tree-based are
efficiently defined in a level-wise bottom-up approach dealing with the
combinatorial explosion problem in the association analysis task. When analyzing
experimental results, we can see the advantage of tree-based algorithm in
comparision with brute-force algorithm.
As shown in experiments on real financial time series, this thesis results can
be further used to efficiently enhance the temporal rule mining process on time
series for decision making support.
7
L
xin
c hi
ng, ngo i tr
nn
t qu tham kh o t
a lu
cn
b ng c p
Nguy
l y
8
: GI I THI
1.1 Gi i thi u v
....................................................................15
..............................................................................................15
1.2 M c ti
.................................................................................................19
1.3 Ph
..................................................................................................19
...................................................................................................19
1.5 C
.............................................................................................20
:
T .....................................................................22
2.1 D li u chu i th i gian ....................................................................................22
2.1.1 Time Series ................................................................................................22
2.1.2 Chu i con (Subsequence) ..........................................................................22
2.
p (Match) ..................................................................................23
pt
ng (Trivial Match) ...................................................23
ng (Non-trivial match) .................................24
2.1.6 Motif ..........................................................................................................24
2.2 X
li u chu i th i gian ...........................................................................24
(similarity measurement) ...........................................24
2.2.2 Chu
li u (Data Normalization) ..................................................27
u tu n t ph bi n ........................................................................27
.................................................................................................27
2.3.2 M t s
ng g
m s chi u x p x g p t
u tu n t ph bi n ........28
n PAA (Piecewise
Aggregate Approximation) ....................................................................................29
p bi
i chu i th i gian sang d
SAX (Symbolic
Aggregate Approximation) ....................................................................................30
i chu i th i gian sang d
ng .......33
9
i s quan h th i gian Allen ........................................................................35
2.8 Generic Dictionary trong C# ...........................................................................36
U
...................38
li u chu i th i gian ..............................38
iv
d li u chu i th i gian l n ............38
......................................................38
3.2 M t s gi i thu
u ph bi n.........................................................40
3.2.1 Gi i thu t Apriori ......................................................................................40
3.2.2 Gi i thu t FP-Growth ................................................................................41
3.2.3 Gi i thu t GSP ...........................................................................................42
3.2.4 Gi i thu t Prefix Span ...............................................................................42
3.3 M t s
li u chu i th i gian
...............................................................................................................................43
NG TI P C N GI I QUY T V
.............................47
4.1 Gi i thi u .........................................................................................................47
ng tu n t
d li u
chu i th i gian .......................................................................................................47
n chuy
i thu
i sang chu i th
xu
ng .......................49
i con ph bi n t chu i th
ng .50
ng tu n t ph bi
ng ..........52
ng tu n t
ng
p d li u chu i th i gian ..............................................................................54
4.3.1 Gi i thu t Brute Force ...............................................................................55
4.3.2 Gi i thu t d
.................................................................63
: TH C NGHI M ............................................................................74
10
5.1 T ng quan ........................................................................................................74
5.2 Ti n x
li u.............................................................................................75
d li u .................................................................................................75
th c nghi m ...........................................................................................76
5.4.1 Th c nghi m c
nh min-sup ...................................................................78
5.4.2 Th c nghi m c
nh chi
5.4.3 Th c hi n ki m tra s
5.4.4 Th c hi n ki m tra s
i th i gian ........................................82
ng k t h
u t o ra ........................86
a hai gi i thu t ....................................88
5.5 K t qu .............................................................................................................89
................................................................................................91
: K T LU N .....................................................................................92
6.1 T ng k t ...........................................................................................................92
a lu
....................................................................................92
.....................................................................93
11
M cl
1.1 Minh h a d li u chu i th i gian c phi u S&P500 ..................................17
2.1
i con trong m t chu i th i gian [17] ...........................................22
2.2
t chu
p c a chu i con C trong chu i th i gian T
[17] ............................................................................................................................23
2.3
pt
i c a chu i C [17] ...........23
nh 2.4
.....................................................................25
2.5
.............................................................26
2.6 Ma tr n cho DTW [18] ...............................................................................26
nh 2.7 Bi
i thu gi m chi u theo PAA [18] .....................................................30
2.8 Minh h
...................................31
2.9 Bi u di n b ng SAX [17-18] .....................................................................32
2.10 Minh h
trong b
3.1
i [21] ...........................................39
4.1 Minh h
4.2
........................................37
.................................................48
nm tc
4.3 Minh h
...................................................................49
n 2 ...........................................................................51
4.4 Minh h a cho m t m u tu n t ph bi n v
nc am
i
ng c phi u NY ....................................................................................................53
4.5 Minh h a cho m t m u tu n t ph bi n v
ng c phi
nc
i
.........................................................................................54
4.6 Minh h a quan h precedes ........................................................................57
4.7 Minh h a quan h
.........................................................................57
4.8 Minh h a quan h
.....................................................................58
4.9 Minh h a quan h
.....................................................................58
4.10 Minh h a quan h
........................................................................58
4.11 Minh h a quan h
.............................................................59
4.12 Minh h a quan h
........................................................................59
12
4.13
gi i thu t Brute Force ..................................................................63
4.14 Minh h a m t node trong c
4.15
c
....................................................64
..........................................................65
4.16 Minh h a c
c 3............................................................66
4.17 Minh h
c 4 .........................................................................68
4.18
............................................................................69
4.19
th t c BuildTree .........................................................................69
4.20
th t c CombineChildNodes ........................................................70
4.21
th t c CombineNode trong gi i thu t d
......71
4.22
i thu t d
.......72
4.23
..........................................73
5.1 Chu i th i gian ch
5.2 Chu i th i gian ch
phi u S&P500 t
...........76
c phi u BA t
...................77
5.3 Chu i th i gian ch
phi u CSX t
.................77
5.4 Chu i th i gian ch
phi u CAT t
.................77
5.5 Chu i th i gian ch
phi u DE t
5.6
....................78
th bi u di n th i gian ch y ng v i m t chu i th
min-sup = 5 ...............................................................................................................79
5.7
th bi u di n th i gian ch y ng v i hai chu i th i gian S&P500, BA
-sup = 5 ...........................................................................................................80
5.8
th bi u di n th i gian ch y ng v i ba chu i th i gian S&P500, BA,
-sup = 5 ..................................................................................................81
5.9
th bi u di n th i gian ch y ng v i b n chu i th i gian S&P500, BA,
-sup = 5 ........................................................................................81
5.10
th bi u di n th i gian ch y ng v
BA, CAT, CSX,
5.11
i th i gian S&P500,
-sup = 5 .........................................................................82
th bi u di n th i gian ch y ng v i chu i th
u
i th i gian = 100...........................................................................................83
5.12
th bi u di n th i gian ch y ng v i 2 chu i th i gian S&P500, BA
i th i gian = 100 ............................................................................84
13
5.13
th bi u di n th i gian ch y ng v i 3 chu i th i gian S&P500, BA,
i th i gian = 100 ...................................................................85
5.14
th bi u di n th i gian ch y ng v i 4 chu i th i gian S&P500, BA,
i th i gian = 100 .........................................................85
5.15
th bi u di n th i gian ch y ng v i 5 chu i th i gian S&P500, BA,
i th i gian = 100 ..................................................86
5.16 Minh h
ng tu n t
ng v i
....................................................90
14
M c l c b ng
2.1
v
d li u chu i [1] ................................................................28
2.2 B
2.3
[17-18] ........................................32
lo
s d
i sang chu
li u c phi u ..............................................................................................................34
2.4
2.5
i ba m i quan h
n Allen [29] ...................................................35
p quan h thu n ngh ch [29] ...........................................................35
5.1 K t qu khi ch y v i m t chu i th
-sup = 5..........79
5.2 K t qu khi ch y v i hai chu i th
-sup = 5 ...79
5.3 K t qu khi ch y v i ba chu i th
-sup =
5 .................................................................................................................................80
5.4 K t qu khi ch y v i b n chu i th i gian S&P500, BA, CAT, CSX ........81
5.5 K t qu khi ch y v
i th i gian S&P500, BA, CAT, CSX, DE
-sup = 5 ...........................................................................................................82
5.6 K t qu khi ch y v i chu i th i gian
i th i gian
= 100..........................................................................................................................83
5.7 K t qu khi ch y v i 2 chu i th
i
th i gian = 100 ..........................................................................................................84
5.8 K t qu khi ch y v i 3 chu i th i gian S&P500, BA, CAT .....................84
5.9 K t qu khi ch y v i 4 chu i th
u
i th i gian = 100...........................................................................................85
5.10 K t qu khi ch y v i 5 chu i th
chi
i th i gian = 100 .................................................................................86
5.11
ng s
ng k t h
u t o ra t hai gi i thu t ......87
5.12
t ng s
ng k t h p gi a hai gi i thu t...................................88
5.13
ng s
c t o gi a hai gi i thu t ............................88
15
: GI I THI
1.1 Gi i thi u v
S
i c a h th
ch s
um
t
nc
th thi u trong cu c s
u ch
c. T nh ng chi
kh
n ch
s h u nh
it
l n. Ta s d
x
t nhi
p,
p, th
N
m trong d li
i thi u
nc
n nay,
i s ti n b c
vi c thu th
d li
kh
l
ng d li u
c nh
tr
iv
cd
d ng kho d li
vi c truy xu
nh
xu t.
n nh
l
h tr cho
c
c trong kh
ng d li u kh ng
.
d
li
n (non
m
trivial) v i m
p l , ph bi
hi
nay, nhi
c kinh doanh m
to l n. N u ch d
n gi
c.
i mang l i l i nhu
ng d li
p th i. M t trong nh
th
bi n hi n nay
nh kinh nghi
bi t c
li
d li u ch
ng v
u
c ch n s h tr r t nhi u trong vi c d
phi
gi
phi
bi n
16
s d ng t p d li u chu i th
th c hi
ac
Hi n nay nhi
li u ch
c th c hi n trong th
cd
i ta s d
AutoRegressive Integrate Moving Average) do Box(1996) d
h
ng MA [3-5].
d
t nh
d
ngh
ng
-ron
li
-chung Fu,
-
li
it
ch (inter
t giao d ch (intra transactio
ph bi
tm
u
gi i h n trong m t lo i c phi u
n t nhi u lo i c phi
li
L.Feng d
C.Cho, Y.Wu [26]
ch d
ng d ch chuy n v
Khi nh
th
a Lu,
a c phi u [25].
li
ng nh
n d li u
chu i th i gian (time series data). D li u chu i th i gian xu t hi n m
nhi u ng d ng th c t trong cu c s
c kinh t -
li u ch
t
c y t (nh
th
a ch
y trong 4000 b c nh c a 15 t
i 75% s
c s n xu t t
nh d li u chu i th
c a d li u chu i th
i tr s th c, m i
i nh ng th
li u chu i th i gian, ta c n ph
th
u nhau [18]. Khi nh
n th t c
bi
t nh
mx
i. D li u chu i th
c trong nhi
i
ki
ng r t l
a 2 th
i v i d ng d li
nd
ki n x
i th t ho c d ch chuy n th
d li
i
ph bi n
t l n. M t chu i th
tr bi u di n m
k
c....M t
a ch t ho c y t ).
c li n
17
1.1 Minh h a d li u chu i th i gian c phi u S&P500
li u chu i th
ts
ng xem
ng (trend)
th
i
c gi
u ph
bi n xu t hi n l i sau nh ng kho ng th i gian c
- d li u bi
ng h n t 5 t i 7
pl
gi
- d li u chu i th i gia
t
tv
ph bi
l n chi
li u chu i th
ng tri th c ti m n
i ta b
li u chu i th
d li u chu i th
3 trong s 10 th
ra r
nc
li
t trong nh
nhi u s
tg
t
H uh t
li u chu i th
]
c
nh v c th
h tr
tri th c t t p d li u c a h .
u khi nh
nx
i th
m c (indexing) d
xu
t gom c m (clustering) do Aach and
xu t;
p (classificatio
xu
t d
t t ng k t
18
m b
ng s (2002) [1
xu
ki
m u ph
bi
P.Smyth [23
u.
M t trong nh
li u chu i th
lu t k t h
u tu n t ph bi
vi c d
44
t
ng g
n xu
c m u ph bi n r i
t k t h p. Ph m vi c a lu
c th
nh t.
i vi
chu i th
ng tu n t t t p d li u
n ch t tu n t c a d li u chu i th
c ghi nh n t i nh ng th
t
t ph bi
u tu n
u xu t hi
chu i con xu t hi
t ho
li u chu i th
d
li u tu n t
u tu n t ph bi n
li u chu i th i gian ta ph
th
ki n
i kho
m x y ra s ki n. T nh ng m u tu n t ph bi
ng d
ng g p trong d li u chu i th i gian ho
i quan h gi
M
u th
v
u tu n t ph bi
lo i c phi
ac
t k t lu
s
phi
p r i gi
h gi a 2 lo i c phi
c th
phi u X s
phi u Y s gi
t nd
n
i nhi u h tr
p trong kinh doanh.
n
19
1.2 M
M
ng tu n t
ng t t p d
li u th i gian. T t p d li u chu i th
c
c ti n), ta s
u tu n t ph bi
ng. T
t
gi
1.3 Ph
p d li u ch
a th
Nam), ta s ti
t c a chu i th i gian m
(th c hi n t
ch
ng th gi i (ho c c a Vi t
ng ho
c ti
pv
nh ng chu i th
u, ta s ti
ph bi
tc
d
ts
u ch nh v gi i thu
t c a d li u chu i th i gian. Cu
gi
ph bi n v
u tu n t
ng c
m u ph bi
pv
n
ng hi n di
u
c.
t
c hi
hi u su
mb
t th
c hi
m vi cho
s d ng l i m t s
t ph bi n trong
li u chu i th
t ti n x
t thu gi m chi u
tr ir
i sang chu
,v
t bi n
li u chu i tu n t ph bi n.
1.4
c ti n: hi
c bi
li
li u ch
ng trong n n kinh t th
gi
cung c
nh
ng c th
hi n m i quan h gi
t qu c
20
ng trong qu
n ra nh
c a t ng lo i c phi
th
cs
gi
i c phi
quan h
qu
a nh ng m i
ch n l a vi
phi
u
m b t r i ro.
c
a lu
m u (template) g
xu t ra m
ng tu n t ph bi
quy t v
u ph bi
p d li u chu i th
ng tu n t
t
chu i (sequential database) (ch
tr gi i
d li u
qua y u t th i kho ng gi
ki n A x
ki n
c B ch
u ph bi n tu n t
n gi l
th i kho ng v
t
t
m t chu i c th (m
ng nh
nh), lu
ng c th
vi
ti
r
ng. M u
n t nhi
y k t qu c a
n trong vi
ng c th (m t
chu i th
th
c th gi
c m i quan h
u ph bi n tu n t
ng). T
ng tri th
th
is d
ng th
xu t ra hai gi i
thu t nh m h tr
v
u ph bi n tu n t
y th c nghi
th c hi n kh
cs
li
ng
t s . Th
i n m trong m t gi i h
1.5 C
Lu
c c sau:
- Gi i thi
vi
p thi t cho
.
21
-
t: gi i thi u nh
m n n t ng
n th c c a lu
Ch
-
: gi i thi
thuy
-
ng ti p c n gi i quy t v
ng tu n t
: gi i thi
n
d li u chu i
th i gian.
- Th c nghi m
s d ng hai gi i thu t do lu
- K t lu n:
t qu khi
xu t.
a lu
u trong
22
T
2.1 D li u chu i th i gian
2.1.1 Time Series
M t chu i th
i tr s th c, m i tr bi u di n m t
i nh ng th
u nhau [18]. Ho c chu i th i gian T = t1, t2,
tt
m
t m bi
s th c ho
ghi nh n t i nh ng th
s
c
u nhau [17].
chu i th
chu i th
n. Chu i th
m
t chu i th i gian ch ch a
c ghi nh n m
nhau. Chu i th
n t t i nh ng kho ng th
i th
u
i m t th
nhi
2.1.2 Chu i con (Subsequence)
N
t chu i th
ch a m t ph n g m nh
chi
t chu i con C c a T s
nh ng v
i ti
ut im tv
C = tp
17].
p+n-1
v i1
p
m n+1
i ta s d ng c a s
i con t chu i th
trong m t chu i th
2.1 cho th
t
v 3 chu i con A,B,C
ub
2.1
i con trong m t chu i th i gian [17]
23
2
p (Match)
Cho m t s th
s
v ph m vi kho
nh
p gi a 2 chu
t chu i th i gian
T. Bi t r ng T ch a m t chu i con C b
u t i q, n u kho ng
u t i th
a 2 chu i nh
t chu
2.2
t chu
2
t chu i con M b t
cb
pv
c l i.
p c a chu i con C trong chu i th i gian
T [17]
pt
ng (Trivial Match)
Cho m t chu i th i gian T ch a m t chu i con C b
chu
p (match) v
pt
t
ut iv
ng v i C n u p = q (2 chu
t hi n m t chu
mb
u) ho c
ut iv
ym
2.3
ut iv
pt
v
pt
ng
i c a chu i C [17]
24
2
ng (Non-trivial match)
Hai chu
c g
ng n
c chia c t b i m t chu i
pv ic
2 chu
2.1.6 Motif
M t chu i con (subsequence) trong chu i th
l
cl pl
tm t
n ph
m
1
motif: chu
ng nh
C1)
K
i con Ck
p
ng nhi u nh t th a D(Ck, Ci) > 2R (v i v i 1
i
i lo i b
ng h
u ph n t
2.2 X
t motif.
li u chu i th i gian
i v i d li u chu i th
ni
cs d
ng ta ph
ts
h tr cho vi
2.2
(similarity measurement)
iv id
li u chu i th i gian, ta
m
gi a hai chu i th i gian.
: [17]
Cho 2 chu i th i gian
Q = q 1 , q2
n
C = c1, c2
n
Ta xem m i chu
chi
a m i chu
u (v
kho
c:
a hai chu i th
25
n
D(Q,C) =
(qi
ci ) 2
i 1
2.4
kho
n, d
m r ng cho nhi u d
li u chu i th i gian.
Khuy
m: nh y c m v i nhi
c khi 2 chu i th
chi
.
n th
ng [18]
T 2 chu i th
g qua vi
m t chu i th
m t
m c a chu i th
n th
Warping
ng (Dynamic Time
i thi u
r ng ho c thu h
u (signals) theo
chi u th
Gi s
X = x1 , x 2
Y = y1, y2
i th
n
m
m r ng c a t ng chu i th i gian b
t .S
c hai chu i m
ng cho hai chu
pl
n
c