Tải bản đầy đủ (.pdf) (101 trang)

Khai phá mẫu xu hướng tuần tự lên đối tượng từ tập dữ liệu chuỗi thời gian

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (1.2 MB, 101 trang )

I H C QU C GIA TP. HCM
IH
-----------------------

NGUY

NG TU N T
T

T PD

LI U CHU I TH I GIAN

: 604801

LU

TP. H

NG


2

I
IH

- TPHCM

ng d n khoa h


Ng c

ch m nh

c

ch m nh

Lu

TS. Ph

cb ov t
22

7

ih

2013

nH

m:

1.

n Anh (Ch t ch h

ng)


2. TS. Ph
3.
4.

c (PB2)
Ng

5.

n c a Ch t ch H

ng

ng Khoa qu
c s a ch a (n

CH T CH H

NG

n Anh


3

-

-


NHI M V LU

: 604801

I.
gian.
II. NHI M V

I DUNG

1.

m v d li u chu i th
th c ph bi

li u chu i th

tr

i s th

i thu t h

u tu n t ph bi n

2.

s
h c c a vi


t qu

tc

c c a lu

.

3. Chu n b d li u th c nghi m, d li
c hi n ti n x

khoa

li u th

ng ch ng

li u.

4.

ng tu n t
xu t hai gi i thu t h tr

5. Ti

i thu
i thu

III.


xu

t qu th c nghi m

xu

u.

I M V : 21/1/2013

IV.

M V : 21/6/2013

V.

NG D N:

Ng

NG D N

CH NHI M B

(H

TS.

(H


Ng

(H

O


4

L
il

ct

ng d

om

Ng

u ki n t t nh
i dung trong su t th i gian th c hi n lu n

il

ih

TP.HCM, ban ch nhi
, cung c


ng h c t p t t nh

c hi
L i cu
t nh t.

ng ki n th c qu


5

T LU
D li u chu i th

t hi n r t ph bi n trong nhi

a ch

c

ng tri th

trong d li u chu i th

m n

t quan tr

iv


nh ng

k t qu
li u s

nh ng m i quan h mang y u

t th i gian gi

ng/hi
y, lu

t trong th i gian s ng c a
xu t ra m

ng tu n t
gi i thu

cg

u t th i gian. Lu

p hai

ng tu n t ph bi

c

t t p d li u chu i th

bi

ng th

u xu

u tu n t ph

d li u tu n t ,

nhi u chu i th i

mang l
th i kho ng x y ra gi
v id

mang y u t
Hai gi i thu

c

ki n (c th

sau).

xu t s d

(brute-

ng c


-based). Tinh th

n
a c hai gi i thu t khi

u tu n t ph bi
t ng m c t

ng ti p c n l p
i thu

ng c i ti

t h
bi n gi
th

n so

x
i quan h ph

t qu th c nghi m t hai gi i thu
cm

hi u qu c a gi i thu t d

i gi i thu t


brute-force.
t qu th c nghi

i v i d li

lu

m h tr

lu t mang y u t th
tr ra quy

ct ,

nh.

d li u chu i th

ng d ng trong h


6

ABSTRACT
Nowadays, time series is present in many various application domains such
as finance, medicine, geology, meteorology, etc. Analyzing and mining time series
for useful information and hidden knowledge is very significant in those domains to
help users such as data analysts and managers get fascinating insights into important
temporal relationships of objects/phenomena along the time. Therefore, this thesis
introduced a notion of frequent temporal inter-object pattern and accordingly

proposed two frequent temporal pattern mining algorithms on a set of different time
series. As compared to frequent sequential patterns in sequential databases, frequent
temporal inter-object patterns are more informative with explicit and exact temporal
information automatically discovered from many various time series.
The two proposed algorithms which are brute-force and tree-based are
efficiently defined in a level-wise bottom-up approach dealing with the
combinatorial explosion problem in the association analysis task. When analyzing
experimental results, we can see the advantage of tree-based algorithm in
comparision with brute-force algorithm.
As shown in experiments on real financial time series, this thesis results can
be further used to efficiently enhance the temporal rule mining process on time
series for decision making support.


7

L
xin

c hi

ng, ngo i tr

nn

t qu tham kh o t

a lu

cn


b ng c p

Nguy

l y


8

: GI I THI
1.1 Gi i thi u v

....................................................................15

..............................................................................................15

1.2 M c ti

.................................................................................................19

1.3 Ph

..................................................................................................19
...................................................................................................19

1.5 C

.............................................................................................20
:


T .....................................................................22

2.1 D li u chu i th i gian ....................................................................................22
2.1.1 Time Series ................................................................................................22
2.1.2 Chu i con (Subsequence) ..........................................................................22
2.

p (Match) ..................................................................................23
pt

ng (Trivial Match) ...................................................23
ng (Non-trivial match) .................................24

2.1.6 Motif ..........................................................................................................24
2.2 X

li u chu i th i gian ...........................................................................24
(similarity measurement) ...........................................24

2.2.2 Chu

li u (Data Normalization) ..................................................27
u tu n t ph bi n ........................................................................27
.................................................................................................27

2.3.2 M t s

ng g
m s chi u x p x g p t


u tu n t ph bi n ........28
n PAA (Piecewise

Aggregate Approximation) ....................................................................................29
p bi

i chu i th i gian sang d

SAX (Symbolic

Aggregate Approximation) ....................................................................................30
i chu i th i gian sang d

ng .......33


9

i s quan h th i gian Allen ........................................................................35
2.8 Generic Dictionary trong C# ...........................................................................36
U

...................38

li u chu i th i gian ..............................38
iv

d li u chu i th i gian l n ............38
......................................................38


3.2 M t s gi i thu

u ph bi n.........................................................40

3.2.1 Gi i thu t Apriori ......................................................................................40
3.2.2 Gi i thu t FP-Growth ................................................................................41
3.2.3 Gi i thu t GSP ...........................................................................................42
3.2.4 Gi i thu t Prefix Span ...............................................................................42
3.3 M t s

li u chu i th i gian

...............................................................................................................................43
NG TI P C N GI I QUY T V

.............................47

4.1 Gi i thi u .........................................................................................................47
ng tu n t

d li u

chu i th i gian .......................................................................................................47
n chuy

i thu

i sang chu i th


xu

ng .......................49

i con ph bi n t chu i th

ng .50

ng tu n t ph bi

ng ..........52

ng tu n t

ng

p d li u chu i th i gian ..............................................................................54
4.3.1 Gi i thu t Brute Force ...............................................................................55
4.3.2 Gi i thu t d

.................................................................63

: TH C NGHI M ............................................................................74


10

5.1 T ng quan ........................................................................................................74
5.2 Ti n x


li u.............................................................................................75
d li u .................................................................................................75
th c nghi m ...........................................................................................76

5.4.1 Th c nghi m c

nh min-sup ...................................................................78

5.4.2 Th c nghi m c

nh chi

5.4.3 Th c hi n ki m tra s
5.4.4 Th c hi n ki m tra s

i th i gian ........................................82
ng k t h

u t o ra ........................86
a hai gi i thu t ....................................88

5.5 K t qu .............................................................................................................89
................................................................................................91
: K T LU N .....................................................................................92
6.1 T ng k t ...........................................................................................................92
a lu

....................................................................................92
.....................................................................93



11

M cl
1.1 Minh h a d li u chu i th i gian c phi u S&P500 ..................................17
2.1

i con trong m t chu i th i gian [17] ...........................................22

2.2

t chu

p c a chu i con C trong chu i th i gian T

[17] ............................................................................................................................23
2.3

pt

i c a chu i C [17] ...........23

nh 2.4

.....................................................................25

2.5

.............................................................26


2.6 Ma tr n cho DTW [18] ...............................................................................26
nh 2.7 Bi

i thu gi m chi u theo PAA [18] .....................................................30

2.8 Minh h

...................................31

2.9 Bi u di n b ng SAX [17-18] .....................................................................32
2.10 Minh h

trong b

3.1

i [21] ...........................................39

4.1 Minh h
4.2

........................................37

.................................................48
nm tc

4.3 Minh h

...................................................................49
n 2 ...........................................................................51


4.4 Minh h a cho m t m u tu n t ph bi n v

nc am

i

ng c phi u NY ....................................................................................................53
4.5 Minh h a cho m t m u tu n t ph bi n v
ng c phi

nc

i

.........................................................................................54

4.6 Minh h a quan h precedes ........................................................................57
4.7 Minh h a quan h

.........................................................................57

4.8 Minh h a quan h

.....................................................................58

4.9 Minh h a quan h

.....................................................................58


4.10 Minh h a quan h

........................................................................58

4.11 Minh h a quan h

.............................................................59

4.12 Minh h a quan h

........................................................................59


12

4.13

gi i thu t Brute Force ..................................................................63

4.14 Minh h a m t node trong c
4.15

c

....................................................64
..........................................................65

4.16 Minh h a c

c 3............................................................66


4.17 Minh h

c 4 .........................................................................68

4.18

............................................................................69

4.19

th t c BuildTree .........................................................................69

4.20

th t c CombineChildNodes ........................................................70

4.21

th t c CombineNode trong gi i thu t d

......71

4.22

i thu t d

.......72

4.23


..........................................73

5.1 Chu i th i gian ch
5.2 Chu i th i gian ch

phi u S&P500 t

...........76

c phi u BA t

...................77

5.3 Chu i th i gian ch

phi u CSX t

.................77

5.4 Chu i th i gian ch

phi u CAT t

.................77

5.5 Chu i th i gian ch

phi u DE t


5.6

....................78

th bi u di n th i gian ch y ng v i m t chu i th

min-sup = 5 ...............................................................................................................79
5.7

th bi u di n th i gian ch y ng v i hai chu i th i gian S&P500, BA

-sup = 5 ...........................................................................................................80
5.8

th bi u di n th i gian ch y ng v i ba chu i th i gian S&P500, BA,
-sup = 5 ..................................................................................................81

5.9

th bi u di n th i gian ch y ng v i b n chu i th i gian S&P500, BA,
-sup = 5 ........................................................................................81

5.10

th bi u di n th i gian ch y ng v

BA, CAT, CSX,
5.11

i th i gian S&P500,


-sup = 5 .........................................................................82

th bi u di n th i gian ch y ng v i chu i th

u

i th i gian = 100...........................................................................................83
5.12

th bi u di n th i gian ch y ng v i 2 chu i th i gian S&P500, BA
i th i gian = 100 ............................................................................84


13

5.13

th bi u di n th i gian ch y ng v i 3 chu i th i gian S&P500, BA,
i th i gian = 100 ...................................................................85

5.14

th bi u di n th i gian ch y ng v i 4 chu i th i gian S&P500, BA,
i th i gian = 100 .........................................................85

5.15

th bi u di n th i gian ch y ng v i 5 chu i th i gian S&P500, BA,
i th i gian = 100 ..................................................86


5.16 Minh h

ng tu n t

ng v i
....................................................90


14

M c l c b ng
2.1

v

d li u chu i [1] ................................................................28

2.2 B
2.3

[17-18] ........................................32
lo

s d

i sang chu

li u c phi u ..............................................................................................................34
2.4

2.5

i ba m i quan h

n Allen [29] ...................................................35

p quan h thu n ngh ch [29] ...........................................................35

5.1 K t qu khi ch y v i m t chu i th

-sup = 5..........79

5.2 K t qu khi ch y v i hai chu i th

-sup = 5 ...79

5.3 K t qu khi ch y v i ba chu i th

-sup =

5 .................................................................................................................................80
5.4 K t qu khi ch y v i b n chu i th i gian S&P500, BA, CAT, CSX ........81
5.5 K t qu khi ch y v

i th i gian S&P500, BA, CAT, CSX, DE

-sup = 5 ...........................................................................................................82
5.6 K t qu khi ch y v i chu i th i gian

i th i gian


= 100..........................................................................................................................83
5.7 K t qu khi ch y v i 2 chu i th

i

th i gian = 100 ..........................................................................................................84
5.8 K t qu khi ch y v i 3 chu i th i gian S&P500, BA, CAT .....................84
5.9 K t qu khi ch y v i 4 chu i th

u

i th i gian = 100...........................................................................................85
5.10 K t qu khi ch y v i 5 chu i th
chi

i th i gian = 100 .................................................................................86
5.11

ng s

ng k t h

u t o ra t hai gi i thu t ......87

5.12

t ng s

ng k t h p gi a hai gi i thu t...................................88


5.13

ng s

c t o gi a hai gi i thu t ............................88


15

: GI I THI
1.1 Gi i thi u v
S

i c a h th
ch s

um

t

nc

th thi u trong cu c s
u ch

c. T nh ng chi

kh


n ch

s h u nh

it

l n. Ta s d

x

t nhi

p,

p, th
N

m trong d li

i thi u

nc

n nay,

i s ti n b c
vi c thu th

d li


kh

l

ng d li u

c nh

tr

iv

cd
d ng kho d li

vi c truy xu

nh

xu t.

n nh
l

h tr cho
c
c trong kh

ng d li u kh ng


.

d

li

n (non

m

trivial) v i m

p l , ph bi

hi

nay, nhi

c kinh doanh m

to l n. N u ch d

n gi

c.

i mang l i l i nhu

ng d li
p th i. M t trong nh


th

bi n hi n nay

nh kinh nghi

bi t c

li

d li u ch
ng v

u

c ch n s h tr r t nhi u trong vi c d
phi

gi

phi

bi n


16

s d ng t p d li u chu i th
th c hi


ac

Hi n nay nhi

li u ch

c th c hi n trong th

cd

i ta s d

AutoRegressive Integrate Moving Average) do Box(1996) d

h

ng MA [3-5].

d

t nh

d

ngh

ng

-ron


li

-chung Fu,

-

li
it

ch (inter

t giao d ch (intra transactio

ph bi

tm

u

gi i h n trong m t lo i c phi u

n t nhi u lo i c phi
li
L.Feng d

C.Cho, Y.Wu [26]
ch d

ng d ch chuy n v


Khi nh

th

a Lu,

a c phi u [25].

li

ng nh

n d li u

chu i th i gian (time series data). D li u chu i th i gian xu t hi n m
nhi u ng d ng th c t trong cu c s

c kinh t -

li u ch

t
c y t (nh

th

a ch

y trong 4000 b c nh c a 15 t

i 75% s

c s n xu t t

nh d li u chu i th

c a d li u chu i th

i tr s th c, m i

i nh ng th

li u chu i th i gian, ta c n ph
th

u nhau [18]. Khi nh
n th t c

bi

t nh

mx

i. D li u chu i th

c trong nhi

i
ki


ng r t l
a 2 th

i v i d ng d li

nd

ki n x

i th t ho c d ch chuy n th

d li

i
ph bi n

t l n. M t chu i th

tr bi u di n m

k

c....M t

a ch t ho c y t ).

c li n



17

1.1 Minh h a d li u chu i th i gian c phi u S&P500
li u chu i th
ts

ng xem

ng (trend)

th

i

c gi

u ph

bi n xu t hi n l i sau nh ng kho ng th i gian c
- d li u bi

ng h n t 5 t i 7

pl
gi

- d li u chu i th i gia

t


tv
ph bi

l n chi

li u chu i th

ng tri th c ti m n
i ta b

li u chu i th

d li u chu i th

3 trong s 10 th
ra r

nc

li

t trong nh

nhi u s

tg

t

H uh t

li u chu i th

]
c

nh v c th
h tr

tri th c t t p d li u c a h .
u khi nh

nx

i th

m c (indexing) d
xu

t gom c m (clustering) do Aach and
xu t;

p (classificatio
xu

t d

t t ng k t


18


m b
ng s (2002) [1

xu

ki
m u ph

bi

P.Smyth [23

u.

M t trong nh

li u chu i th

lu t k t h

u tu n t ph bi

vi c d

44

t

ng g


n xu

c m u ph bi n r i

t k t h p. Ph m vi c a lu

c th

nh t.
i vi
chu i th

ng tu n t t t p d li u

n ch t tu n t c a d li u chu i th

c ghi nh n t i nh ng th

t

t ph bi

u tu n

u xu t hi

chu i con xu t hi

t ho


li u chu i th

d

li u tu n t

u tu n t ph bi n

li u chu i th i gian ta ph
th

ki n

i kho

m x y ra s ki n. T nh ng m u tu n t ph bi

ng d

ng g p trong d li u chu i th i gian ho
i quan h gi
M

u th

v

u tu n t ph bi


lo i c phi

ac

t k t lu

s

phi

p r i gi

h gi a 2 lo i c phi

c th
phi u X s

phi u Y s gi
t nd
n

i nhi u h tr

p trong kinh doanh.

n


19


1.2 M
M

ng tu n t

ng t t p d

li u th i gian. T t p d li u chu i th
c

c ti n), ta s

u tu n t ph bi

ng. T

t
gi

1.3 Ph
p d li u ch

a th

Nam), ta s ti

t c a chu i th i gian m

(th c hi n t
ch


ng th gi i (ho c c a Vi t

ng ho

c ti

pv

nh ng chu i th

u, ta s ti

ph bi

tc
d

ts

u ch nh v gi i thu

t c a d li u chu i th i gian. Cu
gi

ph bi n v

u tu n t

ng c


m u ph bi
pv

n

ng hi n di

u

c.
t

c hi

hi u su

mb

t th

c hi

m vi cho

s d ng l i m t s

t ph bi n trong

li u chu i th

t ti n x

t thu gi m chi u
tr ir

i sang chu

,v

t bi n
li u chu i tu n t ph bi n.

1.4
c ti n: hi
c bi

li

li u ch

ng trong n n kinh t th

gi

cung c

nh

ng c th
hi n m i quan h gi


t qu c


20

ng trong qu

n ra nh

c a t ng lo i c phi
th

cs

gi

i c phi

quan h
qu

a nh ng m i

ch n l a vi

phi

u


m b t r i ro.
c

a lu

m u (template) g

xu t ra m

ng tu n t ph bi

quy t v

u ph bi

p d li u chu i th

ng tu n t

t

chu i (sequential database) (ch

tr gi i

d li u

qua y u t th i kho ng gi

ki n A x


ki n

c B ch
u ph bi n tu n t

n gi l

th i kho ng v

t

t

m t chu i c th (m

ng nh

nh), lu

ng c th

vi

ti

r

ng. M u


n t nhi

y k t qu c a

n trong vi

ng c th (m t

chu i th

th

c th gi

c m i quan h
u ph bi n tu n t

ng). T

ng tri th

th

is d

ng th

xu t ra hai gi i

thu t nh m h tr

v

u ph bi n tu n t

y th c nghi

th c hi n kh

cs

li

ng

t s . Th

i n m trong m t gi i h

1.5 C
Lu

c c sau:
- Gi i thi

vi

p thi t cho
.



21

-

t: gi i thi u nh

m n n t ng

n th c c a lu
Ch

-

: gi i thi

thuy
-

ng ti p c n gi i quy t v
ng tu n t

: gi i thi

n
d li u chu i

th i gian.
- Th c nghi m
s d ng hai gi i thu t do lu
- K t lu n:


t qu khi
xu t.
a lu

u trong


22

T
2.1 D li u chu i th i gian
2.1.1 Time Series
M t chu i th

i tr s th c, m i tr bi u di n m t

i nh ng th

u nhau [18]. Ho c chu i th i gian T = t1, t2,

tt

m

t m bi

s th c ho

ghi nh n t i nh ng th


s

c

u nhau [17].
chu i th

chu i th

n. Chu i th

m

t chu i th i gian ch ch a

c ghi nh n m

nhau. Chu i th

n t t i nh ng kho ng th
i th

u

i m t th

nhi

2.1.2 Chu i con (Subsequence)

N

t chu i th

ch a m t ph n g m nh
chi

t chu i con C c a T s
nh ng v

i ti

ut im tv
C = tp

17].

p+n-1

v i1

p

m n+1

i ta s d ng c a s
i con t chu i th
trong m t chu i th

2.1 cho th


t
v 3 chu i con A,B,C

ub

2.1

i con trong m t chu i th i gian [17]


23

2

p (Match)

Cho m t s th
s

v ph m vi kho

nh

p gi a 2 chu

t chu i th i gian

T. Bi t r ng T ch a m t chu i con C b
u t i q, n u kho ng


u t i th

a 2 chu i nh
t chu

2.2

t chu

2

t chu i con M b t
cb

pv

c l i.

p c a chu i con C trong chu i th i gian
T [17]

pt

ng (Trivial Match)

Cho m t chu i th i gian T ch a m t chu i con C b
chu

p (match) v

pt

t

ut iv

ng v i C n u p = q (2 chu

t hi n m t chu

mb

u) ho c

ut iv
ym

2.3

ut iv

pt

v

pt

ng

i c a chu i C [17]



24

2

ng (Non-trivial match)

Hai chu

c g

ng n

c chia c t b i m t chu i

pv ic

2 chu

2.1.6 Motif
M t chu i con (subsequence) trong chu i th
l

cl pl

tm t

n ph
m

1

motif: chu

ng nh

C1)
K

i con Ck

p

ng nhi u nh t th a D(Ck, Ci) > 2R (v i v i 1

i

i lo i b

ng h

u ph n t

2.2 X

t motif.

li u chu i th i gian
i v i d li u chu i th


ni

cs d

ng ta ph

ts

h tr cho vi

2.2

(similarity measurement)

iv id

li u chu i th i gian, ta

m

gi a hai chu i th i gian.
: [17]
Cho 2 chu i th i gian
Q = q 1 , q2

n

C = c1, c2

n


Ta xem m i chu
chi

a m i chu

u (v
kho

c:

a hai chu i th


25

n

D(Q,C) =

(qi

ci ) 2

i 1

2.4

kho


n, d

m r ng cho nhi u d

li u chu i th i gian.
Khuy

m: nh y c m v i nhi

c khi 2 chu i th

chi

.

n th

ng [18]

T 2 chu i th

g qua vi

m t chu i th

m t

m c a chu i th
n th


Warping

ng (Dynamic Time

i thi u
r ng ho c thu h

u (signals) theo

chi u th
Gi s
X = x1 , x 2
Y = y1, y2

i th
n
m

m r ng c a t ng chu i th i gian b
t .S

c hai chu i m
ng cho hai chu

pl

n
c



×