--------------
13
HCM
Tp. HCM ngày 22 tháng 7
2013
1.
2.
3. GV
4.
5.
Thành
i
-
-
: 60.48.01
I.
II.
Tìm
p
III.
: 02/07/2012
IV.
V.
: 21/06/2013
: TS.
Tp. HCM, ngày 21
ii
iii
T
. Trong
.
,
gia
L
.
iv
ABSTRACT
In recent years, data mining has been applied to education, which leads to a
new research area called Educational Data Mining (EDM). In this area,
association rule mining is one of the most used techniques. Therefore, we
have found out research related to the incremental quantitative association
rule mining problem on EDM. This problem is important to support
knowledge discovery from educational data incrementally coming every
semester and every year. However, it has not yet been investigated much in
the existing works.
So this thesis proposes a quantitative association rule mining approach
which is appropriate for an academic credit system which has been popularly
applied in the world as well as the incremental problem. The mined
quantitative rules will give more information to users than the traditional
boolean ones. As a result, such rules will contribute to the support of
educational decision making more conveniently.
v
y
Tp. HCM, ngày 20 tháng 6
vi
M CL C
--------------------------------------------------- ii
---------------------------------------------------------------------------- iii
---------------------------------------------------------------- iv
ABSTRACT ------------------------------------------------------------------------------- v
------------------------------------------------------------------------ vi
------------------------------------------------------------------------ vii
-------------------------------------------------------------- ix
-------------------------------------------------------------- x
---------------------------------------------- 1
--------------------------------------------------------------- 1
-------------------------------- 4
------------------------------------------------------------- 5
----------------------------------------------------------- 5
------------------------------------------------------------ 5
----------------------------------------------- 7
---------------------------------------------------------------------- 7
------------------------------------------------------ 7
----------------------------------------------------- 10
--------------------------------------------------- 10
-------------- 12
----------------------------------------------------- 12
-tree ----------------------------------------------------- 14
-------------------------------------------- 19
vii
----------------------------------------------------------- 19
-------------------------------------------------- 27
---------- 31
3.1
------------------------------------- 31
---------------------------- 33
--------------------------------------------------- 34
---------------- 40
------------------------------------------------------------ 40
------------------------------------------------------- 40
--------------------------------------------------------- 42
------------------------------ 42
-
----- 45
-Growth ------------------------------------ 46
--------------------------- 47
----------------- 50
5.1
--------------------------------------------------- 50
------------------------------------------------------------------- 50
--------------------- 51
------------------------------------------------- 59
-------------------------------------------------- 62
---------------------------------------------------------- 67
--------------------------------------------- 67
--------------------------------------------------------- 67
------------------------------------------------- 68
--------------------------------------------------------------- 69
------------------------------------------------------------------------ 74
viii
DANH M C HÌNH
Hình 2 Hình 2 - 2. Cây FPHình 2 - 3. Cây FPHình 2 - 4. Cây FPHình 2 Hình 2 Hình 2 -
............................................................. 14
.................... 18
......................... 19
.................................................................... 19
........... 20
..................... 22
prelarge và large 1-itemset .......................... 26
Hình 4 ........................................................................................................... 41
Hình 5 Hình 5 Hình 5 Hình 5 Hình 5 Hình 5 Hình 5 Hình 5 Hình 5 -
........................................ 60
........................................ 61
........................................ 61
........................................ 61
........................................ 62
........................................ 62
.................................. 63
................................................................. 64
.......................................................................... 65
ix
DANH M C B NG
B
-
............................................................................ 17
............................ 17
....... 18
........................................................................... 24
............................................... 24
................. 25
......... 25
.................... 27
-
.......... 35
-
.......................... 37
-
............ 38
-
.................................................................... 42
..................................................................... 43
............................................................ 43
............................................................... 44
- 5.
-
.................................................................. 50
............ 52
............ 53
............ 54
............ 55
............ 56
............ 57
............ 58
................................. 59
................................. 63
x
I THI U V
1.1 Gi i thi
tài
, k
Discovery in Databases
c c
.
,21]. KDD nhìn
-
mining
-
, khai
á [1,21].
-terrorism)
k
Mining EDM) [5].
1
Educational Data Mining - JEDM).
- LMS),
ITS) [6] và Trí
AIED)
/mơ hình
,
(sequential pattern mining) [3,4,5,8].
-
nhà
mà
2
môn
A
B
và
:
computer
antivirus_software [support=2%;confidence=60%]
-rút.
et al
3
viên (candidate itemsets
kh
et al
-
et al.[38]
-tree
-tree (Fast Updated FP9,
-large
-
et al
toán Pre-
-
.T
1.2 M c tiêu và ph m vi nghiên c u c
.
4
tài
viên khóa 2005-
-
2012.
tài
c
N
này thì
c ti n
5
,
mà sinh
viên này
N
thì giáo
chun ngành,
,
mà cịn
t khác,
,
6
LÝ THUY T
và
2.1 T ng quan [21]
2.1.1 Phân lo i lu t k t h p
et al. [19]
-
(2.1)
vào các tiêu chí sau:
types of values
:
Boolean association rule
item) thì
à
(quantitative item
association rule
quantitative
7
(2.2)
dimensions
-dimensional association rule).
buys.
age, income, buys trong
multidimensional association
rule).
levels of abstractions) liên
quan
(2.3)
(2.4)
computer
laptop computer
(multilevel association rules).
(single-level association rules).
(nature of the association) liên
(strong association rules).
8
(statis
rules).
correlation
[21, 43]
(numeric)
(boolean atrribu
sau:
{age[25, 40], gender[female]}
{salary[13500, 18700]} (supp = 0.03, conf = 0.8)
13, 500 USD
cơng nhân là
có
và
18, 700 USD trong khi 80% (confidence)
18, 700 USD.
minsup
minconf
minsup và minconf.
attribute, value
hóa (discretization)
interval
9
attribute, interval)
2.1.2 Khai phá lu t k t h p
bài toán con là (1)
ta có Lk
k
= {x1 , x1
xk
X Lk
Lk\
(Apriori-like
-growth (frequent-pattern growthvertical data
format).
FP-
generate-and-test) thì FP-tree (Frequent-Pattern tree).
et al
FUP (Fast-Updated algorithm
2.2 M t s ki n th
I = {i1
X I
itemset.
n}
n
n [13]
(item)
itemset
10
k-
D
(transaction) T
th
D
sup(X) (support(X)
X
I
X
I.
i1
k>,
X trong D, kí
s(X)
D.
sup X = Pr X =
X
T
record)
T D| X T
(2.5)
D
I
minsup
X
(frequent itemset
large itemset
sup(X) minsup.
minsup
không
X
Y= .
X
Y
X, Y
antecedent), Y
consequent
và
support
sup( X
confidence
T D| X
Y T
(2.6)
D
X
conf ( X Y ) = p( Y I | X I )=
11
.
Y
X
Y ) = Pr ( X Y )=
I, và
Y
p (Y T X T)
p(X T)
conf(X
Y ) là
(2.7)
p ph bi
Tomasz Im
hash-tree
pruning
Lk = {l1, l2,
li
k-
Ck = {c1, c2
itemset và count
ci
k-
INPUT
D
OUTPUT
ci
minsup
Answer
D
L1 = {large 1-itemset};
for (k = 2; Lk-1
; k+ + ) do begin
Ck = apriori_gen(Lk-1
forall
t
Ck
D do begin
Ct = subset(Ck, t
forall
t
c
Ct do
c.count++ ;
end ;
Lk = {c
Ck | c
minsup }
12
end ;
Answer =
k
Lk ;
k-
(k+1)-
-
L1
item
minsup. Trong
2 pha:
Ck
Lk-1
apriori_gen().
Ck.
Hàm apriori_gen()
Join step
Lk-1
k-
1)-itemset l1, l2
cho Ck
l1[1], l1
kkk- 1
l1 và l2 (Ck
1[k -1] < l2[k -1]
l1[k-2], l1[k-1], l2[k-
Prune step
itemset ci
Ck
kk-1)-itemset s, s
k-1)-itemset s, s
minsup vì s
sup(s) < minsup
Ck.
ci và s
ci và s
Lk-1
ci
ci
E, K} và minsup = 60%.
13
Lk-1
sup(s) <
s nên sup
Hình 2 - 1.
-tree:
Apriori -like
anti-monotone Apriori heuristic
4
7
-
14
-itemset