UNIVERSITY OF DANANG
UNIVERSITY OF SCIENCE AND TECHNOLOGY
----------
NGUYEN TAN HOANG
RECOMMENDER SYSTEM BASED ON
STATISTICAL IMPLICATIVE FIELD
Specialization: Computer Science
Code: 9 48 01 01
DOCTORAL THESIS SUMMARY
Danang – 2022
The dissertation is completed at:
UNIVERSITY OF SCIENCE AND TECHNOLOGY, UNIVERSITY OF DANANG
Academic Instructors:
1. Associate Professor Huynh Xuan Hiep, PhD.
2. Huynh Huu Hung, PhD.
Opponent 1:……………………………..……………
Opponent 2:………………...………...………………
Opponent 3:………………...……...…………………
The dissertation will be defended before the Board of thesis review.
Meeting at: University of Science and Technology – The University of
Da Nang.
At ........ hour ......... day ....... month ....... year ........
The dissertation is available at:
- National Library
- Information and Learning Center, University of Da Nang
PREFACE
1. The urgency of the thesis
In the online world, information is growing at an exponential rate
with the growth of e-commerce, online storage services and information
delivery services, and finding the right information to support the
right decisions is a challenge for users. For businesses and
organizations in services and commerce, earning customers' trust in
search results is extremely important and genuinely difficult.
Recommender systems have quickly proved to be a very useful tool for
providing necessary and relevant information to users and to commercial
and service providers in such situations. They support effective
decision making and save time and effort. However, to meet the
increasing demand for both the quality and the quantity of
recommendations, current research focuses on new recommender
algorithms and on overcoming the limits and weaknesses of existing
recommender system approaches. This thesis proposes a new recommender
model based on the statistical implication field in order to improve
the accuracy and processing time of recommendations and to extend the
recommendation capacity so that it reflects user/item relationships at
a given degree of implication. The proposed models are implemented and
tested on standard data sets, and their results are compared with those
of other effective models.
2. Objectives, objects and scope of research of the thesis
2.1. Research objectives
The objective of the thesis is to survey recommender systems and to
study the fundamentals of statistical implication, especially
implication variation and the implication field, as a basis for
proposing an implication rule mining framework (an implication rule is
an association rule that satisfies the condition of statistical
implication). The thesis then applies this framework to build a
recommender model based on the implication field.
2.2. Research subjects
The subjects of the thesis include: the measures of implication
variation in the implication field formed by the process of statistical
implication variation; collaborative filtering recommender models
based on implication variation; and recommender models based on the
statistical implication field.
2.3. Research scopes
The scope of the study is:
Studying the theory of statistical implication analysis, especially
implication variation and the statistical implication field; studying
collaborative filtering recommenders and existing recommender systems
based on statistical implication analysis as the basis for the
proposal; and proposing new recommender models based on the implication
field that can be applied to both binary and non-binary data and that
improve recommender efficiency (as measured by the accuracy of item
prediction, the classification of recommended items, and the predicted
item ratings).
3. Research methodology
Literature review and experimentation are the two main research
methods used in this dissertation.
4. Contribution of the thesis
- Firstly, the thesis proposes a set of measures of statistical
implication variation (four measures of implication index variation and
four measures of implication intensity variation) to serve as a basis
for building implication rule mining frameworks and recommender models.
- Secondly, it proposes an implication rule mining framework that
integrates the association rule mining framework (which uses support
and confidence) with the implication variation measures.
- Thirdly, it proposes two recommender models: (1) a recommender model
based on association rule mining using implication variation, which
generates recommendations from the implication equipotential surfaces
of association rules and is applied to binary data sets; and (2) a
recommender model based on the statistical implication field, developed
from the first model and the implication rule mining framework, which
can be applied to both binary and non-binary data according to the
significance levels of statistical implication on association rules,
users, and data items.
- Fourthly, it proposes partitioning data by the items evaluated in
each transaction, instead of by the number of transactions in the data
set, to improve the quality of training and evaluation; this method is
applied to the implication field-based recommender model.
- Finally, it develops the implicationfieldRS tool for building,
training and evaluating the recommender system, and uses this tool to
run test scenarios for the proposed recommender models.
5. Thesis structure
The thesis is organized as follows.
The opening part introduces the urgency, objectives, subjects,
research scope and research methods of the thesis.
Chapter 1: An overview of statistical implicative field and
recommender system.
Chapter 2: Models of recommender system based on implication
field, including a collaborative filtering recommender model based on
implication variation and recommendation model based on statistical
implication field.
Chapter 3: Experiments and evaluation of the results.
The conclusion presents the main contributions and future work.
Appendices include: (1) a proof of the asymmetry of the statistical
implication measures; and (2) a proof of the equivalence of the
implication index formulas in the case of binary data.
CHAPTER 1. AN OVERVIEW OF STATISTICAL
IMPLICATION FIELD AND RECOMMENDER SYSTEM
1.1. Statistical implication analysis
Statistical implication analysis (SIA) is a method for studying
rule-like relationships between variables and/or between variables and
rules, proposed by Regis Gras in the 1990s. SIA provides implication
measures that are statistical, asymmetric and nonlinear, and that rely
on probability to evaluate the relationships between data variables.
1.1.1. Statistical implication measures
SIA includes two main measures to evaluate the degree of implication
of the relationship a → b. The first is the implication index, given by
formula (1.1):
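\[
q(a, \bar{b}) =
\begin{cases}
\dfrac{n_{A\bar{B}} - \dfrac{n_A n_{\bar{B}}}{n}}{\sqrt{\dfrac{n_A n_{\bar{B}}}{n}}},
  & \text{if } a, b \in \{0, 1\} \\[3ex]
\dfrac{\sum_{i \in E} a(i)\,\bar{b}(i) - \dfrac{n_A n_{\bar{B}}}{n}}
      {\sqrt{\dfrac{(n^2 s_A^2 + n_A^2)(n^2 s_{\bar{B}}^2 + n_{\bar{B}}^2)}{n^3}}},
  & \text{if } a, b \in [0, 1]
\end{cases}
\tag{1.1}
\]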
The second is the implication intensity, given by formula (1.2):
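\[
\varphi(a, b) =
\begin{cases}
\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{q(a, \bar{b})}^{\infty} e^{-\frac{t^2}{2}}\, dt,
  & \text{if } n_B \neq n \\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{1.2}
\]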
The lower the implication index, the higher the implication intensity
and hence the higher the level of implication.
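The two measures can be illustrated with a minimal Python sketch for
the binary case of formulas (1.1) and (1.2). The function names and the
counts below are hypothetical and chosen only for illustration; the
integral in (1.2) is evaluated with the complementary error function,
which gives the upper tail of the standard normal distribution.

```python
import math


def implication_index(n, n_a, n_b, n_ab_bar):
    """Implication index q(a, not b) for binary data, formula (1.1)."""
    # Expected number of counter-examples if a and b were independent.
    expected = n_a * (n - n_b) / n
    return (n_ab_bar - expected) / math.sqrt(expected)


def implication_intensity(n, n_a, n_b, n_ab_bar):
    """Implication intensity phi(a, b), formula (1.2)."""
    if n_b == n:
        return 0.0
    q = implication_index(n, n_a, n_b, n_ab_bar)
    # (1 / sqrt(2*pi)) * integral from q to infinity of exp(-t^2 / 2) dt
    return 0.5 * math.erfc(q / math.sqrt(2))


# Hypothetical counts: 100 transactions, 40 contain a, 50 contain b.
few_counter_examples = implication_intensity(100, 40, 50, 5)
many_counter_examples = implication_intensity(100, 40, 50, 25)
```

With the same marginal counts, the rule with fewer counter-examples
gets the lower index and the higher intensity, as stated above.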
1.1.2. Implication index variation and implication field
The variation of 𝑞(𝑎, 𝑏̅) with respect to the variables
(𝑛, 𝑛𝐴 , 𝑛𝐵 , 𝑛𝐴𝐵̅ ) creates a scalar field 𝐶 whose differential, in
Frechet's geometric sense, is expressed by formula (1.3):
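\[
dq = \frac{\partial q}{\partial n}\, dn
   + \frac{\partial q}{\partial n_A}\, dn_A
   + \frac{\partial q}{\partial n_B}\, dn_B
   + \frac{\partial q}{\partial n_{A\bar{B}}}\, dn_{A\bar{B}}
   = \operatorname{grad} q \cdot dM
\tag{1.3}
\]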
where 𝑀 is a point with coordinates (𝑛, 𝑛𝐴 , 𝑛𝐵 , 𝑛𝐴𝐵̅ ) of the scalar
field 𝐶, 𝑑𝑀 is the vector of differential components of the variation
and grad q is the vector of partial derivatives of the variation. This
gradient field satisfies the Schwartz criterion on mixed derivatives
for each pair of variables 𝑋, 𝑌 ∈ {𝑛, 𝑛𝐴 , 𝑛𝐵 , 𝑛𝐴𝐵̅ }, as in formula
(1.4), and is called the implication field in this thesis.
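\[
\frac{\partial}{\partial n_X}\left(\frac{\partial q(a, \bar{b})}{\partial n_Y}\right)
= \frac{\partial}{\partial n_Y}\left(\frac{\partial q(a, \bar{b})}{\partial n_X}\right)
\tag{1.4}
\]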
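As a rough numerical illustration of formula (1.3), the sketch below
approximates grad q by central differences and compares the first-order
estimate grad q · dM with the actual change of q. It assumes the binary
index of formula (1.1) written as a function of (n, n_A, n_B, n_AB̅)
with n_B̅ = n − n_B; the point M and the step dM are hypothetical
values, not taken from the thesis experiments.

```python
import math


def q_index(n, n_a, n_b, n_ab_bar):
    """Binary implication index as a function of (n, n_A, n_B, n_AB-bar)."""
    expected = n_a * (n - n_b) / n
    return (n_ab_bar - expected) / math.sqrt(expected)


def numeric_gradient(point, h=1e-4):
    """Central-difference approximation of grad q at the point M."""
    grad = []
    for k in range(len(point)):
        up, down = list(point), list(point)
        up[k] += h
        down[k] -= h
        grad.append((q_index(*up) - q_index(*down)) / (2 * h))
    return grad


M = (1000.0, 400.0, 500.0, 150.0)   # hypothetical (n, n_A, n_B, n_AB-bar)
dM = (1.0, 1.0, 0.0, 1.0)           # hypothetical small variation of M

grad_q = numeric_gradient(M)
first_order = sum(g * d for g, d in zip(grad_q, dM))              # grad q . dM
actual = q_index(*(m + d for m, d in zip(M, dM))) - q_index(*M)   # true change
```

For small variations the two quantities agree closely, which is the
sense in which the gradient describes the local variation of the index.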
The implication field generated by the variation of the implication
index consists of a set of equipotential surfaces, each containing the
implication rules that share the same statistical implication value, as
determined by equation (1.5):
\[
q(a, \bar{b}) - \frac{n_{A\bar{B}} - \dfrac{n_A n_{\bar{B}}}{n}}{\sqrt{\dfrac{n_A n_{\bar{B}}}{n}}} = 0
\tag{1.5}
\]
1.2. Recommender system
1.2.1. Definition
A recommender system consists of a set of users denoted by 𝑈 (users)
and a set of items denoted by 𝐼 (items). The set of ratings in the
system is represented by the matrix 𝑅𝑈×𝐼 , and the set of possible
values for a rating is 𝑆 (scores). The recommender system model is
built as a function (formula (1.6)).
𝑓: 𝑈 × 𝐼 → 𝑆
(1.6)
Its task is to predict the rating 𝑓(𝑢, 𝑖) of a user 𝑢 ∈ 𝑈 for a new
item 𝑖 ∈ 𝐼. This function is then used to recommend to the target user
𝑢𝑎 the item 𝑖 ∗ with the highest estimated rating, as in formula
(1.7).
\[
i^* = \arg\max_{j \in I \setminus I_u} f(u_a, j)
\tag{1.7}
\]
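Formula (1.7) can be sketched in a few lines of Python. The sketch
assumes that I_u denotes the items the target user has already rated
(the summary does not define it explicitly); `recommend`, `predict` and
the score table are hypothetical names and values used only for
illustration.

```python
def recommend(predict, target_user, all_items, rated_items, top_n=1):
    """Return the top_n unrated items with the highest predicted rating."""
    # Candidate set I \ I_u: items the target user has not rated yet.
    candidates = [j for j in all_items if j not in rated_items]
    ranked = sorted(candidates, key=lambda j: predict(target_user, j), reverse=True)
    return ranked[:top_n]


# Hypothetical predicted ratings f(u_a, j) for one target user.
scores = {("ua", "i1"): 4.5, ("ua", "i2"): 3.0, ("ua", "i3"): 4.8, ("ua", "i4"): 2.0}
best = recommend(lambda u, j: scores[(u, j)], "ua", ["i1", "i2", "i3", "i4"], {"i3"})
```

Item i3 has the highest score but is excluded because the user has
already rated it, so the recommendation falls on i1.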
1.2.2. Evaluation
The recommender model is evaluated using one of the following
approaches: splitting, bootstrapping and k-fold cross-validation. There
are two common groups of measures for evaluating recommender systems:
rating prediction accuracy measures (MAE, MSE, RMSE) and classification
accuracy measures for recommended items (precision, recall, F1).
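The two groups of measures can be computed as in the following minimal
sketch, which uses the standard textbook definitions. The function
names are hypothetical, and the sketch is not the evaluation code of
the thesis tool.

```python
import math


def rating_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired lists of actual and predicted ratings."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)


def classification_scores(recommended, relevant):
    """Return (precision, recall, F1) for a recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


mae, mse, rmse = rating_errors([4, 3, 5], [3, 3, 4])
precision, recall, f1 = classification_scores(["a", "b", "c", "d"], ["a", "c", "e"])
```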
1.2.3. Classification
In terms of technique, recommender systems are built using
content-based filtering; collaborative filtering, including
memory-based (user-based, item-based) and model-based approaches (which
build machine learning models for recommendation); other techniques;
and hybrids of these techniques. Among them, collaborative filtering is
the most commonly used and effective.
1.2.4. Research status and recommendations
This section reviews the research and development of recommender
systems in general and of collaborative filtering recommender systems
in particular, especially those based on association rule mining and on
statistical implication analysis. It then points out their limitations
and proposes a research direction: building recommender systems based
on the statistical implication field.
1.3. Chapter summary
Chapter 1 presents (1) statistical implication analysis, especially
implication variation and the statistical implication field; and (2)
recommender systems, including their definition, classification,
evaluation and application domains. The chapter also presents the
weaknesses of existing recommender systems based on rule mining and
statistical implication analysis as the basis for the research
proposals.
CHAPTER 2. RECOMMENDATION MODELS BASED ON
STATISTICAL IMPLICATION FIELD
2.1. Collaborative filtering recommendation model based on
implicative variation
2.1.1 The problems of rule-based recommender system
For recommender systems, association rule mining (ARM) algorithms
face a number of problems that make the quality of the rules
insufficient for recommendation: (1) the ARM framework only deals with
binary data; (2) the time and rule quality requirements of the
recommendation problem are not met; (3) the confidence of a rule is
insensitive and does not show the correlation between premise and
consequence; (4) symmetrical measures such as confidence, lift and some
other measures are not suitable for recommendation problems, where the
roles of items/users are not always the same; (5) the support decreases
as the size of the rule increases; (6) the number of generated rules
increases exponentially with the number of items; and (7) the
association rule mining framework does not take the number of
counter-examples into account, whereas in fact the higher the number of
confirmations (𝑛𝐴𝐵 ) and the lower the number of counter-examples
(𝑛𝐴𝐵̅ ), the stronger the rule. Using statistical implication analysis
measures is therefore a possible way to address these limitations.
2.1.2 Statistical implication variation measure and the threshold
of implication variation
Measures are one of the key elements in building recommendation
models. For a collaborative filtering recommender model based on
association rule mining using implication variation, in addition to the
rule mining framework measures such as support and confidence, it is
necessary to build measures of implication variation to filter out a
set of implication equipotential surfaces of the rules as the basis for
the model's recommendations.
Statistical implication variation measure
The proposed measures used for the recommender model based on
association rule mining using implication variation are measures of the
variation of the implication index 𝑞(𝑎, 𝑏̅) and of the implication
intensity 𝜑(𝑎, 𝑏) with respect to the factors 𝑛, 𝑛𝐴 , 𝑛𝐵 and 𝑛𝐴𝐵̅ ,
as described in Table 1.1.
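Table 1.1 Statistical implication variation measures
Measure | Description | Formula
𝑞𝑛 | Implication index variation according to 𝑛 | \( q(a,\bar{b}) + \Delta q_n = q(a,\bar{b}) + \frac{1}{2\sqrt{n}} \cdot \frac{n_{A\bar{B}} + \frac{n_A n_{\bar{B}}}{n}}{\sqrt{n_A n_{\bar{B}}}} \)
𝑞𝑛𝐴 | Implication index variation according to 𝑛𝐴 | \( q(a,\bar{b}) + \Delta q_{n_A} = q(a,\bar{b}) - \frac{1}{2} \frac{n_{A\bar{B}}}{\sqrt{n_{\bar{B}}/n}} \left(\frac{1}{n_A}\right)^{\frac{3}{2}} - \frac{1}{2} \frac{\sqrt{n_{\bar{B}}/n}}{\sqrt{n_A}} \)
𝑞𝑛𝐵 | Implication index variation according to 𝑛𝐵 | \( q(a,\bar{b}) + \Delta q_{n_B} = q(a,\bar{b}) + \frac{1}{2} n_{A\bar{B}} \left(\frac{n_A}{n}\right)^{-\frac{1}{2}} (n - n_B)^{-\frac{3}{2}} + \frac{1}{2} \left(\frac{n_A}{n}\right)^{\frac{1}{2}} (n - n_B)^{-\frac{1}{2}} \)
𝑞𝑛𝐴𝐵̅ | Implication index variation according to 𝑛𝐴𝐵̅ | \( q(a,\bar{b}) + \Delta q_{n_{A\bar{B}}} = q(a,\bar{b}) + \frac{1}{\sqrt{\frac{n_A (n - n_B)}{n}}} \)
𝜑𝑛 | Implication intensity variation according to 𝑛 | \( \varphi(a,b) + \Delta\varphi_n = \varphi(a,b) + \frac{1}{\sqrt{2\pi}} \int_{q(a,\bar{b})}^{q_n(a,\bar{b})} e^{-\frac{t^2}{2}}\, dt \)
𝜑𝑛𝐴 | Implication intensity variation according to 𝑛𝐴 | \( \varphi(a,b) + \Delta\varphi_{n_A} = \varphi(a,b) + \frac{1}{\sqrt{2\pi}} \int_{q(a,\bar{b})}^{q_{n_A}(a,\bar{b})} e^{-\frac{t^2}{2}}\, dt \)
𝜑𝑛𝐵 | Implication intensity variation according to 𝑛𝐵 | \( \varphi(a,b) + \Delta\varphi_{n_B} = \varphi(a,b) + \frac{1}{\sqrt{2\pi}} \int_{q(a,\bar{b})}^{q_{n_B}(a,\bar{b})} e^{-\frac{t^2}{2}}\, dt \)
𝜑𝑛𝐴𝐵̅ | Implication intensity variation according to 𝑛𝐴𝐵̅ | \( \varphi(a,b) + \Delta\varphi_{n_{A\bar{B}}} = \varphi(a,b) + \frac{1}{\sqrt{2\pi}} \int_{q(a,\bar{b})}^{q_{n_{A\bar{B}}}(a,\bar{b})} e^{-\frac{t^2}{2}}\, dt \)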
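The index variation measures can be sketched as partial derivatives of
the binary implication index of formula (1.1). This is a minimal sketch
under stated assumptions, not the thesis implementation: the variation
terms for n and n_A treat n_B̅ as an independent variable, the term for
n_B uses n_B̅ = n − n_B with n fixed, and all function names are
hypothetical.

```python
import math


def q_index(n, n_a, n_not_b, n_ab_bar):
    """Binary implication index, with n_not_b the number of transactions without b."""
    expected = n_a * n_not_b / n
    return (n_ab_bar - expected) / math.sqrt(expected)


def dq_dn(n, n_a, n_not_b, n_ab_bar):
    return (n_ab_bar + n_a * n_not_b / n) / (2 * math.sqrt(n) * math.sqrt(n_a * n_not_b))


def dq_dna(n, n_a, n_not_b, n_ab_bar):
    root = math.sqrt(n_not_b / n)
    return -0.5 * (n_ab_bar / root) * n_a ** -1.5 - 0.5 * root / math.sqrt(n_a)


def dq_dnb(n, n_a, n_not_b, n_ab_bar):
    # Derivative with respect to n_B, where n_not_b = n - n_B and n is held fixed.
    ratio = n_a / n
    return 0.5 * n_ab_bar * ratio ** -0.5 * n_not_b ** -1.5 + 0.5 * ratio ** 0.5 * n_not_b ** -0.5


def dq_dnab(n, n_a, n_not_b, n_ab_bar):
    return 1 / math.sqrt(n_a * n_not_b / n)


def varied_index(partial, n, n_a, n_not_b, n_ab_bar):
    """First-order value q + delta q for a unit change of one variable."""
    return q_index(n, n_a, n_not_b, n_ab_bar) + partial(n, n_a, n_not_b, n_ab_bar)
```

For example, `varied_index(dq_dnab, 1000, 400, 500, 150)` estimates the
index after one additional counter-example is observed.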
Threshold of Statistical implication variation
In the experiments, an equipotential surface consists of a set of
rules whose implication values are approximately the same, within an
implication threshold 𝜃; this threshold of implication variation needs
to be determined. Depending on the measure, there is a threshold of
implication index variation and a threshold of implication intensity
variation.
Threshold of implication index variation
The implication index varies according to one of the values
𝜉 ∈ (𝑛, 𝑛𝐴 , 𝑛𝐵 , 𝑛𝐴𝐵̅ ) when an item is added to or removed from the
data set; the threshold is determined by equation (2.1):
\[
\frac{\Delta q(a, \bar{b})}{\Delta \xi}
= \max \frac{\partial q(a, \bar{b})}{\partial \xi} + o\big(q(a, \bar{b})\big),
\quad \xi \in (n, n_A, n_B, n_{A\bar{B}})
\tag{2.1}
\]
Threshold of implication intensity variation
Just like the implication index variation, the threshold of implication
intensity variation is determined by the equation (2.2).
\[
\frac{\Delta \varphi}{\Delta \xi}
= \max \frac{\partial \varphi(a, b)}{\partial \xi} + o\big(\varphi(a, b)\big),
\quad \xi \in (n, n_A, n_B, n_{A\bar{B}})
\tag{2.2}
\]
2.1.3 Association rule and rules mining framework
2.1.3.1 Association rule
To build the model, the association rules are modeled and expressed
in the form of statistical implication analysis, as the equation (2.3).
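\[
\mathcal{R}_{ASS} = \left\{ (n, n_A, n_B, n_{A\bar{B}}) \;\middle|\;
\begin{array}{l}
n_A \le n,\; n_B \le n, \\
\max(0, n_A + n_B - n) \le n_{A\bar{B}} \le \min(n_A, n_B), \\
\mathit{length}_{\mathcal{R}_{ASS}} \le k, \\
|\mathit{rhs}_{\mathcal{R}_{ASS}}| = 1
\end{array}
\right\}
\tag{2.3}
\]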
where the ℛ𝐴𝑆𝑆 rule is represented, from the point of view of
statistical implication, by the 4-tuple (𝑛, 𝑛𝐴 , 𝑛𝐵 , 𝑛𝐴𝐵 ) as in
formula (2.3) and satisfies the constraints 𝑛𝐴 ≤ 𝑛, 𝑛𝐵 ≤ 𝑛,
max(0, 𝑛𝐴 + 𝑛𝐵 − 𝑛) ≤ 𝑛𝐴𝐵 ≤ min(𝑛𝐴 , 𝑛𝐵 ). The length of the rule must
not exceed the threshold k, and the right-hand side of the rule is
limited to one item, in order to eliminate long rules that are
insignificant for recommendation, to reduce the mining time, and to
keep the number of rules within a manageable and computable range.
2.1.3.2 The implication variation-based association rule mining
framework
The implication variation-based association rule mining framework
𝐹ℛ 𝐴𝑆𝑆 generates the set of association rules (ℛ𝐴𝑆𝑆 ) using the
Apriori algorithm with support and confidence thresholds (𝑚𝑖𝑛𝑠𝑢𝑝 and
𝑚𝑖𝑛𝑐𝑜𝑛𝑓). The rules are then filtered with the implication variation
measures shown in Table 1.1 to keep those with the highest statistical
significance. The framework algorithm is based on the customized
association rule mining framework shown in Figure 2.1.
Figure 2.1 The implication variation-based association rule mining framework
This framework is modeled by equation (2.4) and operates in the
following steps: (1) use a frequent itemset mining algorithm such as
Apriori to generate the frequent itemsets that satisfy the support
threshold from the matrix 𝑅𝑈𝐼 transformed from the data set D; (2)
generate rules from the frequent itemsets that satisfy the minimum
confidence threshold; (3) build the measures of implication variation
and use them to filter strong rules with a high degree of implication,
to meet the requirements of the recommendation problem; (4) extract the
implication equipotential surfaces according to the threshold of
variation 𝜃 for recommendation.
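The four steps can be sketched end to end on a toy data set. This is a
simplified illustration under explicit assumptions, not the framework
of the thesis: it enumerates single-item rules a → b by brute force
instead of running Apriori, it filters on the implication intensity of
formula (1.2) rather than on the variation measures of Table 1.1, and
it groups rules into equipotential surfaces by bucketing the intensity
in steps of 𝜃. All names and thresholds are hypothetical.

```python
import math
from itertools import permutations


def mine_implication_rules(transactions, minsup, minconf, min_intensity):
    """Return rules (a, b, support, confidence, intensity) that pass all filters."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    count = {i: sum(i in t for t in transactions) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        n_a, n_b = count[a], count[b]
        n_ab = sum(a in t and b in t for t in transactions)
        n_ab_bar = n_a - n_ab                    # counter-examples of a -> b
        support, confidence = n_ab / n, n_ab / n_a
        if support < minsup or confidence < minconf or n_b == n:
            continue
        expected = n_a * (n - n_b) / n
        q = (n_ab_bar - expected) / math.sqrt(expected)
        phi = 0.5 * math.erfc(q / math.sqrt(2))  # implication intensity
        if phi >= min_intensity:
            rules.append((a, b, support, confidence, phi))
    return rules


def equipotential_surfaces(rules, theta):
    """Group rules whose intensities fall into the same bucket of width theta."""
    surfaces = {}
    for rule in rules:
        level = math.floor(rule[4] / theta)
        surfaces.setdefault(level, []).append(rule)
    return surfaces


transactions = [{"x", "y"}, {"x", "y"}, {"x", "y"}, {"x"},
                {"z"}, {"z"}, {"y", "z"}, {"z"}]
rules = mine_implication_rules(transactions, minsup=0.3, minconf=0.7, min_intensity=0.7)
surfaces = equipotential_surfaces(rules, theta=0.1)
```

Applying the implication filter after support and confidence mirrors
the post-processing order of steps (2) and (3) above.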
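\[
F_{\mathcal{R}_{ASS}} = \left\{
\left( \begin{array}{l}
(n, n_A, n_B, n_{A\bar{B}}), \\
\text{support } s, \\
\text{confidence } c, \\
\text{SIA measure } \mathit{imp}
\end{array} \right)
\;\middle|\;
\begin{array}{l}
n_A \le n,\; n_B \le n, \\
\max(0, n_A + n_B - n) \le n_{A\bar{B}} \le \min(n_A, n_B), \\
s \ge \mathit{minsup},\; c \ge \mathit{minconf}, \\
\mathit{length}_{\mathcal{R}_{ASS}} \le k,\; |\mathit{rhs}_{\mathcal{R}_{ASS}}| = 1, \\
\mathit{imp} \;\Re\; \mathit{SIAthreshold}
\end{array}
\right\}
\tag{2.4}
\]
2.1.4 Proposed recommender model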
The general model is depicted in Figure 2.2. The mining framework
FRASS is used as the basis for building a recommender model based on
implication variation, by user and by item. Other collaborative
filtering recommendation models are also integrated so that they can be
evaluated and compared with the proposed model, following the
evaluation procedure described in the next section.
Figure 2.2 The implication variation-based collaborative filtering model
Evaluation of the proposed model
The recommender model is evaluated according to a proposed
model evaluation procedure as shown in Figure 2.3
Figure 2.3 Recommender system model evaluation procedure
(diagram elements: dataset, training set, testing set, recommender
algorithms, recommender model, result of model, evaluating measures,
evaluation model, evaluating result)
The method used is k-fold cross-validation (with k = 5) repeated
twice; the data is divided into training and test sets according to the
number of transactions in the data set. The evaluation procedure is
depicted in the flowchart in Figure 2.4. The evaluation measures used
fall into two groups: (1) predictive accuracy (MAE, MSE and RMSE) and
(2) classification accuracy of recommended items (precision, recall,
and F1).
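The partitioning scheme described here can be sketched as a repeated
k-fold split over transaction indices. The generator below is a
hypothetical helper written for illustration, with k = 5 and two
repetitions as defaults; it is not the splitting routine of the thesis
tool.

```python
import random


def repeated_k_fold(n_transactions, k=5, repeats=2, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) over transactions."""
    rng = random.Random(seed)
    for r in range(repeats):
        indices = list(range(n_transactions))
        rng.shuffle(indices)
        # Deal the shuffled transactions into k folds of near-equal size.
        folds = [indices[f::k] for f in range(k)]
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, test


splits = list(repeated_k_fold(20))
```

Each transaction appears in exactly one test fold per repetition, so
every repetition evaluates the model on the whole data set once.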
Figure 2.4 Flowchart of the recommender system evaluation algorithm
2.2 Recommender system model based on statistical implication field
2.2.1 Problems about recommender systems based on statistical
implication analysis
Existing recommender models based on statistical implication
analysis, including association rule mining recommender models
using statistical implication variation, enrich the range of
solutions for improving the efficiency of collaborative filtering
recommender systems. However, they still have limitations: (1) they
only process binary data, so handling non-binary data leads to
combinatorial explosion and information loss; (2) in the rule
mining-based models of these works, implication measures are applied
only in the post-processing stage of rule mining, so they do little
to limit the combinatorial explosion of rules on large data sets,
which require long processing times and large storage space.
To overcome these limitations, the recommender model based on the
statistical implication field is proposed as a development and
improvement of the recommender model based on association rule
mining using implication variation.
2.2.2 Implication rule and implication rule mining framework
The recommender model based on the statistical implication field
extends the association rule mining framework into the implication
rule mining framework, as follows.
Modeling the quantitative implication rule
To overcome the limitation of the association rule mining framework
on non-binary data, the quantitative implication rule (hereinafter
referred to as the implication rule) is built on frequent itemsets
and must satisfy the support, confidence and implication variation
conditions during rule generation. This handles non-binary data and
effectively limits combinatorial explosion during rule generation.
Like the association rule, the implication rule is modeled by
equation (2.5):
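\[
\mathcal{R}_{IMP} = \left\{ (n, n_A, n_B, n_{A\bar{B}}) \;\middle|\;
\begin{array}{l}
0 \le n_A \le n_B \le n, \\
0 \le n_{A\bar{B}} \le n_B, \\
\mathit{length}_{\mathcal{R}_{IMP}} \le k, \\
|\mathit{rhs}_{\mathcal{R}_{IMP}}| = 1, \\
(\mathit{support} \ge \mathit{minsupp},\;
 \mathit{confidence} \ge \mathit{minconf},\;
 \text{SIA measure } \mathit{imp} \;\Re\; \mathit{threshold})
\end{array}
\right\}
\tag{2.5}
\]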
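where ℜ is determined by equation (2.6):
\[
\Re =
\begin{cases}
\text{``}\le\text{''}, & \mathit{imp} \in \left\{ \dfrac{\partial q(a, \bar{b})}{\partial \xi} \;\middle|\; \xi \in (n, n_A, n_B, n_{A\bar{B}}) \right\} \\[2ex]
\text{``}\ge\text{''}, & \mathit{imp} \in \left\{ \dfrac{\partial \varphi(a, b)}{\partial \xi} \;\middle|\; \xi \in (n, n_A, n_B, n_{A\bar{B}}) \right\}
\end{cases}
\tag{2.6}
\]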
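The choice of comparison in equation (2.6) can be sketched as a small
helper: index-based measures are accepted when they are at most the
threshold, and intensity-based measures when they are at least the
threshold. The measure labels and the function name are hypothetical
and serve only to make the rule concrete.

```python
# Hypothetical labels for the eight variation measures of Table 1.1.
INDEX_MEASURES = {"q_n", "q_nA", "q_nB", "q_nAB"}
INTENSITY_MEASURES = {"phi_n", "phi_nA", "phi_nB", "phi_nAB"}


def satisfies(measure_name, value, threshold):
    """Apply the relation of equation (2.6) for the given kind of measure."""
    if measure_name in INDEX_MEASURES:
        return value <= threshold      # lower index means stronger implication
    if measure_name in INTENSITY_MEASURES:
        return value >= threshold      # higher intensity means stronger implication
    raise ValueError(f"unknown measure: {measure_name}")
```

The two directions follow from formula (1.2), where a lower implication
index corresponds to a higher implication intensity.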
Modeling the implication rules mining framework
The implication rule mining framework, which mines the implication
rules, is developed from the association rule mining framework, as
shown in Figure 2.5, and is modeled by formula (2.7):
Figure 2.5 Flowchart of the implication rule mining framework algorithm
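\[
F_{\mathcal{R}_{IMP}} = \left\{
\left( \begin{array}{l}
\text{IRM algorithms}, \\
\text{support } s, \\
\text{confidence } c, \\
\text{SIA measure } \mathit{imp}
\end{array} \right)
\;\middle|\;
\begin{array}{l}
0 \le n_A \le n_B \le n, \\
0 \le n_{A\bar{B}} \le n_A, \\
s_{min} \le s,\; c_{min} \le c, \\
\mathit{imp}_{min} \;\Re\; \mathit{imp}
\end{array}
\right\}
\tag{2.7}
\]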
This framework works in the following steps: (1) use the Apriori
algorithm to generate the frequent itemsets that satisfy the support
threshold 𝑠𝑢𝑝𝑝𝑜𝑟𝑡 from the matrix 𝑅𝑈𝐼 transformed from the data
set 𝐷; this step is inherited from the association rule mining
framework; (2) build the implication variation measures imp and
integrate them into the rule mining framework to generate implication
rules from the frequent itemsets that satisfy both the minimum
confidence threshold and the implication variation measures; (3)
build and extract the equipotential surfaces according to the
threshold of variation 𝜃 for recommendation.
2.2.3 Proposed model
The proposed statistical implication field-based recommender model is
shown in Figure 2.6. This model evolves from the recommender model
based on association rule mining using implication variation through
the following developments: (1) the implication rule mining framework,
evolved from the association rule mining framework, generates
implication rules from binary and non-binary data sets; (2) a data
partitioning approach based on the number of items evaluated per
transaction is added for building, training and evaluating recommender
models, to improve model training and results; (3) the evaluation
algorithm adds a group of measures based on the position of recommended
items in the list (nDCG and RankScore), so that the evaluation reflects
the effectiveness of the recommender model in more depth.
Figure 2.6 Recommender model based on Implication Field
2.2.4 Evaluation of the proposed model
The evaluation procedure of the recommender model is the same as
that of the collaborative filtering recommender model based on
implication variation and also uses k-fold cross-validation, but
with two important additions. (1) In addition to partitioning the
observed data into training and test sets by the number of
transactions in the data set, the model adds a partitioning method
based on the number of evaluation items in each transaction. This
addresses the "bottleneck" of fixing the number of known items in
advance when the data is too sparse, which makes model training
more efficient and improves the quality of recommendations. (2)
Position-ranking measures for items in the model's recommendation
list, nDCG and RankScore, are added to evaluate the quality of
recommendations, as shown in the model evaluation algorithm in
Figure 2.7.
Figure 2.7 Flowchart of the recommendation model evaluation algorithm
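The two additions can be illustrated with a short sketch. It assumes
that the per-transaction partitioning keeps a fixed number of items of
each test transaction as known input and withholds the rest for
evaluation, and it uses the common binary-relevance definition of nDCG;
both are assumptions, since the summary does not give the exact
definitions, and the function names are hypothetical.

```python
import math
import random


def split_by_given_items(transaction_items, given, seed=0):
    """Split one transaction into `given` known items and the withheld rest."""
    items = sorted(transaction_items)          # sorted for reproducibility
    random.Random(seed).shuffle(items)
    return items[:given], items[given:]


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG of the first k recommended items."""
    dcg = sum(1 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


known, withheld = split_by_given_items({"a", "b", "c", "d", "e"}, given=2)
score = ndcg_at_k(["a", "b", "c"], {"a", "c"}, k=3)
```

A position-based measure such as nDCG rewards a model for placing the
withheld items near the top of the list, which precision and recall
alone do not capture.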
2.3 Chapter summary
This chapter proposes a new approach, based on implication variation
in the implication field, to mining association rules for the
collaborative filtering recommender problem. The first proposal is a
collaborative filtering recommender model based on implicative
variation, which improves the efficiency of rule-based and memory-based
collaborative filtering recommender models on binary data sets. Next,
the recommender model based on the implication field is proposed by
extending the first model to non-binary data sets, to further improve
its performance compared with existing collaborative filtering
recommender models and recommender models based on statistical
implication analysis.
CHAPTER 3. EXPERIMENT AND RESULTS
This chapter presents the organization and implementation of the
evaluation experiments. The models proposed in Chapter 2 are compared
with memory-based and rule-based collaborative filtering recommender
models, and also with previously proposed recommender models based on
statistical implication.
3.2. Experimental tools
Experiments were performed with the implicationfieldRS tool, developed
in the R language. It builds on the RecommenderLab package for building
and evaluating recommender system models and on the Rchic package for
processing statistical implication information.
3.3. Experiment of collaborative filtering recommender model
based on implicative variation
The association rule mining recommender model using implicative
variation is built in two variants, item-based and user-based. Each
variant is therefore tested experimentally and compared with the
collaborative filtering models of the same type. The implicationfieldRS
tool was used to conduct the experiments.
3.3.1. Item-based recommender model
The model is tested on the Movielens dataset, binarized with a
threshold of 3 (that is, movie ratings of 3 or more are assigned 1, and
all others 0).
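The binarization step can be sketched as follows. The nested-dictionary
layout of the ratings and the function name are assumptions made for
illustration; only the threshold rule itself comes from the text.

```python
def binarize(ratings, threshold=3):
    """Map each rating to 1 if it is at least the threshold, otherwise 0."""
    return {
        user: {item: 1 if rating >= threshold else 0 for item, rating in row.items()}
        for user, row in ratings.items()
    }


# Hypothetical ratings on a 1-5 scale.
ratings = {"u1": {"m1": 5, "m2": 2}, "u2": {"m1": 3, "m3": 1}}
binary = binarize(ratings)
```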
The model is evaluated offline and compared with collaborative
filtering recommender models on two groups of evaluation measures:
predictive accuracy (MAE, MSE and RMSE) and classification accuracy of
recommendations (precision, recall, and F1), according to the following
experimental scenarios.
Scenario 1. Survey and recommendation based on implication
variation equipotential surface.
The model generated an implication field consisting of a set of
implication equipotential surfaces of association rules that satisfy
the threshold of implication variation. These surfaces have uneven
density: rules concentrate on the equipotential surfaces where the
implication index varies little and are sparse on the others. This
shows that the rules follow the trend of the implication index: when
the implication index of a rule varies by a certain amount, so that the
rule is no longer accepted at a given implication threshold, the rule
moves to another equipotential surface with a more suitable threshold.
This helps recommend to users the data items with the most appropriate
level of implication. A target user is recommended the movie or list of
movies that he or she is likely to enjoy, based on the movies
previously seen and the rules in the equipotential surfaces.
Scenario 2. Comparison of recommendation item prediction
accuracy with collaborative filtering recommender models.
The experimental results show that the recommendation model based on
association rule mining using implicative variation (ISF) predicts
recommended items most accurately: its prediction errors (RMSE, MSE and
MAE) are the lowest. It is followed by the user-based collaborative
filtering models using the Cosine measure (UBCFcosine) and the Pearson
measure (UBCFpeason), and finally by the item-based collaborative
filtering models using the Cosine measure (IBCFcosine) and the Pearson
measure (IBCFpeason). This shows that the implicative variation
measures and the association rule mining framework customized to
satisfy them significantly improve the recommendation results of the
association rule mining model.
Scenario 3. Comparison of classification accuracy with
collaborative filtering recommender models.
The experimental results show that the ISF model has better
classification accuracy than the IBCFcosine, IBCFpeason, and UBCFpeason
models and is close to the accuracy of the UBCFcosine model, as
evaluated by precision, recall, and the ROC curve.
3.3.2. User-based recommender model
The user-based association rule mining recommender model using
implicative variation is evaluated in the same way as the item-based
model, on the Movielens dataset and with the same scenarios. The
experimental results obtained on these scenarios are similar to those
of the item-based model.
Together, the item-based and user-based experiments show that the
proposed model contributes significantly to improving collaborative
filtering recommendation through association rule mining.
3.3. Experiment of recommender model based on statistical
implication field
The recommender model based on the statistical implication field
was experimentally evaluated by the k-fold cross-validation method
(with k = 5), repeated twice, on the binary MSWeb data set and the
non-binary Movielens data set. These data sets are partitioned by the
number of transactions and by the number of items evaluated per
transaction.
3.3.1. Experiment on data partitioned by the number of transactions
Scenario 1. Comparison of association rule model and implication
rule model on binary data set
Compared with the collaborative filtering recommender model based on
the association rule model, the recommender model based on the
implication field gives much better results on the classification
accuracy measures (precision, recall, and F1) as well as on the ROC and
precision/recall curves.
Scenario 2. Comparison of association rule model and implication
rule model on quantitative dataset
On the quantitative data set, the classification accuracy of the
IFARRS recommender model, measured by precision, recall, and F1, is
also much better than that of the recommender model based on the
association rule mining model.
Scenario 3. Recommender performance and timing
This scenario compares the performance and recommendation generating
time (including model building time and recommendation item prediction
time) of the recommender model based on the implication field and of
the association rule mining model. Experiments show that the
recommender model based on the statistical implication field has faster
model construction and execution times, by 53% (model building) and 37%
(model execution) respectively, than the model based on association
rule mining, while the generated rule set is reduced to about 9%
compared with the rule set generated by the recommender model based on
association rule mining. This meets the time requirements of a
recommender system and yields a rule set that is easier to process.
Scenario 4. Comparison with collaborative filtering recommender
models on quantitative data set
According to the classification accuracy criteria, the statistical
implication field-based recommender model gives superior results
compared to the traditional item-based and user-based collaborative
filtering recommender models using the Cosine and Pearson similarity
measures.
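For reference, the two similarity measures used by the collaborative
filtering baselines can be sketched as follows. This is a simplified
illustration that assumes both rating vectors contain only co-rated
items in the same order; the rating values are made up, and the
thesis's own experimental tooling may handle missing ratings
differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(u, v):
    """Pearson correlation: cosine similarity of mean-centered ratings."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Illustrative ratings of two users on four co-rated items.
user_a = [5, 3, 4, 4]
user_b = [3, 1, 2, 3]
sim_cos = cosine(user_a, user_b)
sim_pearson = pearson(user_a, user_b)
```

Pearson removes each user's mean rating before comparing, so it is less
sensitive than Cosine to users who rate systematically high or low.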
3.3.2. Experiment on data partitioned by the number of items
evaluated per transaction
Analysis of equipotential surfaces in the implication field
The survey results are presented as 3D scatter plots and 3D graphs,
in which the common equipotential surfaces, with implication intensity
in the range from 0.8 to 1.0, appear in warm colors (red), and the
remaining scattered points are equipotential surfaces whose
implication intensity decreases as the color shifts toward blue. The
survey results are also presented in a contour graph, in which the
equipotential surfaces with implicative intensity concentrated in the
range from 0.8 to 1 are represented by the gray spectrum, and the rest
by the green gradient color spectrum. With the variation of
implicative intensity on the equipotential surfaces presented in 3D,
it is easy to see that the implication patterns with high implicative
intensity are concentrated on the warm-colored equipotential surfaces
and decrease rapidly in the low-intensity region indicated in
blue. Common recommendations will be filtered on high-intensity
equipotential surfaces, while recommendations for rare and specific
items will be provided in lower implication equipotential surfaces.
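The separation between high-intensity and lower-intensity rules can be
illustrated with a small sketch. It assumes the classical implication
intensity of statistical implicative analysis under a Gaussian
approximation, computed from the number of counter-examples of a rule
a -> b; the thesis defines its own implication field and equipotential
surfaces in Chapter 2, which may differ from this simplification. The
function names, the counts and the 0.8 threshold used as a band
boundary are illustrative assumptions.

```python
import math


def implication_intensity(n, n_a, n_b, n_a_not_b):
    """Classical implication intensity of a -> b (Gaussian approximation).

    n         : total number of transactions
    n_a, n_b  : transactions containing a, respectively b
    n_a_not_b : counter-examples (contain a but not b)
    """
    # Counter-examples expected if a and b were independent.
    expected = n_a * (n - n_b) / n
    if expected <= 0:
        raise ValueError("undefined when n_a = 0 or n_b = n")
    # Implication index: standardized deviation of the counter-examples.
    q = (n_a_not_b - expected) / math.sqrt(expected)
    # Intensity = 1 - Phi(q), with Phi the standard normal CDF.
    return 0.5 * math.erfc(q / math.sqrt(2))


def split_by_intensity(rules, threshold=0.8):
    """Split (name, intensity) pairs into a high band and the rest."""
    high = [(name, phi) for name, phi in rules if phi >= threshold]
    low = [(name, phi) for name, phi in rules if phi < threshold]
    return high, low


# Hypothetical counts: few counter-examples versus as many as expected.
rules = [
    ("a -> b", implication_intensity(100, 20, 40, 2)),
    ("c -> d", implication_intensity(100, 20, 40, 12)),
]
high, low = split_by_intensity(rules)
```

A rule with far fewer counter-examples than expected under
independence receives an intensity close to 1 and falls in the high
band, while a rule with exactly the expected number receives 0.5.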
Scenario 1. Comparison with traditional recommendation models
In this experiment scenario, the statistical implication field-based
recommender system model (𝐼𝑆𝐹𝑅𝑆) is compared with the traditional
user-based collaborative filtering recommendation models using the
Cosine (𝑈𝐵𝐶𝐹_𝑐𝑅𝑆) and Pearson (𝑈𝐵𝐶𝐹_𝑝𝑠𝑅𝑆) measures, and with the
item-based collaborative filtering recommendation models using the
Cosine (𝐼𝐵𝐶𝐹_𝑐𝑅𝑆) and Adjusted Cosine (𝐼𝐵𝐶𝐹_𝑎𝑐𝑅𝑆) measures. The
dataset used in this experiment is the Movielens non-binary dataset.
To give the collaborative filtering models good results, several
values of the neighborhood parameter were tested, 𝑘 = 2, 5, 10, 15,
and 𝑘 = 15 was found to be better than the other values. The
recommendation models have been tested on two groups of measures:
classification and rating. First, the models are tested on the
classification accuracy measures; the results include the ROC curve,
the precision/recall curve and F1, whereby the 𝐼𝑆𝐹𝑅𝑆 model is the
best, followed by the user-based collaborative filtering models using
the Pearson and Cosine measures, and finally the weakest are the
item-based collaborative filtering models (with both the Cosine and
the Adjusted Cosine measures).
The results of this experiment show that both the proposed 𝐼𝑆𝐹𝑅𝑆
model and the proposed data partitioning method contribute to
improving the classification and rating ability of the model, as well
as the quality of the recommendations it generates, compared to the
recommender models based on traditional collaborative filtering.
Scenario 2. Comparison with implication recommendation models
In this experiment scenario, the MSWeb binary dataset is used to
compare the implication field recommender system (𝐼𝑆𝐹𝑅𝑆) model with
two existing models that apply statistical implication analysis: the
model using the implication index and implication intensity (𝐼𝐼𝐼𝑅𝑆),
and the model using the implicative measure Phi, the coherence measure
Cohesion and the importance measure Gamma (𝑃𝐶𝐺𝑅𝑆), on the same two
groups of measures as in scenario 1. First are the classification
accuracy measures, including the precision/recall curve, the ROC curve
and F1; the experimental results show the superiority of the 𝐼𝑆𝐹𝑅𝑆
recommendation model over the 𝑃𝐶𝐺𝑅𝑆 model and the 𝐼𝐼𝐼𝑅𝑆 model, of
which the 𝐼𝐼𝐼𝑅𝑆 model is the weakest on all 3 measures. Second are
the rating accuracy measures; the experimental results are quite
similar to those on the group of classification accuracy measures,
that is, the 𝐼𝑆𝐹𝑅𝑆 model has the best rating results according to the
𝑛𝐷𝐶𝐺 and 𝑅𝑎𝑛𝑘𝑠𝑐𝑜𝑟𝑒 measures, followed by the 𝑃𝐶𝐺𝑅𝑆 model, and the
worst is the 𝐼𝐼𝐼𝑅𝑆 model.
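As an illustration of the first of these two measures, the following
minimal sketch computes nDCG for one recommendation list. It assumes
binary relevance (an item is either relevant to the user or not) and
the common logarithmic discount; the function name and the item
identifiers are hypothetical, and the exact variant used in the thesis
experiments may differ.

```python
import math


def ndcg(ranked_items, relevant):
    """nDCG of a ranked recommendation list with binary relevance."""
    relevant = set(relevant)
    # Discounted gain: a hit at position i (1-based) counts 1 / log2(i + 1).
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(ranked_items, start=1)
        if item in relevant
    )
    # Ideal ordering: all relevant items placed at the top of the list.
    ideal_hits = min(len(relevant), len(ranked_items))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Illustrative list: hits at positions 1 and 3, one relevant item missed.
score = ndcg(["a", "b", "c", "d"], {"a", "c", "x"})
```

A list that places all relevant items first obtains a score of 1, and
hits placed lower in the list contribute progressively less.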
This suggests that the recommendation system based on the statistical
implication field is potentially better in both classification and
rating than the existing statistical implication recommendation
models. The experiment demonstrated that the proposed 𝐼𝑆𝐹𝑅𝑆 solved
the three problems of these systems. Therefore, this is a new and
promising direction in applying statistical implication analysis
theory to the field of recommender systems.
3.4. Chapter summary
Chapter 3 focuses on organizing and carrying out the experiments that
evaluate the models proposed in Chapter 2, including the preparation
of data sets and experimental tools and the execution of the
experimental scenarios. Accordingly, the proposed models are tested
and compared with memory-based and rule-based collaborative filtering
recommender models. In addition, they are compared with the existing
recommender models based on the statistical implicative analysis
approach. Experimental results show that the models proposed in the
thesis contribute significantly to improving the effectiveness of the
recommendation system.