VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

VO THI KIM NGUYET

ECOMMERCE GRAPH-BASED RECOMMENDATION
SYSTEM

Major: COMPUTER SCIENCE
Major code: 8480101

MASTER’S THESIS

HO CHI MINH CITY, July 2023


THIS THESIS IS COMPLETED AT
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU-HCM
Supervisor:
Le Thanh Van, Ph.D
Examiner 1:
Assoc. Prof. Dr. Huynh Tuong Nguyen, Ph.D
Examiner 2:
Ton Long Phuoc, Ph.D
This master’s thesis is defended at Ho Chi Minh City University of
Technology (HCMUT) – VNU-HCM on 11th July 2023
Master’s Thesis Committee:
1. Assoc. Prof. Dr. Tran Ngoc Thinh, Ph.D (Chairman)
2. Assoc. Prof. Dr. Huynh Tuong Nguyen, Ph.D (Examiner 1)
3. Ton Long Phuoc, Ph.D (Examiner 2)
4. Le Thanh Van, Ph.D (Commissioner)
5. Nguyen Tien Thinh, Ph.D (Secretary)

Approval of the Chairman of the Master’s Thesis Committee and the Dean of the
Faculty of Computer Science and Engineering after the thesis has been corrected
(if any).
CHAIRMAN OF THESIS COMMITTEE

DEAN OF FACULTY OF
COMPUTER SCIENCE AND ENGINEERING



VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY


SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom - Happiness

THE TASK SHEET OF MASTER’S THESIS
Full name: Vo Thi Kim Nguyet

Student code: 2270346

Date of birth: Oct 10th 1995

Place of birth: Ho Chi Minh City

Major: Computer Science

Major code: 8480101

I. THESIS TITLE: E-commerce graph-based recommendation system
(Vietnamese title: Hệ thống gợi ý dựa trên phương pháp đồ thị trong thương mại điện tử, literally “Recommendation system based on graph methods in e-commerce”)
II. TASKS AND CONTENTS:
1. Introduction:
• Introduce the research topic and its significance.
• Provide an overview of the structure of the thesis.
2. Literature Review:
• Conduct a comprehensive review of existing product recommendation
techniques.
• Analyze strengths and weaknesses of different approaches.
• Identify gaps in the literature that the research aims to address.

3. Problem Statement:
• Clearly state the problem being addressed in the research.
• Highlight the need for improved recommendation approaches in the
context of e-commerce.
4. Methodology:
• Design the research approach for developing and evaluating the
recommendation system.
• Define the criteria and metrics for evaluating the effectiveness of the
system.
5. Graph-Based Recommendation System Implementation:
• Develop the recommendation system using graph embedding techniques.
• Implement graph construction methods based on user behavior data.
• Incorporate Node2Vec and FAISS for graph embedding and indexing.
6. Experimental Evaluation:
• Conduct experiments to evaluate the performance of the developed system.
• Compare the results with other existing recommendation models.
• Collect and analyze data on key evaluation metrics.
7. Discussion of Findings:
• Analyze and interpret the results of the experimental evaluation.
• Discuss the implications of the findings in relation to the research
objectives.



8. Conclusion:
• Summarize the key findings and contributions of the research.
• Discuss the practical implications of the research outcomes.
9. Future Research Directions:
• Suggest avenues for further research and improvements in the recommendation system.
• Highlight areas where the proposed approach could be extended or refined.
10. References:
• List all the sources and references cited throughout the thesis.
11. Appendices:
• Include any supplementary material, code snippets, graphs, or diagrams
that enhance understanding.
III. THESIS START DATE: Feb-06-2023
IV. THESIS COMPLETION DATE: Jun-09-2023
V. SUPERVISOR: Le Thanh Van, Ph.D
Ho Chi Minh City, Jun-09-2023
SUPERVISOR
(Full name and signature)

CHAIR OF PROGRAM COMMITTEE
(Full name and signature)

DEAN OF FACULTY OF COMPUTER SCIENCE AND ENGINEERING
(Full name and signature)

Note: The student must include this task sheet as the first page of the Master’s Thesis
booklet.

ACKNOWLEDGEMENT
This thesis marks the culmination of my research journey into graph-based modeling and its applications in data analysis and machine learning. Graphs offer a unique perspective for understanding complex relationships within vast datasets. Throughout this work, I explore fundamental concepts of graph-based modeling, delve into graph embedding techniques, and evaluate their efficacy in solving real-world problems. I extend my gratitude to my advisors, mentors, colleagues, and family for their unwavering support and encouragement. My hope is that this thesis inspires further research and innovative applications of graph-based models in various domains. Thank you for joining me on this journey.
Sincerely,
Nguyet Vo

Ho Chi Minh City, June 2023



ABSTRACT
This thesis presents a graph-based recommendation system tailored for personalized content suggestions in e-commerce. Using random-walk-based graph embedding methods such as DeepWalk and Node2Vec, the system captures users’ behavioral sequences and generates embeddings for items. These embeddings enable pairwise similarity calculations among items, which form the basis for similarity-driven content recommendations. To tackle challenges such as sparsity and cold start, side information is integrated into the graph embedding framework. Empirical evaluation on clickstream data demonstrates the superiority of the proposed approach over traditional collaborative filtering techniques in terms of both accuracy and efficiency. The study contributes a novel graph-based recommendation system that addresses scalability, sparsity, and cold-start issues and is further enriched by supplementary data to improve recommendation efficacy. The results suggest that graph-based techniques hold potential for enhancing personalized recommendation systems across diverse domains, including e-commerce.



SUMMARY OF THE MASTER’S THESIS (VIETNAMESE ABSTRACT)
The thesis proposes a graph-based recommendation system for personalized content recommendation in e-commerce. The project uses graph techniques such as DeepWalk and Node2Vec, which build on random walks, to capture users’ behavioral sequences and suggest suitable product lists. These techniques make it possible to compute probabilities between pairs or lists of products, which then form the basis for recommending items to users according to their behavior. To address challenges such as new users or products (cold start), scalability, and sparsity, the algorithm is extended and integrated into the system so that it produces intelligent recommendations matching each customer’s preferences. Experimental evaluation on large-scale customer behavior data shows that the proposed method outperforms traditional collaborative filtering methods in both accuracy and efficiency. The study contributes a graph-based recommendation system that helps customers quickly locate the products they are interested in, and thus make sound decisions when shopping online, and that can improve the performance of personalized recommendation systems.

DECLARATION OF AUTHORSHIP
I hereby declare that this thesis was carried out by myself under the guidance and supervision of Le Thanh Van, Ph.D, and that the work and results it contains are my own, are truthful, and do not violate research ethics. The data and figures presented in this thesis for analysis, commentary, and evaluation were gathered from various sources through my own work and have been duly acknowledged in the references.
In addition, comments, reviews, and data from other authors and organizations have been acknowledged and explicitly cited.
I will take full responsibility for any fraud detected in my thesis. Ho Chi Minh City University of Technology (HCMUT) – VNU-HCM bears no responsibility for any copyright infringement caused by my work (if any).
Ho Chi Minh City, June 2023
Author

Vo Thi Kim Nguyet



TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
1.1. Background on recommendation systems and the importance of personalization
1.2. Research Questions
1.3. Methodology
1.4. Contributions
1.5. Scope of the research
1.6. Implications
1.7. Novelty of the topic
1.8. Outline
CHAPTER 2: OVERVIEW OF RECOMMENDATION SYSTEM
2.1. Recommendation System methods
2.2. Research Problem
2.3. Overview of existing literature on graph-based recommendation systems
2.4. Graph-based learning Approaches for Recommender System (RS)
2.5. Research results in application of Graph-based learning in Recommender System
2.6. Comparison of different graph-based algorithms
2.7. The advantages of using UMAP and FAISS in combination with Deep Walk and Node2Vec
2.8. Session-based Recommendation System
2.9. Summary
CHAPTER 3: IMPLEMENTATION
3.1. Proposed Methodologies
3.2. Data collection and its characteristics
3.3. Data cleaning and preparation
3.4. Explanation of how the data was transformed into a graph-based representation
3.5. Random Walks algorithm
3.6. Visualization with UMAP
3.7. Embedding Vector Search with FAISS
3.8. Evaluation of the Recommendation System
CHAPTER 4: EXPERIMENTAL RESULTS
4.1. All Machine Learning (ML) Models
4.2. Association Rules
4.3. Traditional Recommendation Techniques
4.4. Sequence Models for Session-Level Data
CHAPTER 5: DISCUSSION AND CONCLUSION
REFERENCES
APPENDIX



LIST OF FIGURES
Figure 1.1. Example of PDP views in a session
Figure 1.2. Project Flow
Figure 2.1. Taxonomy of Recommendation System
Figure 2.2. The demonstration of graph learning based recommender systems
Figure 2.3. BFS and DFS search strategies from node 𝑢 (𝑘 = 3)
Figure 2.4. Illustration of the random walk procedure in node2vec. The walk just transitioned from 𝑡 to 𝑣 and is now evaluating its next step out of node 𝑣. Edge labels indicate search biases 𝛼
Figure 2.5. NARM Architecture
Figure 3.1. Overview of graph embedding in Taobao: (a) Users’ behavior sequences: One session for user 𝑢1, two sessions for user 𝑢2 and 𝑢3; these sequences are used to construct the item graph; (b) The weighted directed item graph 𝒢 = (𝒱, ℰ); (c) The sequences generated by random walk in the item graph; (d) Embedding with Skip-Gram
Figure 3.2. Dataset Overview
Figure 3.3. Attributes of raw dataset
Figure 3.4. Daily Visits summary
Figure 3.5. Brands summary
Figure 3.6. Categories summary
Figure 3.7. Customers summary
Figure 3.8. Flow of a user in a session
Figure 3.9. Example for view of all products of a user in a session
Figure 3.10. Directed Graph
Figure 3.11. Directed Graph with Weight
Figure 3.12. Undirected Graph with Weight
Figure 3.13. Middle and Remaining proportion
Figure 3.14. Final results of embedding vectors
Figure 3.15. Similarity scores
Figure 3.16. Node2Vec Embeddings Visualization
Figure 3.17. New dataset is generated from embedding vectors
Figure 3.18. Category-code in level 1 visualization
Figure 3.19. Cosine Similarity and L2 scores
Figure 3.20. An imbalance in user interactions
Figure 3.21. List of results for a specific user
Figure 4.1. Correlation among categories
Figure 4.2. Each chunk in Association Rules algorithm
Figure 4.3. Result in Association Rule algorithm
Figure 4.4. Traditional Recommendation System Models Results
Figure 4.5. Model and Result in NARM model
Figure 4.6. Co-occurrence matrix of items based on adjacency of items in same session
Figure 4.7. Item & User Similarity matrix
Figure 4.8. Heterogeneous Global Graph
Figure 4.9. Model Training and Evaluation
Figure 4.10. Results in HG-GNN model



LIST OF TABLES
Table 2.1. Comparison of different graph-based algorithms
Table 3.1. Top-N Metrics Result
Table 4.1. Comparison among dimensional reduction techniques
Table 4.2. Comparisons among ML models and Graph-based approach
Table 4.3. Comparisons among traditional and graph-based models
Table 4.4. Comparison among metrics in HG-GNN model
Table 4.5. Comparison between Sequence Models and Graph-based model



CHAPTER 1: INTRODUCTION
This thesis aims to enhance personalized recommendation systems in e-commerce by exploring graph-based algorithms and conducting A/B testing for real-world insights. Leveraging a large-scale e-commerce behavior dataset from REES46 [1], we propose an algorithm intended to improve both recommendation accuracy and efficiency. Our goal is to bridge theory and practice, advancing recommender systems so as to maximize user satisfaction and platform diversity in e-commerce.
1.1. Background on recommendation systems and the importance of personalization

This thesis focuses on recommender systems in e-commerce, where personalized content recommendation is crucial because of the vast number of products available across numerous websites. Traditional recommender systems such as Content-Based Filtering and Collaborative Filtering suffer from scalability, sparsity, and cold-start problems, which reduce their effectiveness on large-scale, sparse transaction records. To address these challenges, the project proposes graph embedding techniques based on random walks to capture users’ behavioral sequences and generate item embeddings. These embeddings can then be used to recommend products to users, group similar products, and classify transactions based on meta-information about clusters, items, and users’ transaction use cases. The approach also employs Facebook AI Similarity Search (FAISS) to generate recommendations through embedding vector search. By leveraging graph-based learning, this thesis offers insights into enhancing recommender systems on e-commerce platforms while tackling the limitations of traditional methods.
1.2. Research Questions

This thesis investigates how effective graph-based techniques, namely random-walk-based graph embeddings combined with FAISS similarity search, are at capturing users’ behavior sequences and generating item embeddings for improved product recommendations in e-commerce, compared to traditional methods. To investigate these research questions thoroughly, the following methodology will be employed:
a. Data Collection: To understand users’ behaviour and preferences, data
will be collected from an ecommerce platform through user
interactions, transaction records, and item metadata.
b. Graph-Based Recommendation System: The proposed graph-based
recommendation system, incorporating Random Walks and FAISS, will
be implemented and fine-tuned to ensure its applicability to real-world
data.
c. Comparison with traditional models: Traditional models will be
developed and configured as a baseline for comparison against the
graph-based approach.
d. Data Verification: To validate the results and ensure robustness, the
system will be tested on a large dataset with diverse user interactions
and product attributes.
e. Performance Metrics: The accuracy and efficiency of the graph-based
and collaborative filtering systems will be evaluated using standard
recommendation metrics like Mean Average Precision (MAP), Recall,
Precision, and Normalized Discounted Cumulative Gain (NDCG).
f. User Survey: User feedback will be collected through surveys to gauge
user satisfaction and preference for recommendations generated by
each approach.
g. A/B Testing: A randomized controlled trial (A/B testing) will be
conducted to compare user engagement and conversion rates
between the two recommendation systems.
By employing the above methodology, this thesis aims to shed light on the following main questions:

- How should user feedback be gathered? Instead of relying solely on user surveys, we will run an A/B test that splits customers into groups and measures the Click-Through Rate (CTR) of each recommendation strategy. This lets us gauge the effectiveness of the new algorithm in real-world scenarios and gather valuable insights into its performance.

- Which algorithm and solution should be used, and how can implementation on real data be ensured? The graph-based recommendation system, incorporating Random Walks and FAISS, will be implemented using appropriate libraries and frameworks. Rigorous testing on real-world data will ensure its suitability and effectiveness.

- How can the results be verified? The results will be verified through comprehensive evaluation using standard recommendation metrics, comparing the graph-based approach against the traditional approach. Additionally, A/B testing will provide valuable insights into user engagement and conversion rates.
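The A/B test described above compares CTR between customer groups. The thesis does not specify how users are assigned to groups or which significance test is applied, so the sketch below assumes deterministic hash-based assignment and a standard two-sided two-proportion z-test. The function names, the salt string, and the click counts in the example are hypothetical.

```python
import hashlib
import math


def assign_group(user_id, groups=("control", "graph_based"), salt="rec-ab-test"):
    """Deterministically assign a user to an experiment arm.

    Hashing (salt, user_id) keeps each user in the same arm across visits.
    The salt value is a hypothetical placeholder.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def click_through_rate(clicks, impressions):
    """CTR = clicks / impressions (0.0 when there were no impressions)."""
    return clicks / impressions if impressions else 0.0


def two_proportion_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided z-test for a difference in CTR between arms A and B.

    Returns (z, p_value); a positive z means arm B has the higher CTR.
    Assumes both arms have impressions and at least one click overall.
    """
    p_a = click_through_rate(clicks_a, imps_a)
    p_b = click_through_rate(clicks_b, imps_b)
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy counts for illustration only (not results from this thesis).
z, p = two_proportion_z_test(clicks_a=200, imps_a=10_000, clicks_b=260, imps_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these toy counts (2.0% versus 2.6% CTR), z is roughly 2.83, well above the usual 1.96 threshold for a 5% two-sided test. Hashing rather than random sampling at request time keeps a returning user in one arm, which avoids contaminating both groups with the same person.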

By addressing these questions, this thesis endeavors to contribute to the advancement of personalized recommendation systems in e-commerce platforms, providing valuable insights into the potential benefits of graph-based techniques for more accurate and efficient product recommendations.
1.3. Methodology

The main objective of this project is to build a graph-based learning recommendation system for the e-commerce platform, targeting end-users. The system has two key objectives:

- Recommend similar products to users based on their preferences and browsing behavior.

- Provide personalized recommendations by leveraging users’ behavior history and search behavior to recommend the best products for each user.
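The second objective, recommending items from a user’s own history, can be illustrated with one simple and common strategy: average the embeddings of the items a user has interacted with into a profile vector, then rank the remaining items by cosine similarity to that profile. This aggregation is an assumption for illustration, not necessarily the one used in this thesis; the function names, item names, and two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_for_user(history, item_vectors, k=3):
    """Rank unseen items by similarity to the mean embedding of the user's history.

    history: item ids the user viewed, carted or bought.
    item_vectors: {item_id: embedding vector}.
    """
    known = [i for i in history if i in item_vectors]
    if not known:
        return []  # cold-start user: fall back to, e.g., popular items
    dim = len(item_vectors[known[0]])
    profile = [sum(item_vectors[i][d] for i in known) / len(known) for d in range(dim)]
    seen = set(history)
    candidates = [i for i in item_vectors if i not in seen]
    # highest similarity first; ties broken by item id for a stable order
    candidates.sort(key=lambda i: (-cosine(profile, item_vectors[i]), i))
    return candidates[:k]


item_vectors = {
    "phone_a": [1.0, 0.0],
    "phone_b": [0.9, 0.1],
    "phone_case": [0.7, 0.3],
    "sofa": [0.0, 1.0],
}
print(recommend_for_user(["phone_a"], item_vectors, k=2))  # ['phone_b', 'phone_case']
```

Averaging treats every past interaction equally; weighting recent items more heavily is a natural variant when session recency matters.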

Additionally, the project addresses the challenge of converting clickstream data into different types of graphs for different event types, taking into account the characteristics and requirements of the e-commerce platform. Three types of co-occurrence graphs will be constructed:
- Co-occurrence graph over Product Detail Page (PDP) views: A node is created for each product whose page a user views in a single session, and an edge is created between two nodes if the two product pages are viewed together in the same session.

- Co-occurrence graph over products that are added to cart together: A node is created for each product that a user adds to the cart, and an edge is created between two nodes if the two products are added to the cart together.

- Co-occurrence graph over products that are bought together: A node is created for each product that a user purchases, and an edge is created between two nodes if the two products are purchased together.
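The three co-occurrence graphs listed above differ only in which event type is kept. The sketch below builds any of them from a flat list of events. It assumes events are `(session_id, product_id, event_type)` tuples, mirroring the `user_session`, `product_id`, and `event_type` fields of the REES46 data with `event_type` values `"view"`, `"cart"`, and `"purchase"`; it also assumes “together” means “in the same session” for all three graphs and weights each edge by the number of sessions in which the pair co-occurs. The weighting scheme and function name are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(events, event_type):
    """Undirected weighted co-occurrence graph for one event type.

    events: iterable of (session_id, product_id, event_type) tuples.
    Returns {product: {neighbour: weight}}, where weight counts the sessions
    in which both products received this event type (an assumed weighting).
    Products that never co-occur with another product get no entry.
    """
    products_per_session = defaultdict(set)
    for session_id, product_id, etype in events:
        if etype == event_type:
            products_per_session[session_id].add(product_id)

    graph = defaultdict(lambda: defaultdict(int))
    for products in products_per_session.values():
        # each unordered pair of distinct products in a session adds 1 to the edge
        for a, b in combinations(sorted(products), 2):
            graph[a][b] += 1
            graph[b][a] += 1  # store both directions: the graph is undirected
    return {node: dict(neighbours) for node, neighbours in graph.items()}


# Toy clickstream (hypothetical session and product ids).
events = [
    ("s1", "p1", "view"), ("s1", "p2", "view"), ("s1", "p3", "view"),
    ("s1", "p2", "cart"), ("s1", "p3", "cart"),
    ("s2", "p1", "view"), ("s2", "p2", "view"),
    ("s2", "p2", "purchase"), ("s2", "p4", "purchase"),
]
view_graph = build_cooccurrence_graph(events, "view")
cart_graph = build_cooccurrence_graph(events, "cart")
purchase_graph = build_cooccurrence_graph(events, "purchase")
print(view_graph["p1"])  # {'p2': 2, 'p3': 1}
```

Using a set per session means that repeated views of the same product within one session do not inflate the edge weights.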

Figure 1.1. Example of PDP views in a session
Each type of graph provides a different representation of user interactions with products and serves as the basis for generating relevant recommendations. The methodology of this project uses an undirected graph built from clickstream data. Unlike directed graphs, which capture one-way relationships (for example, product A viewed before product B), undirected graphs represent symmetric relationships between items, such as two products being viewed in the same session. Recommendations are made based on node similarity, calculated using measures such as cosine similarity or Jaccard similarity.
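The node-similarity step described above can be sketched directly on a weighted adjacency structure. The version below assumes the graph is stored as `{node: {neighbour: weight}}`; Jaccard similarity compares unweighted neighbour sets, while cosine similarity compares weighted adjacency rows. The helper names and toy graph are hypothetical, and the thesis may apply these measures to different representations (for example, to learned embeddings).

```python
import math


def jaccard(graph, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)| over unweighted neighbour sets."""
    nu, nv = set(graph.get(u, {})), set(graph.get(v, {}))
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def cosine(graph, u, v):
    """Cosine similarity between the weighted adjacency rows of u and v."""
    row_u, row_v = graph.get(u, {}), graph.get(v, {})
    dot = sum(w * row_v[n] for n, w in row_u.items() if n in row_v)
    norm_u = math.sqrt(sum(w * w for w in row_u.values()))
    norm_v = math.sqrt(sum(w * w for w in row_v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(graph, node, k, similarity):
    """Top-k other nodes ranked by the given similarity (ties broken by id)."""
    scored = [(similarity(graph, node, other), other) for other in graph if other != node]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Toy undirected weighted graph: {node: {neighbour: weight}}.
graph = {
    "p1": {"p2": 2, "p3": 1},
    "p2": {"p1": 2, "p3": 1},
    "p3": {"p1": 1, "p2": 1, "p4": 1},
    "p4": {"p3": 1},
}
print(most_similar(graph, "p1", 2, jaccard))  # ['p4', 'p2']
print(most_similar(graph, "p1", 2, cosine))   # ['p3', 'p4']
```

The two measures can disagree: Jaccard ranks `p4` first because its only neighbour is shared with `p1`, whereas cosine also accounts for edge weights and favours `p3`.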



The project follows several steps to achieve its objectives:

- Graph Construction and Transformation: Clickstream data is pre-processed and transformed into three types of co-occurrence graphs, each representing a different product relationship derived from user actions.

- Graph Embedding Techniques: To handle the large graph efficiently, random-walk-based embedding is applied, combining the DeepWalk and Node2Vec techniques. DeepWalk samples truncated random walks over the item graph, which act as synthetic behavior sequences, and learns embeddings from them with Skip-Gram; Node2Vec biases these walks to capture both local and global structural information and thereby enrich the embeddings. Together, these techniques convert the graphs into meaningful low-dimensional embeddings that capture similarities and relationships between items.

- UMAP for Analysis and Visualization: UMAP (Uniform Manifold Approximation and Projection) is used for cluster visualization, projecting the embeddings into 2D or 3D space. This aids in understanding the spatial distribution of, and relationships among, data clusters.

- FAISS for Efficient Similarity Search: FAISS (Facebook AI Similarity Search) is used for efficient nearest-neighbor search in the item-item recommendation process. It indexes the embeddings, enabling fast and accurate retrieval of similar items.

- Performance Evaluation: The recommendation system’s performance is evaluated using rank-aware top-N metrics: Precision at N, Recall at N, Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG). Together, these metrics assess the accuracy and relevance of the recommendations and the quality of their ranking.
+ Precision at N: the proportion of relevant items among the top-N recommended items. It evaluates accuracy by counting how many of the recommended items are actually relevant to the user.
+ Recall at N: the proportion of all relevant items that appear in the top-N recommendations. It assesses the system’s ability to retrieve relevant items, regardless of where they rank within the top N.
+ Mean Average Precision (MAP): the mean, over users, of each user’s average precision, where average precision averages the precision at each rank at which a relevant item appears. It therefore reflects both the relevance of the recommended items and their ranking positions.
+ Normalized Discounted Cumulative Gain (NDCG): the discounted cumulative gain of the list, in which each relevant item’s contribution shrinks logarithmically with its rank, divided by the gain of an ideal ranking so that scores lie between 0 and 1. It emphasizes placing relevant items near the top of the recommendation list.
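The four metrics listed above can be computed per user from a ranked recommendation list and the set of items that user actually found relevant. The sketch below assumes binary relevance (an item is relevant or not) and, for MAP, normalises average precision by min(|relevant|, k), which is one common convention; the thesis’s exact normalisation is not stated here. Function names and the toy user are hypothetical.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant (divides by k)."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def average_precision_at_k(recommended, relevant, k):
    """Sum of precision@i over the ranks i <= k that hold a relevant item,
    divided by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    return total / min(len(relevant), k)


def map_at_k(all_recommended, all_relevant, k):
    """MAP@k: average precision averaged over users."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(all_recommended, all_relevant)]
    return sum(aps) / len(aps) if aps else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: DCG with 1 / log2(rank + 1) discounts, divided
    by the DCG of an ideal list that puts all relevant items first."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(recommended[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


# One hypothetical user: five recommendations, three truly relevant items.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(tuple(round(f(recommended, relevant, 5), 4)
            for f in (precision_at_k, recall_at_k, average_precision_at_k, ndcg_at_k)))
# (0.4, 0.6667, 0.5556, 0.7039)
```

Relevant items at ranks 1 and 3 give a perfect first hit but a gap at rank 2, which is why average precision (5/9) and NDCG (about 0.70) sit below 1 even though recall is 2/3.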
By utilizing Random Walks for embedding generation, UMAP for analysis, and
FAISS for similarity search, the project aims to develop a graph-based
recommendation system that provides accurate, interpretable, and efficient
recommendations for the eCommerce platform, enhancing the user experience and
driving business success.
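The random-walk step relied on above can be sketched in plain Python. In node2vec, a walk that just moved from node t to node v picks the next node x with probability proportional to the edge weight w(v, x) times a bias: 1/p if x is t itself, 1 if x is also a neighbour of t, and 1/q otherwise. With p = q = 1 the bias disappears and the procedure reduces to a weighted DeepWalk-style walk. The function names, parameter values, and toy graph are illustrative assumptions, not the thesis’s configuration.

```python
import random


def node2vec_walk(graph, start, walk_length, p=1.0, q=1.0, rng=random):
    """Sample one node2vec walk on a weighted undirected graph.

    graph: {node: {neighbour: weight}}.
    p: return parameter (low p -> walk tends to step back to the previous node).
    q: in-out parameter (low q -> walk moves outward, DFS-like;
       high q -> walk stays near the previous node, BFS-like).
    With p = q = 1 the bias vanishes and this is a weighted DeepWalk-style walk.
    """
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = graph.get(current, {})
        if not neighbours:
            break  # dead end: stop this walk early
        candidates = list(neighbours)
        if len(walk) == 1:
            # first step has no previous node, so use edge weights only
            weights = [neighbours[x] for x in candidates]
        else:
            previous = walk[-2]
            weights = []
            for x in candidates:
                if x == previous:
                    bias = 1.0 / p  # distance 0 from previous node
                elif x in graph.get(previous, {}):
                    bias = 1.0  # distance 1 from previous node
                else:
                    bias = 1.0 / q  # distance 2 from previous node
                weights.append(neighbours[x] * bias)
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in shuffled order per pass."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, walk_length, p, q, rng))
    return walks


graph = {
    "p1": {"p2": 2, "p3": 1},
    "p2": {"p1": 2, "p3": 1},
    "p3": {"p1": 1, "p2": 1, "p4": 1},
    "p4": {"p3": 1},
}
walks = generate_walks(graph, num_walks=2, walk_length=5, p=1.0, q=0.5, seed=7)
print(len(walks))  # 8: two walks from each of the four nodes
```

The walks are then treated as sentences for a Skip-Gram model, for example gensim’s `Word2Vec(sentences=walks, vector_size=64, window=5, min_count=1, sg=1)`, whose learned vectors become the item embeddings; the hyperparameter values shown are placeholders.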

This case study focuses on enhancing personalized recommendation systems in
ecommerce, with the primary goals of improving user experience, platform diversity,
and long-tail product discovery, utilizing free eCommerce behavior data from
REES46 [1]. The project aims to develop a graph-based recommendation system to
optimize personalized content recommendations in ecommerce platforms.
1.4. Contributions

This research makes several contributions to personalized content recommendation in e-commerce platforms. The main contribution is the proposal of a graph-based recommendation system that addresses the scalability, sparsity, and cold-start problems faced by traditional recommender systems. By constructing an item graph and incorporating side information into the graph embedding framework, the system generates informative and effective item embeddings. Experimental results demonstrate that the proposed system outperforms traditional methods in terms of accuracy and efficiency.
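The contribution above mentions incorporating side information into the graph embedding framework to address cold start. The exact mechanism is not described in this section (models such as Alibaba’s EGES learn weighted aggregations of side-information embeddings). A much simpler illustration of why side information helps is to give an item with no interactions the average embedding of known items in the same category. The sketch assumes dot-separated category codes such as `electronics.smartphone`, as in the REES46 `category_code` field, and backs off to the parent category when no peers exist. The function name, vectors, and category assignments are hypothetical.

```python
def cold_start_embedding(category_code, item_vectors, item_category):
    """Embedding for an item with no interactions, built from side information.

    Averages the embeddings of known items in the same category; if none exist,
    backs off one level up the hierarchy ("electronics.video.tv" ->
    "electronics.video" -> "electronics"). Returns None if nothing matches.
    """
    code = category_code
    while code:
        peers = [
            vec for item, vec in item_vectors.items()
            if item_category.get(item, "") == code
            or item_category.get(item, "").startswith(code + ".")
        ]
        if peers:
            dim = len(peers[0])
            return [sum(v[d] for v in peers) / len(peers) for d in range(dim)]
        code = code.rpartition(".")[0]  # drop the deepest level
    return None


# Hypothetical trained embeddings and REES46-style category codes.
item_vectors = {"p1": [1.0, 0.0], "p2": [0.8, 0.2], "p3": [0.0, 1.0]}
item_category = {
    "p1": "electronics.smartphone",
    "p2": "electronics.audio.headphone",
    "p3": "furniture.living_room.sofa",
}
print(cold_start_embedding("electronics.video.tv", item_vectors, item_category))  # [0.9, 0.1]
```

Once a new item has an approximate vector, it can be placed in the same similarity index as every other item and recommended before it accumulates any clickstream history.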
Further Enhancements: To enhance the system’s effectiveness, UMAP is used as
a dimensionality reduction strategy to visualize clusters in the embedding space,
aiding the recommendation process. Additionally, FAISS is employed for generating
recommendations based on similarity measures, enabling fast and efficient nearest
neighbour search on large-scale datasets for real-time recommendation systems.
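FAISS, as described above, performs nearest-neighbour search over item embeddings. The sketch below does not call FAISS. It computes the same exact search with NumPy so that the scoring is visible: cosine similarity as an inner product on L2-normalised vectors (what a flat inner-product index such as `faiss.IndexFlatIP` returns after `faiss.normalize_L2`), and squared Euclidean distance (what `faiss.IndexFlatL2` returns). The function name and toy vectors are hypothetical.

```python
import numpy as np


def top_k_neighbours(embeddings, query_ids, k, metric="cosine"):
    """Exact k-nearest-neighbour search over an (n_items, dim) embedding matrix.

    metric="cosine": inner product of L2-normalised rows (higher is closer).
    metric="l2": squared Euclidean distance (lower is closer).
    Returns (scores, ids), both of shape (len(query_ids), k); each query item
    is excluded from its own results. Rows must be non-zero for "cosine".
    """
    x = np.asarray(embeddings, dtype=np.float32)
    rows = np.arange(len(query_ids))
    if metric == "cosine":
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        scores = x[query_ids] @ x.T
        scores[rows, query_ids] = -np.inf  # never recommend the query item itself
        ids = np.argsort(-scores, axis=1)[:, :k]  # most similar first
    elif metric == "l2":
        diff = x[query_ids][:, None, :] - x[None, :, :]
        scores = (diff ** 2).sum(axis=2)
        scores[rows, query_ids] = np.inf
        ids = np.argsort(scores, axis=1)[:, :k]  # closest first
    else:
        raise ValueError(f"unknown metric: {metric!r}")
    return np.take_along_axis(scores, ids, axis=1), ids


# Hypothetical 2-D item embeddings (real ones come from the random-walk model).
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
sims, ids = top_k_neighbours(emb, [0, 3], k=2)
dists, ids_l2 = top_k_neighbours(emb, [0], k=2, metric="l2")
print(ids.tolist(), ids_l2.tolist())  # [[1, 2], [2, 1]] [[1, 2]]
```

With FAISS installed, the cosine case is typically written as `faiss.normalize_L2(x)`, `index = faiss.IndexFlatIP(x.shape[1])`, `index.add(x)`, `D, I = index.search(x[query_ids], k + 1)`, where `x` is a contiguous float32 array and one extra neighbour is requested because the query item usually ranks itself first. Brute force costs O(n) per query; FAISS’s approximate indexes (for example IVF or HNSW variants) trade a little recall for much faster search on catalogues with millions of items.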
1.5. Scope of the research

The thesis focuses on applying graph embedding techniques (DeepWalk and
Node2Vec) for generating embeddings from category attributes to improve
recommendation system performance. It explores evaluation metrics, implications,
advantages, and limitations of the proposed system, along with practical applications
and future research possibilities.
1.6. Implications

The findings of this research hold scientific and practical significance.
Scientifically, it contributes to the field of recommendation systems by advancing our
understanding of graph-based methods and their application in product
recommendation. Practically, the developed recommendation system can be
deployed in e-commerce platforms to provide personalized and relevant
recommendations, enhancing user satisfaction and engagement.

1.7. Novelty of the topic

The topic of graph embedding techniques in product recommendation is relatively
novel and promising. The research stands out by combining DeepWalk and
Node2Vec algorithms, incorporating user feedback and contextual information, and
providing nearest neighbour analysis through FAISS. The comprehensive evaluation
framework, qualitative and quantitative measures, and visualization techniques
contribute to the uniqueness of the approach.
1.8. Outline

The following flow chart outlines the steps taken to complete the thesis:

Figure 1.2. Project Flow
The project involves data collection and preparation, construction of a graph representation, and the application of graph-based algorithms such as DeepWalk and Node2Vec to learn embeddings. The model uses these vector embeddings together with FAISS indexing for efficient similarity search. The recommendation system’s performance is assessed with metrics such as Precision@k, Recall@k, MRR@k, and nDCG@k, and experiments analyze its impact and scalability and compare it with existing approaches. The final chapter summarizes the key findings, discusses strengths and limitations, and suggests future research directions.
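MRR@k, one of the metrics named above, can be written in its standard form (assumed here, since its exact formulation is not given in this chapter). For a set of evaluated users U, it averages the reciprocal rank of each user’s first relevant recommendation:

```latex
\mathrm{MRR@}k \;=\; \frac{1}{|U|} \sum_{u \in U} \frac{\mathbb{1}[\mathrm{rank}_u \le k]}{\mathrm{rank}_u}
```

Here rank_u is the position of the first relevant item in user u’s recommendation list (taken as ∞ when no relevant item is recommended), so a user whose first relevant item falls outside the top k contributes 0.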
The rest of this thesis is structured as follows. Chapter 2 reviews the literature on recommender systems and graph-based learning techniques. Chapter 3 describes the methodology used in this research, including the pre-processing of clickstream data, the construction of the item graph, and the application of the DeepWalk and Node2Vec techniques to generate item embeddings. Chapter 4 presents the experimental results, including the evaluation of the proposed system on clickstream data and its comparison with traditional methods. Chapter 5 discusses the findings and their implications for personalized content recommendation in e-commerce platforms, summarizes the main contributions of this research, and offers recommendations for future research.



CHAPTER 2: OVERVIEW OF RECOMMENDATION SYSTEM
This chapter examines the significance of recommendation systems in the rapidly growing e-commerce industry. These systems have become essential for providing personalized suggestions to users, aiding product discovery, and improving user experiences. The literature review delves into various recommendation system methods explored by machine learning and data mining experts, considering the challenges posed by the vast number of users and items in the online economy. The review highlights the successes and limitations of these methods, offering insights into their effectiveness in enhancing recommendation accuracy, user engagement, and platform diversity. This knowledge serves as the basis for developing an advanced graph-based learning recommendation system tailored to the specific challenges of e-commerce. The ultimate goal is to boost user satisfaction and business success and to contribute to the ongoing advancement of personalized recommendation systems in the ever-evolving e-commerce landscape.
2.1. Recommendation System Methods
The literature review focuses on the significance of recommendation systems in the e-commerce industry, particularly on Alibaba's Taobao, China's largest online consumer-to-consumer (C2C) platform, which stands out with a profitable share of 1/75th of Alibaba's total e-commerce traffic [2]. Researchers have explored machine learning and data mining strategies to address the challenges of implementing effective recommender systems in the vast online economy. The review examines previous research on recommendation methods, their successes, and their limitations, with the goal of developing an advanced graph-based learning recommendation system for e-commerce. This approach falls under the "Collaborative Filtering" branch, which uses user-item interactions to make recommendations. In this case, the graph-based technique runs random walks over an item graph built from user interactions to learn item embeddings, and FAISS is used for efficient nearest neighbor search over these embeddings. The method thus leverages user interactions collaboratively to find relevant items based on similarities with other users or items in the graph. The ultimate objective is to enhance user satisfaction and business success in the dynamic world of e-commerce.
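The random-walk step can be illustrated with a small item graph built from clickstream sessions. This is a hedged sketch that assumes consecutive clicks within a session define directed edges, weighted by how often each transition occurs. All function names are hypothetical. Each step picks an out-neighbor with probability proportional to the edge weight, which makes this a weighted, directed variant of DeepWalk's uniform walk. Node2Vec would further bias each step with its return and in-out parameters p and q; those are omitted here.

```python
import random
from collections import defaultdict

def build_item_graph(sessions):
    """Directed item graph: edge a -> b weighted by how often b is clicked right after a."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for item in session:
            _ = graph[item]  # register every clicked item as a node
        for src, dst in zip(session, session[1:]):
            if src != dst:  # ignore repeated clicks on the same item
                graph[src][dst] += 1
    return graph

def weighted_random_walk(graph, start, walk_length, rng):
    """One walk; each step follows an out-edge with probability proportional to its weight."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: the current item has no outgoing edges
        items = list(neighbors)
        weights = [neighbors[item] for item in items]
        walk.append(rng.choices(items, weights=weights, k=1)[0])
    return walk

def generate_walks(graph, num_walks, walk_length, seed=0):
    """num_walks passes over all nodes, starting one walk from each node per pass."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(weighted_random_walk(graph, node, walk_length, rng))
    return walks

# Hypothetical clickstream sessions (lists of item ids in click order).
sessions = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
walks = generate_walks(build_item_graph(sessions), num_walks=2, walk_length=4)
```

The resulting walks can be treated as "sentences" of item ids and fed to a skip-gram model (for example, gensim's `Word2Vec` with `sg=1`) to obtain the item embeddings that are then indexed for nearest neighbor search.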

Figure 2.1. Taxonomy of Recommendation System
The taxonomy of recommender systems in Figure 2.1 [25] gives a general overview of how recommender system models are classified and of their characteristics:
a) Collaborative Filtering:
- Characteristics: Collaborative filtering recommends items to users based on the preferences and behaviors of similar users. It does not require item attributes or domain knowledge and can be applied to large datasets, although it struggles with data sparsity and scalability.
- Types:
o User-Based Collaborative Filtering: Recommends items based on the preferences of users who are similar to the target user.
o Item-Based Collaborative Filtering: Recommends items that are similar to those the target user has already shown interest in, where item similarity is derived from the preferences of users who interacted with both items.
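Item-based collaborative filtering can be sketched in a few lines of NumPy. This is an illustrative implementation, not the method used in this thesis. It assumes an implicit-feedback matrix of users by items with 1 marking an interaction, measures item-item cosine similarity between item columns, and scores each unseen item by summing its similarity to the items the target user has already interacted with. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def item_cosine_similarity(interactions: np.ndarray) -> np.ndarray:
    """Cosine similarity between item columns of a users x items matrix."""
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for items nobody touched
    normalized = interactions / norms
    return normalized.T @ normalized

def recommend_item_based(interactions: np.ndarray, user: int, k: int) -> np.ndarray:
    """Rank unseen items by their summed similarity to the user's items."""
    sim = item_cosine_similarity(interactions)
    np.fill_diagonal(sim, 0.0)
    scores = sim @ interactions[user]
    scores[interactions[user] > 0] = -np.inf  # never re-recommend seen items
    return np.argsort(-scores, kind="stable")[:k]

# Rows are users, columns are items; 1 marks an interaction.
interactions = np.array([[1, 1, 0, 0],
                         [1, 1, 1, 0],
                         [0, 1, 1, 0],
                         [0, 0, 1, 1]])
print(recommend_item_based(interactions, user=0, k=2))  # [2 3]
```

User 0 has interacted with items 0 and 1. Item 2 ranks first because it shares users with both of them, while item 3 shares none.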
b) Content-Based Filtering:
- Characteristics: Content-based filtering recommends items to users based on the attributes and features of the items and the user's past preferences. It requires item attributes and domain knowledge for effective recommendations, and it usually struggles with the cold-start problem, particularly for new users with little preference history.
- Types:
o Profile-Based: Creates user profiles based on their historical preferences and recommends items with similar attributes to those in the user profile.
o Item-Based: Recommends items similar to the ones the user has shown interest in, based on shared attributes.
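A profile-based content filter can be sketched in the same style. This assumes each item is described by a binary attribute vector. The attribute columns, item descriptions, and function names below are hypothetical. The user profile is the mean attribute vector of the items the user liked, and the items the user has not yet liked are ranked by cosine similarity to that profile.

```python
import numpy as np

def build_profile(item_features: np.ndarray, liked_items) -> np.ndarray:
    """User profile = mean attribute vector of the items the user liked."""
    return item_features[liked_items].mean(axis=0)

def recommend_content_based(item_features: np.ndarray, liked_items, k: int) -> np.ndarray:
    """Rank items the user has not liked yet by cosine similarity to the profile."""
    profile = build_profile(item_features, liked_items)
    denom = np.linalg.norm(item_features, axis=1) * np.linalg.norm(profile)
    denom[denom == 0] = 1.0
    scores = item_features @ profile / denom
    scores[liked_items] = -np.inf  # skip items the user already liked
    return np.argsort(-scores, kind="stable")[:k]

# Hypothetical binary attributes: [electronics, phone, accessory, clothing]
item_features = np.array([[1, 1, 0, 0],   # item 0: a phone
                          [1, 0, 1, 0],   # item 1: a charger
                          [1, 1, 1, 0],   # item 2: a phone sold with accessories
                          [0, 0, 0, 1]],  # item 3: a shirt
                         dtype=float)
print(recommend_content_based(item_features, liked_items=[0], k=2))  # [2 1]
```

Because only item attributes are used, a newly added item can be recommended as soon as its attributes are known. A new user without liked items, however, has no profile to compare against.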

c) Hybrid Recommender Systems:
- Characteristics: Hybrid systems combine multiple recommendation approaches to improve recommendation accuracy and overcome the limitations of individual methods.
- Types:
o Weighted Hybrid: Assigns different weights to individual recommendation techniques and combines their results.
o Switching Hybrid: Switches between different recommendation
techniques based on user preferences or the availability of data.
o Feature Combination: Combines features from different recommendation methods to create a unified model.

d) Knowledge-Based Recommender Systems:
- Characteristics: Knowledge-based systems use domain-specific knowledge and user preferences to generate recommendations. They are effective for scenarios with limited user data.

- Types:
o Rule-Based Systems: Use predefined rules and user preferences to
make recommendations.
o Case-Based Reasoning: Recommends items based on similarities to
past cases where users expressed preferences.
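A rule-based knowledge-based recommender largely reduces to filtering a catalog against explicit constraints. The sketch below assumes a hypothetical `Product` record and a preference dictionary with `category`, `max_price`, and optional `min_rating` keys. The specific rules and the final ordering by rating are illustrative choices, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Product:
    name: str
    category: str
    price: float
    rating: float

def rule_based_recommend(catalog: List[Product], prefs: Dict) -> List[Product]:
    """Keep products that satisfy every rule, best-rated first."""
    rules = [
        lambda p: p.category == prefs["category"],          # stated need
        lambda p: p.price <= prefs["max_price"],             # budget constraint
        lambda p: p.rating >= prefs.get("min_rating", 0.0),  # optional quality floor
    ]
    matches = [p for p in catalog if all(rule(p) for rule in rules)]
    return sorted(matches, key=lambda p: p.rating, reverse=True)

# Hypothetical catalog entries.
catalog = [
    Product("Phone A", "phone", 300.0, 4.5),
    Product("Phone B", "phone", 800.0, 4.8),
    Product("Phone C", "phone", 250.0, 3.9),
    Product("Shirt", "clothing", 20.0, 4.9),
]
picks = rule_based_recommend(catalog, {"category": "phone", "max_price": 500.0})
print([p.name for p in picks])  # ['Phone A', 'Phone C']
```

No interaction history is involved, which is why this family of methods remains usable when user data is limited.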


e) Matrix Factorization:
- Characteristics: Matrix factorization methods aim to factorize the user-item interaction matrix into latent feature matrices to capture underlying patterns and make predictions.
- Types:
o Singular Value Decomposition (SVD): A traditional matrix factorization technique that reduces the dimensionality of the user-item matrix.
o Alternating Least Squares (ALS): An optimization-based matrix factorization method commonly used in collaborative filtering.
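To illustrate how Alternating Least Squares factorizes a partially observed rating matrix, the sketch below alternates between two ridge-regression problems. With the item factors fixed, each user's latent vector has a closed-form solution over that user's observed ratings, and the item step works the same way with the roles swapped. This is a minimal NumPy sketch under stated assumptions: explicit ratings, a 0/1 `mask` of observed entries, an L2 regularization weight `reg`, and hypothetical names. It is not a production implementation such as Spark MLlib's ALS. Because each half-step minimizes the regularized squared error exactly, the objective never increases from one iteration to the next.

```python
import numpy as np

def als(ratings: np.ndarray, mask: np.ndarray, rank: int = 2,
        reg: float = 0.1, iters: int = 20, seed: int = 0):
    """Factorize ratings ~= U @ V.T using only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Item factors fixed: closed-form ridge solution for every user vector.
        for u in range(n_users):
            observed = mask[u] > 0
            V_obs = V[observed]
            U[u] = np.linalg.solve(V_obs.T @ V_obs + ridge, V_obs.T @ ratings[u, observed])
        # User factors fixed: closed-form ridge solution for every item vector.
        for i in range(n_items):
            observed = mask[:, i] > 0
            U_obs = U[observed]
            V[i] = np.linalg.solve(U_obs.T @ U_obs + ridge, U_obs.T @ ratings[observed, i])
    return U, V

# Toy rank-2 ratings with three entries treated as unobserved.
true_users = np.array([[1, 2], [2, 1], [0, 1], [1, 0]], dtype=float)
true_items = np.array([[1, 1], [2, 0], [0, 2], [1, 2], [2, 1]], dtype=float)
ratings = true_users @ true_items.T
mask = np.ones_like(ratings)
mask[0, 4] = mask[2, 1] = mask[3, 3] = 0
U, V = als(ratings, mask, rank=2, reg=0.1, iters=50)
predicted = U @ V.T  # dense estimate, including the three hidden entries
```

The product `U @ V.T` is dense, so the hidden entries receive predicted ratings. This is how matrix factorization turns a sparse interaction matrix into scores for items a user has not yet rated.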
The following research papers are related to the recommender system approaches described in the previous sections:
In their work on cold-start items, Wei et al. [10] proposed a hybrid recommendation model that addresses both the Complete Cold Start (CCS) and Incomplete Cold Start (ICS) problems by combining Collaborative Filtering (CF) with a deep learning neural network. Their model, SADE, incorporates content features of items to predict ratings for cold-start items, and the study demonstrated the effectiveness of the approach on the Netflix dataset.
Kupisz and Unold [11] developed a Collaborative Filtering (CF)
recommendation system using Hadoop and Apache Spark. They utilized the

