Doctoral Dissertation
Graph-based Modeling and Query Optimization
for Heterogeneous IoT Data
Department of Electronics and Computer Engineering
Graduate School, Chonnam National University
Van-Quyet Nguyen
August 2019
Contents
List of Figures
vi
List of Tables
viii
Acknowledgements
ix
Abstract
x
1 Introduction
1
1.1
Motivation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1.2
Research Problems and Objectives . . . . . . . . . . . . . . . . . . . . .
3
1.3
Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
1.4
Dissertation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
1.5
Dissertation Publications . . . . . . . . . . . . . . . . . . . . . . . . . .
7
2 Background
2.1
10
IoT Data Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
2.1.1
Heterogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
2.1.2
Highly Connected Data . . . . . . . . . . . . . . . . . . . . . . .
11
i
2.2
2.3
2.4
2.5
2.1.3
Dynamic Changes . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.1.4
Massive Real-time Data . . . . . . . . . . . . . . . . . . . . . . .
12
IoT Architecture Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2.2.1
Things Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.2.2
Communication Layer . . . . . . . . . . . . . . . . . . . . . . . .
13
2.2.3
Data Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.2.4
Application Layer . . . . . . . . . . . . . . . . . . . . . . . . . .
14
IoT Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
2.3.1
Collecting IoT Data . . . . . . . . . . . . . . . . . . . . . . . . .
14
2.3.2
Storing IoT Data . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
2.3.3
Analyzing IoT Data . . . . . . . . . . . . . . . . . . . . . . . . .
16
Graph Querying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
17
2.4.1
Reachability Queries . . . . . . . . . . . . . . . . . . . . . . . . .
17
2.4.2
Regular Path Queries . . . . . . . . . . . . . . . . . . . . . . . .
17
2.4.3
Shortest Path Queries . . . . . . . . . . . . . . . . . . . . . . . .
18
Comparison of Relational Databases and Graph Databases for Heterogeneous IoT Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
18
2.5.1
Comparison of Key Features
. . . . . . . . . . . . . . . . . . . .
18
2.5.2
Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . .
20
3 Graph-based Modeling for Heterogeneous IoT Data
3.1
25
A Graph-based View on IoT Data . . . . . . . . . . . . . . . . . . . . .
25
3.1.1
26
Things Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii
3.1.2
Spatial Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
3.1.3
Social Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
3.2
Definition of Graph Models . . . . . . . . . . . . . . . . . . . . . . . . .
27
3.3
IoT Graph Data Modeling . . . . . . . . . . . . . . . . . . . . . . . . . .
32
3.3.1
Definition of Graph Data Modeling . . . . . . . . . . . . . . . . .
32
3.3.2
Graph-based Modeling for IoT Data . . . . . . . . . . . . . . . .
32
3.4
Analysis of Heterogeneous IoT Graph Data . . . . . . . . . . . . . . . .
34
3.5
Enriching Graph Models with Location Trustiness . . . . . . . . . . . .
36
3.5.1
Calculating Trustiness of Location based on Sensor Data . . . . .
36
3.5.2
Weighted Graph Models using Trustiness of Location: A Case
Study on Resilient Network Provisioning . . . . . . . . . . . . . .
40
4 An Optimization Technique for Regular Path Queries on Large Graphs 43
4.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
44
4.2
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
47
4.3
Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49
4.3.1
Definition and Categorization of Regular Path Queries . . . . . .
49
4.3.2
Evaluating RPQs . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
4.4
Unit-Subquery Cost Matrix (USCM) . . . . . . . . . . . . . . . . . . . .
53
4.5
Estimating the Searching Cost of RPQs with USCM . . . . . . . . . . .
54
4.5.1
Estimating the Searching Cost of a Concatenation RPQ . . . . .
55
4.5.2
Estimating the Searching Cost of an Alternation RPQ . . . . . .
56
4.5.3
Estimating the Searching Cost of a Kleene Star RPQ . . . . . . .
58
iii
4.5.4
4.6
4.7
4.8
4.9
Estimating the Searching Cost of a Highly Complex RPQ . . . .
60
Estimating Result Size of RPQs with USCM . . . . . . . . . . . . . . .
61
4.6.1
Estimating Result Size of a Concatenation RPQ . . . . . . . . .
62
4.6.2
Estimating Result Size of an Alternation RPQ . . . . . . . . . .
62
4.6.3
Estimating Result Size of a Kleene Star RPQ . . . . . . . . . . .
64
4.6.4
Estimating Result Size of a Highly Complex RPQ . . . . . . . .
64
Efficient Parallel Evaluation of RPQs using Estimated Cost . . . . . . .
65
4.7.1
Estimating Parallel Evaluation Cost . . . . . . . . . . . . . . . .
66
4.7.2
Parallel Evaluation of RPQs based on Minimum Estimated Evaluation Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
68
Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . .
69
4.8.1
Evaluation Settings . . . . . . . . . . . . . . . . . . . . . . . . . .
69
4.8.2
Experimental Results . . . . . . . . . . . . . . . . . . . . . . . .
71
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
78
5 A Scalable Approach for Shortest Path Queries on Large Dynamic
Graphs
79
5.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
80
5.2
Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
82
5.3
Emergency Evacuation System for Large Smart Buildings . . . . . . . .
84
5.3.1
Overview of System Architecture . . . . . . . . . . . . . . . . . .
84
5.3.2
Smart Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . .
84
5.3.3
Smart Guidance Agents . . . . . . . . . . . . . . . . . . . . . . .
85
iv
5.3.4
Global Coordinator . . . . . . . . . . . . . . . . . . . . . . . . . .
86
5.4
LCDT-based Weighted Graph Model for Providing Situation Awareness
87
5.5
A Distributed Approach for Shortest Path Queries in Evacuation Routing 88
5.6
Caching Strategy for Dynamic Evacuation Routes
. . . . . . . . . . . .
91
5.6.1
Observation that Motivates Caching . . . . . . . . . . . . . . . .
91
5.6.2
A Caching Strategy . . . . . . . . . . . . . . . . . . . . . . . . .
92
5.6.3
Updating Evacuation Routes using Caches . . . . . . . . . . . . .
94
5.7
Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
95
5.8
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6 Conclusion
102
6.1
Summary of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2
Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.1
Predicting Congestion in Large Smart Buildings for Emergency
Evacuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.2
A Framework for Multiple-query Optimization of RPQs . . . . . 104
References
105
Abstract in Korean
118
Appendices
119
v
List of Figures
2.1
A general IoT architecture stack . . . . . . . . . . . . . . . . . . . . . .
2.2
Performance comparison between relational database and graph database
on Sakila dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3
12
22
Performance comparison between relational database and graph database
on Gnutella dataset
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
3.1
A conceptual view of IoT data . . . . . . . . . . . . . . . . . . . . . . .
26
3.2
Weighted Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
3.3
Node-Labeled Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
3.4
Edge-Labeled Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
3.5
Property Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
31
3.6
The format of nodes and edges in the property graph . . . . . . . . . . .
33
3.7
An example illustrates the relationship between temperature and trust
value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
3.8
A model of sensors distribution on the geo-mapping matrix . . . . . . .
39
3.9
Ordinary setting of primary/backup paths of a flow . . . . . . . . . . . .
41
3.10 Scenario of large scale disaster
. . . . . . . . . . . . . . . . . . . . . . .
vi
41
4.1
Example of query automaton with different types of RPQ . . . . . . . .
51
4.2
Example of a directed graph representing a social shopping network. . .
52
4.3
A tree representing all possible paths satisfying an RPQ with a Kleene
Star operator at ai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
58
4.4
A tree representing possible paths satisfying a complex RPQ. . . . . . .
65
4.5
Response time comparison of parallel RPQs evaluation on different graphs. 72
4.6
Comparing the evaluation cost between USCM-based and AUT, TRL
approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.7
73
Response time comparison for parallel RPQs evaluation with varied
graph sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
4.8
Comparison of response time with varied number of subqueries. . . . . .
74
4.9
Accuracy evaluation with different parameters of graph. . . . . . . . . .
76
4.10 Accuracy evaluation with different parameters of query. . . . . . . . . .
77
5.1
Overview of system architecture. . . . . . . . . . . . . . . . . . . . . . .
85
5.2
Comparison of the effectiveness among evacuation methods with scenario
disaster event happened at normal regions in Donald Bren Hall . . . . .
5.3
Comparison of the effectiveness among evacuation methods with scenario
disaster event happened at critical regions in Donald Bren Hall . . . . .
5.4
99
Comparison of the effectiveness among evacuation methods with scenario
disaster event happened at critical regions in Ten-Story Building . . . .
5.6
98
Comparison of the effectiveness among evacuation methods with scenario
disaster event happened at normal regions in Ten-Story Building . . . .
5.5
98
99
The impact of update interval on evacuation system. . . . . . . . . . . . 100
vii
List of Tables
2.1
Performance of SQL queries on Sakila dataset . . . . . . . . . . . . . . .
21
2.2
Performance of Cypher queries on Sakila dataset . . . . . . . . . . . . .
21
2.3
Performance of SQL queries on Gnutella dataset . . . . . . . . . . . . .
22
2.4
Performance of Cypher queries on Gnutella dataset . . . . . . . . . . . .
23
3.1
Node Types Description . . . . . . . . . . . . . . . . . . . . . . . . . . .
33
3.2
Edge Types Description . . . . . . . . . . . . . . . . . . . . . . . . . . .
34
3.3
Analysis of IoT Graph Characteristics . . . . . . . . . . . . . . . . . . .
35
4.1
Example of USCM for a graph of social shopping network . . . . . . . .
53
viii
Acknowledgements
I would like to express my deep gratitude my supervisor, Professor Kyungbaek Kim, for
his valuable guidance, enthusiastic encouragement and useful critiques of this research
work. His enthusiasm, inspiration, and great efforts helped me to grow as a research
scientist. He gave me not only advises on research but also on my life. I really appreciate
everything he has done to me!
Also, I would like to thank the rest of my examination committee Prof. HyungJeong Yang, Prof. Hyung-Seok Lim, Prof. Hyuk-Ro Park, and Prof. Dong-Han Ham
for their helpful comments and engaging conversation during my defense.
I would like to send my thankful messages to all members in our DNS laboratory for
their suggestions and comments in many works. I also would like to thank all my friends
at Chonnam National University, for sharing many useful research skills as well as life
in Korea. I would like to sincerely thank my colleagues at the Faculty of Information
Technology, Hung Yen University of Technology and Education, for our debates in
computer science and exchange of research skills during my graduate program.
I would like to take this special occasion to thank my parents, my sisters, my
brothers, for their support and encouragement throughout my study.
Finally, a very special appreciation goes out to my wife, Nguyen Thi Huyen, for
her patience, encouragement, and her love. Thank my beloved daughter, Nguyen Minh
Nguyet, and my son, Nguyen Tuan Kiet, for being the sources of unending joy and love
during my research.
ix
Abstract
In an Internet of Thing (IoT) environment, entities with different attributes and capacities are going to be collaborated in a highly connected. Specifically, not only the
mechanical and electronic devices but also other entities such as people, locations and
applications are connected to each other. Understanding and managing these connections play an important role for businesses, which identify opportunities for new IoT
services. Traditional approaches for storing and querying IoT data are used of relational database management systems (RDMS) such as MySQL or MSSQL. However,
using RDMS is not flexible and sufficient for handling highly connected heterogeneous
IoT data because these data have deeply complex relationships which require nested
queries and complex joins on multiple tables. Fortunately, graph databases have been
recently developed for storing and analyzing highly connected data. This dissertation
studies on graph-based modeling and query optimization for heterogeneous IoT data.
Firstly, we focus on how to represent heterogeneous IoT data in a graph model
with nodes representing entities and edges representing the relationships between these
entities. Our graph model fuses a social graph, a spatial graph, a things graph, and
incorporates the relationships between them into one graph. We analyze the graph
characteristics with the changes of heterogeneous IoT data which is generated by the
combination of different types of graph data. We also propose estimation models for
enriching our graph model based on IoT data itself. Using nodes and edges properties
which are obtained from raw data in IoT systems, may not always solve real challenges
and issues in particular. For instance, many fire sensors are placed at the corridors of a
smart building; however, when a fire event happens, how to know which corridor has a
hazard intensity being higher than others based on sensors’ data is a key challenge. In
x
this dissertation, we develop a few models for estimating the values of hyper-properties
(i.e. location trustiness) in order to enrich our graph model.
Secondly, to obtain knowledge from a heterogeneous IoT data, we need to deal with
a key challenge of querying on a large graph. Using regular path queries (RPQs) is
a common way to explore patterns in graph databases. Traditional automata-based
approaches for evaluating RPQs on large graphs are restricted in the graph size and/or
highly complex queries, which causes a high evaluation cost. Recently, threshold rare
label based approach applied to large graphs has been proved to be effective. However,
there is still a room for improvement because the rare labels in a graph provide coarse
information which could not always guarantee the minimum evaluation cost, especially
with parallel computing supports. In this dissertation, we propose a cost-based optimization technique for answering an RPQ on large graphs, which may minimize both
the searching and joining cost of RPQ evaluation in parallelism.
Finally, we investigate on a scalable approach for answering shortest path queries
on large graphs which are built upon massive data in dynamic changing environments.
We consider evacuation routing in a large smart building as a case study. In which,
a network of smart indicators is considered as a graph. This graph is weighted by
using disaster conditions (e.g., hazard intensity) and building conditions (e.g., corridor
capacity). The existing approaches have considered distributed shortest path queries
for finding evacuation routes in large smart buildings. Specifically, they separate the
evacuation procedure in a large smart building into small processes on each floor/subregion. However, these approaches are lack of consideration of global view information,
which is about the hazard intensity and the crowd congestion in the whole building.
For example, the evacuees following an evacuation route from a higher floor could face
a hazardous area or crowd congestion when they move to the lower floors. Another
limitation of the existing approaches is the late updating about the information of
disaster events and congestion since obtaining them often takes much time. Therefore,
we propose a scalable routing approach which not only generates effective routes for
evacuees but also quickly updates route with the consideration of current disaster status
and building conditions which change dynamically during the evacuation time.
xi
Chapter 1
Introduction
1.1 Motivation
In an Internet of Thing (IoT) environment, entities with different attributes and capacities are going to be connected in a highly connected fashion [1]. Specifically, not only
the mechanical and electronic devices but also other entities such as people, locations
and applications are connected to each other. The main technical requirements of an
IoT application include (1) a flexible data model and (2) real-time response.
Most IoT applications must work with dynamic and speedily changing systems
due to new entities are coming online and/or the connection between these entities
can change regularly. This requires a data model that enables to easily represent the
entities, and support adding, deleting and updating relations between entities without
impacting application availability. Fortunately, graph databases are purposely-built to
store highly connected data with nodes representing entities and edges representing
relationships between these entities.
There are a lot of real-life IoT applications exploiting graph-based techniques as a
key component to bring various benefits to a variety of domains.
• Evacuation Systems in Smart Buildings: Smart buildings are becoming a reality
with the support of smart devices such as smart indicators, smart sensors, smart
1
cameras, and RFIDs [2, 3]. These smart devices play an important role in monitoring and tracking the events/conditions inside the building to provide useful
information for building management systems. Recently, the weighted graphbased approaches using IoT data in smart buildings have proved to be efficient
in dynamically find the evacuation routes during disaster situations [4].
• Smart Transport Services: The IoT technology allows people to get a new experience in the taxi industry. For example, transportation network companies
like Uber, Grab, or Kakao have created smart services by collecting, storing, and
processing the data from a huge number of smartphones running their application. The locations of customers and taxi drivers mixed with data on traffic flow,
weather, and other events to generate a weighted graph which enables picking
up the best driver for customers [5, 6]. These are good examples of the business
value that IoT can bring by using graph databases.
• Social Networks: A social media application (e.g., Facebook) is about connections
between people, therefore, it is basically has a graph structure. It is obvious that
graph databases are well-suited to social media applications. They speed up
the development of such applications, enhance an app�s overall performance, and
support companies a better way to understand their data [7, 8].
• IT network analysis: As the complexity of IoT network and IT infrastructure
grows, we need a more powerful configuration management database than a relational database can provide. The graph database is used to connect the network,
data center, and IT assets in order to get important insights into the relationships
between different operations within the network [9, 10].
Understanding connections between things in such IoT applications above plays an
important role for businesses, which identify opportunities for new services. To do
this, businesses need techniques that can evaluate the connections quickly and easily
in a real-time manner. Traditional approaches for storing and querying IoT data are
used of relational database management systems (RDMS) such as MySQL or MSSQL.
However, using RDMS is not flexible and sufficient for handling heterogeneous IoT data
2
because these data have deeply complex relationships which require nested queries and
complex joins on multiple tables [11]. Motivated by useful IoT applications and the
limitations of traditional IoT data management systems, this dissertation studies on
graph-based modeling and query optimization for heterogeneous IoT data management.
1.2 Research Problems and Objectives
This thesis aims at investigating and providing efficient methods for managing heterogeneous IoT data while considering cost-efficiency and its impact on the system
performance based on graph models and graph queries optimization techniques. To address the challenges that were discussed in the previous section, this thesis has identified
and investigated the following research problems:
• How to design a graph data model for heterogeneous IoT data representation? As a data modeling process, we need to translate IoT data in a
conceptual view to a graph model. During the graph modeling process, we determine (1) which entities in the dataset should be nodes (or vertex), (2) which
should be edges, and (3) which should be properties. The result is a blueprint
of whole entities, relationships, and properties in the dataset. We can use that
blueprint to create a visualization model.
• How to utilize the raw IoT data for enriching graph models? Using
only raw IoT data for graph modeling is insufficient in some complex cases of
understanding real situations which require multiple factors (attributes) to be
considered together. For instance, many fire sensors are placed at the corridors
of a smart building; however, when a fire event happens, how to know which
corridor has a hazard intensity being higher than others based on sensors� data
is a key challenge. To obtain a better performance of using graph databases, we
need to provide hyper-properties (e.g., evacuation cost of a route segment) which
are modeled by combining other raw IoT data. The hyper-properties will be used
directly with graph querying techniques. This improves the performance of IoT
3
applications. There are two questions that need to be considered while creating
hyper-properties: (1) which data should be chosen for modeling a hyper-property?
and (2) how to model the hyper-properties?
• How to optimize regular path queries on large graphs? Using Regular
Path Queries (RPQs) is a common way to explore patterns in graph databases.
Traditional automata-based approaches for evaluating RPQs on large graphs are
restricted in the graph size and/or highly complex queries, which causes a high
evaluation cost. Recently, threshold rare label based approach applied on large
graphs has been proved to be effective. However, there is still a room for improvement because the rare labels in a graph provide coarse information which
could not always guarantee the minimum evaluation cost, especially with parallel computing supports. We investigate on development of an efficient method
answering an RPQ on large graphs by separating it into subqueries, which may
minimize both the searching and joining cost of RPQ evaluation in parallelism.
• How to develop a scalable approach for answering shortest path queries
to deal with large graph databases in dynamic changing environments?
We consider evacuation routing in a large smart building as a case study. In
which, a network of smart indicators is considered as a large graph. This graph
is weighted by using disaster conditions (e.g., hazard intensity) and building conditions (e.g., corridor capacity). The existing approaches have considered distributed shortest path queries for finding evacuation routes in large smart buildings. Specifically, they separate the evacuation procedure in a large smart building into small processes on each floor/sub-region. However, these approaches are
lack of consideration of global view information, which is about the hazard intensity and the crowd congestion in the whole building. Consequently, in some cases,
the evacuees following an evacuation route from a higher floor could face a hazardous area or a crowd congestion when they move to the lower floors. Another
limitation of the existing approaches is the late updating about the information
of disaster events and congestion since obtaining them often takes much time.
Therefore, we investigate a scalable routing approach which not only generates
4
effective routes for evacuees but also quickly updates routes as the disaster status
and building conditions could change during the evacuation time.
1.3 Contributions
To address the research problems mentioned in Section 1.2, this thesis made the following key contributions:
1. A study on graph models for constructing graph databases from IoT data. (Chapter 3).
• The proposed models fuse a social graph, a spatial graph, and a things graph,
and incorporates the relationships among them to generate graph databases.
• A study on location trustiness based on multimodal information. This work
focuses on analyzing the output signals of disaster sensors such as smoke
sensor and temperature sensor, and calculating the impact of events based on
data features. A case study of using weighted graph models with trustiness
of location in the IoT environment is presented.
2. A study on optimization of regular path queries (RPQs) for large graphs (Chapter
4).
• The cost functions and algorithms for estimating the searching cost and the
result size of a given RPQ by using Unit-Subquery Cost Matrix (USCM) are
presented. The proposed method can estimate the searching cost and the
result size not only for simple queries (concatenation and alternation RPQs)
but also complex queries (RPQs having Kleene Star operator) and highly
complex queries (nested queries).
• An efficient method to improve the performance of evaluating an RPQ by
splitting it into smaller subqueries based on their searching cost and result
size consideration.
5
3. A scalable approach for answering shortest path queries to deal with massive data
in a dynamic changing environment: a case study of evacuation systems for large
smart buildings (Chapter 5).
• A design of a flexible and scalable evacuation system for large smart buildings based on the integration of IoT devices and a distributed graph based
evaluation technique.
• A LCDT model representing the impact of disaster conditions and building
conditions on every route segment in the building.
• A distributed shortest path algorithm that exploits the LCDT model for
efficient evacuation route planning.
• A caching strategy which generates effective evacuation routes more frequently in a given time, and it increases situation awareness of dynamic
routes to improve their effectiveness.
• A simulation tool to evaluate and prove that our approach outperforms other
ones.
1.4 Dissertation Structure
This dissertation is organized as follows. In Chapter 2, the IoT data characteristics and
challenges of IoT data management are described. We briefly present a stack of an IoT
architecture. Also, three main graph querying techniques for accessing IoT data are
presented. In Chapter 3, graph data models are formally defined. A graph model for
constructing a graph database from IoT data is proposed. The proposed model fuses a
social graph, a spatial graph, and a things graph into one graph model, and incorporates
the relationships among them. This chapter also presents a study on location trustiness
based on multimodal information. Chapter 4 proposes an optimization technique for
improving the performance of regular path queries on large graphs. Chapter 5 presents
a scalable approach for answering shortest path queries on large graphs in dynamic
changing environments: a case study of evacuation systems for large smart buildings.
Chapter 6 summarizes the paper and shows our future works.
6
1.5 Dissertation Publications
In the following, I include the papers from my work related to this dissertation. The
list of published papers is as follows:
International Journals
1. Van-Quyet Nguyen, Quyet-Thang Huynh, Kyungbaek Kim, “Estimating Searching Cost of Regular Path Queries on Large Graphs by Exploiting Unit-Subqueries”,
Journal of Heuristics, 2018. DOI: 10.1007/s10732-018-9402-0
2. Van-Quyet Nguyen and Kyungbaek Kim, “Efficient Regular Path Query Evaluation by Splitting with Unit-Subquery Cost Matrix”, IEICE Transactions on
Information and Systems, Vol.E100.D, No.10, pp.2648-2652, October 2017.
Domestic (Korean) Journals
1. Van-Quyet Nguyen, Giang-Truong Nguyen, Sinh-Ngoc Nguyen, Kyungbaek
Kim, “Leveraging Social Media for Enriching Disaster related Location Trustiness”, Journal of Digital Contents Society Vol.18 No.3 pp. 567-575, June , 2017.
2. Tiep Vu Duc, Quyet Nguyen-Van, Kyungbaek Kim, “Design and Implementation of Geo-Social information based Personalized Warning Notification system”,
Smart Media Journal Vol.5 No.2 pp. 42-50, June 30, 2016.
International Conferences
1. Van-Quyet Nguyen, Huu-Duy Nguyen, Quyet-Thang Huynh, Nalini Venkatasubramanian, Kyungbaek Kim, “A Scalable Approach for Dynamic Evacuation
Routing in Large Smart Buildings”, In Proceedings of the Fifth International
7
Conference on Smart Computing (SMARTCOMP 2019), (pp. 292-300), June
12-14, 2019, Washington D.C., US.
2. Van-Quyet Nguyen and Kyungbaek Kim, “Estimating the Evaluation Cost of
Regular Path Queries on Large Graphs”, In Proceedings of the Eighth Symposium
on Information and Communication Technology. ACM, (pp. 92-99). Nha Trang,
Vietnam, December 7-8, 2017.
3. Van-Quyet Nguyen, Sinh-Ngoc Nguyen, Kyungbaek Kim, “Enabling DisasterResilient SDN with Location Trustiness”, In Proceedings of the 4th International
Conference on Computer Applications and Information Processing Technology
(CAIPT 2017), August 8-10, 2017, Bali, Indonesia.
4. Nguyen Van Quyet, Kyungbaek Kim, “Study on Location Trustiness based on
Multimodal Information”, In Proceedings of the 4th International Conference on
Smart Media and Applications (SMA), January 11-12, Danang, Vietnam.
Domestic (Korean) Conferences
1. Van-Quyet Nguyen and Kyungbaek Kim, “Comparison of Relational Databases
and Graph Databases for Heterogeneous IoT Data Management”, In Proceedings of KISM Spring Conference April 26-27, 2019, Korea National University of
Transportation, Chungju, South Korea.
2. Van-Quyet Nguyen, Huu-Duy Nguyen, Giang-Truong Nguyen, Kyungbaek
Kim, “A Graph Model of Heterogeneous IoT Data Representation: A Case Study
from Smart Campus Management”, In Proceedings of KISM Fall Conference
November 02-03, 2018, Pukyong National University, Busan, South Korea.
3. Quyet Nguyen-Van, Ngoc Nguyen-Sinh, Kyungbaek Kim, “VLTNet: A framework for Visualizaing location trustiness weighted Network Topology”, In Proceedings of KISM Spring Conference April 29-30, 2016, Silla University, Busan,
South Korea.
8
4. Nguyen Van Quyet, Kyungbaek Kim, “Method of Calculating Trustiness of
Location Based on Earthquake Information”, In Proceedings of KISM Fall Conference, October 23-24, 2015, Chonson University, Gwangju, South Korea.
9
Chapter 2
Background
In this chapter, we first present the major characteristics of IoT data as well as challenges in IoT data management. We then describe an overview of the architecture of
a general IoT platform. We mainly focus on IoT data management with consideration of data collecting, data storing, and data analyzing. Finally, for exploiting highly
connected IoT data, graph querying techniques are introduced.
2.1 IoT Data Characteristics
We briefly present four characteristics of IoT data as the following.
2.1.1 Heterogeneity
With different ‘things’ in IoT depending on types of devices and services, IoT platforms
generate different kinds of data including structured, semi-structured, and unstructured
data [12]. For example, in an IoT platform for smart building management, the data
generated from many kinds of devices (e.g., temperature sensors, smoke sensors, cameras, etc.) are unstructured; meanwhile, information about people or rooms in the
building can be managed as structured data using data tables or as semi-structured
data using XML files [13, 14]. Thus, how to manage heterogeneous IoT data so that
10
they can be easily explored in IoT applications is considered as a challenge.
2.1.2 Highly Connected Data
In an Internet of Thing (IoT) environment, entities with different attributes and capacities are going to be connected in a highly connected fashion. Specifically, not only
the mechanical and electronic devices but also other entities such as people, locations
and applications are connected to each other. For example, in a smart building at Amsterdam, called the Edge, there are 22,000 IoT connected devices [15]. Some questions
would be raised when a sensor reported that a room was getting colder such as why
could the room be getting colder?, what is the status of the heating systems?, or how
many people are in the room?. To answer these questions, it is necessary to analyze the
things surrounding the room. Therefore, understanding and managing these connections between things are challenges in IoT data management, which plays an important
role for businesses.
2.1.3 Dynamic Changes
Most IoT applications work with dynamic and speedily changing data due to new entities are coming online and/or the connection between these entities can be changed
regularly [16, 17]. For example, in smart transport service, a car can share the information about the status of the road it is going on to other cars nearby it. Those
connections among cars can make a network of cars on the road. But, cars might move
so fast and disconnect with others, so how to manage those kinds of connections in
an efficient and effective way. Therefore, there is a need for a data model to easily
represent the entities, and support adding, deleting and updating relations between
entities without impacting application availability.
11
Figure 2.1: A general IoT architecture stack
2.1.4 Massive Real-time Data
Massive IoT data is generated every minute through multiple kinds of devices and
services such as sensors and social networking [18, 19]. For instance, a huge number
of images is generated in real-time through cameras in a smart building, which is
unsuitable to be resided in a traditional relational database (e.g., MySQL, SQL Server).
Therefore, modeling such kind of data in an easy way and real-time processing are
considered as challenges.
2.2 IoT Architecture Stack
Figure 2.1 shows a common architecture design of an IoT platform [20, 21, 22]. It
consists of four layers: Things Layer, Communication layer, Data Management Layer,
and Application layer.
12