Data Modelling – SKY Case Study
Overview of the data modelling for the project SKY in Cambodia
April 2011
1
Agenda
Introduction
Project overview
Data modelling
2
Introduction
What is the data modelling?
Data modelling is the method used to define and analyze data requirements needed
to support the business processes of a project. The data requirements are recorded
as a conceptual data model with associated data definitions.
In this case study, you will notably discover :
– How the requirements have influenced the data modelling
– What are the typical questions when modelling a database
3
Introduction
Attention :
At any time, in case you don’t understand any point, please let me know
by putting your hand 'up!
4
H-SPIS Project Overview
Context (1/2)
GRET, a French Non Governmental Organization (NGO), has built an Health
Insurance program called SKY that enables insured people to access to quality
health care in getting rid of money concern and ensuring that future health events
won’t impoverish them.
The program SKY has two main objectives:
Secure the assets and incomes of vulnerable Cambodian households by limiting
the economic consequences of large health expenditures.
Insured people pay a regular premium/contribution. In exchange, when the insured
people are sick, they received all health care needed to recover, without paying
anything.
Facilitate and encourage access for these households to quality health care, both
at primary and secondary levels, to prevent severe health risks
5
H-SPIS Project Overview
Context (2/2)
The GRET works in cooperation with two other organizations specialized in health
micro-insurance:
Health Equity Fund (HEF) aims at covering health expenses of poor
identified families in Cambodia totally for free. URC plays role as an
implementer and sub grant to Cambodian NGO to operate HEF at district
level.
HIP, a new program involved by the GRET, consists in creating a social
security for formal sector workers. It works with the GMAC (union of Garment
Manufacturing employers) representing 80% of the export from Cambodia
and 20% of the Gross National Product under the agreement and observance
of the Cambodian Ministry of Health (MoH). The insurance premium is paid
for half by the factory, the other half by the workers
6
H-SPIS Project Overview
Objectives
GRET wants to build a common application enabled to gather the common
requirements of the 3 organizations GRET, URC and HIP, by maintaining the
particular specificities of the 3 streams
GRET
H-SPIS
URC
HIP
The new combined project is named H-SPIS for
Health Social Protection Insurance System!
The objectives of the new system H-SPIS, combining the 3 operators applications, are
multiple:
Increase the covered population thanks to the sharing of the operator’s network
Share the costs and staff based in the different hospitals and offices in the country
Become a reference for a National Social Insurance in Cambodia : quite ambitious!
7
H-SPIS Project Overview
Data base Requirements (1/2)
The data of the 3 operators GRET, URC and HIP must be contained in a common
central database in the Head Office
In the provinces or operational districts , the field operators of GRET, URC and HIP
cannot access the central database by a standard web connection which is too slow:
they must access a local database which contains only the data of the OD
The local database must be replicated with the central database aimed at
consolidating all the information of the country, for all the 3 operators
8
H-SPIS Project Overview
Data base Requirements (2/2)
The combined database requirements has lead to the following architecture:
Global Database
Global Database
(HIP, GRET, URC)
(HIP, GRET, URC)
Hospitals
•Control
membership
Head Office
• Project Management
• IT Management
• Finance (e.g.: invoice
•payments)
Consultation
Data Replication
Consultation
• Track
Consumption
Field Office in OD1
Field Office
in OD2
• Enter member & premium data
• Enter consumption data/claims
Field Office
in OD3
Database
in OD2
Data entry
Health Centre 1
Health Centre 2
• Control membership
• Track consumption
Computerized tasks
Manual tasks
• Control membership
• Track consumption
Legend:
9
…
…
…
…
…
9
H-SPIS Data modeling
1. Data modeling process
To perform the data modeling, you need to proceed the following steps:
1.
2.
3.
4.
5.
6.
7.
Identify entity types
Identify attributes
Apply naming conventions
Identify relationships
Assign keys
Normalize to reduce data redundancy
Denormalize to improve performance
10
H-SPIS Data modeling
1. Identify entity types
H-SPIS Conceptual Data model
Health Consumption Domain
UseService
Household Domain
UseContact
Factory
Persons
Addresses
Health Center
UseFood
UseDrug
UseTranport
Family
A conceptual data
model is a map of
concepts and their
relationships.
This describes the
semantics of an
organization
Services
Wards
Villages
Health
Facilities
OD
Etc…
SKY
Products
HF Pricing
Master Reference Domain
Contract
Fund Reserve
It represents a
series of assertions
about its nature
Premium
Calls
Coupons
Bonus
Contract Domain
11
H-SPIS Data modeling
2. Identify attributes
An attribute is a property or characteristic of an entity. Color, for example, is an
attribute of your hair.
Each entity type will have one or more data attributes:
For the family: name, family size, standard of living… are some of its attributes
Health Center : level of health care, quality of health care, number of wards,
capacity, ….
Example : modelling the attributes of a person
TBL_HOUSEHOLD_PERSON
TBL_HOUSEHOLD_PERSON
PK_PERSON_ID
PK_PERSON_ID
BIRTH_OF_DATE
BIRTH_OF_DATE
VAR_LATIN_NAME
VAR_LATIN_NAME
VAR_KHMER_NAME
VAR_KHMER_NAME
VAR_LATIN_FIRSTNAME
VAR_LATIN_FIRSTNAME
VAR_KHMER_FIRSTNAME
VAR_KHMER_FIRSTNAME
NUM_AGE
NUM_AGE
DATE_REGISTER_DATE
DATE_REGISTER_DATE
VAR_JOB
VAR_JOB
…………
…………
…………
…………
H-SPIS Data modeling
3. Apply Naming Conventions
For code standardisations, it is important to respect a naming convention rule :
The main goal of adopting a naming convention for database objects is so that
you and others can easily identify the type and purpose of all objects contained in
the database. The typical objects where the naming convention is applied are the
following :
1.Tables
2.Columns (incl. Primary, Foreign and Composite Keys)
3.Indexes
4.Constraints
5.…
For H-SPIS Project, the following rule of naming convention was applied :
For Tables, names are structured according to: TBL_DOMAIN_NAME_TABLE
NAME
Example : TBL_HOUSEHOLD_PERSON represents a table from the Household
For columns, names are structured according to: COLUMN TYPE_COLUMN
_NAME
Example : DATE_REGISTER_DATE is a date, NUM_AGE is a numeric column,
VAR_LATIN_NAME is a varchar type (character type)
H-SPIS Data modeling
4. Identify relationships
In the real world entities have relationships with other entities. For example,
persons BELONG to families, families LIVE AT addresses, and Families are PART OF
the area around the Health Centers. Belong, live at, and are part of are all terms
that define relationships between entities.
The relationships between entities are conceptually identical to the relationships
(associations) between objects. For each relationship, you need also to define a
cardinality of the relationship between two entities as followed:
Persons
1
n
Health Center
1
1
1
Family
1
Addresses
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
Master Reference data comprises a set of data which consistently defines and
manages the non-transactional data entities of an organization.
Reference data have the objective of providing processes for collecting, matching,
quality-assuring, persisting and distributing such data throughout an organization
to ensure consistency and control in the ongoing maintenance and application use
of this information.
Example:
- geographical data : lists of villages, operational districts, provinces,...
- medical data : lists of diseases, health facilities, service and wards
- contract data : list of premium contracts and tariff rates
Reference data management enables the following features of your system :
Data quality : in the user interface, the data entry is typically constrained and
limited to a list value retrieved if the reference tables and manual data entries are
limited.
Configurability and maintainability : the database and program development
are based on reference data which are propagated automatically in the overall
system. When you add a new value in your reference data, it’s then automatically
distributed in the overall system without changes performed in the code.
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
Example : adding a new family in the system in the GUI (Graphical User Interface)
Real sector
Formal
Informal
Semi-Formal
Real sector
Formal
Unformal
Health centers
Provincial Hospital
Mkak
Ou Ambel
Kampong Svay
…………
Province
Battambang
Kampong Chhnang
Phnom Penh
Kampot
….
All data types which come from a pre-defined and limited list of values are:
- in the GUI : displayed in drop down list so that the user can choose the
value to select
- from a database perspective, these values are retrieved from tables
covered by the reference domain
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
Example of tables of the reference domain and their content
- List of health facilities
- List of type of sector
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
Example : Modelling part of the Household Domain by designing the family, person and
address tables, knowing that :
A family lives at one address entered into the system
A family contains one or several family in its
the address information are hierarchal and a village belongs to a commune
which belongs to a district which belongs to a province:
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
Modelling the family table and its address
TBL_HOUSEHOLD_ADRESS
TBL_HOUSEHOLD_ADRESS
TBL_REFERENCE_VILLAGE
TBL_REFERENCE_VILLAGE
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_FAMILY_ID
PK_FAMILY_ID
FK_ADDRESS
FK_ADDRESS ID
ID
FK_SECTOR_ID
FK_SECTOR_ID
NUM_FAMILY_CODE
NUM_FAMILY_CODE
VAR_LATIN_NAME
VAR_LATIN_NAME
VAR_KHMER_NAME
VAR_KHMER_NAME
NUM_FAMILY_SIZE
NUM_FAMILY_SIZE
DATE_REGISTER_DATE
DATE_REGISTER_DATE
NUM_OLD_FAMILY_CODE
NUM_OLD_FAMILY_CODE
…………
…………
…………
…………
PK_ADDRESS_ID
PK_ADDRESS_ID
FK_VILLAGE_ID
FK_VILLAGE_ID
NUM_STREET_NO
NUM_STREET_NO
NUM_HOUSE_NO
NUM_HOUSE_NO
…………
…………
…………
…………
PK_VILLAGE_ID
PK_VILLAGE_ID
FK_COMMUNE_ID
FK_COMMUNE_ID
VAR_VILLAGE_NAME
VAR_VILLAGE_NAME
TBL_REFERENCE_COMMUNE
TBL_REFERENCE_COMMUNE
PK_COMMUNE__ID
PK_COMMUNE__ID
FK_COMMUNE_ID
FK_COMMUNE_ID
VAR_VILLAGE_NAME
VAR_VILLAGE_NAME
TBL_REFERENCE_DISTRICT
TBL_REFERENCE_DISTRICT
TBL_REFERENCE_PROVINCE
TBL_REFERENCE_PROVINCE
PK_PROVINCE__ID
PK_PROVINCE__ID
VAR_DPROVINCE_NAME
VAR_DPROVINCE_NAME
PK_DISTRICT__ID
PK_DISTRICT__ID
FK_PROVINCE_ID
FK_PROVINCE_ID
VAR_DISTRICT_NAME
VAR_DISTRICT_NAME
H-SPIS Data modeling
3. Identify entity types : Focus on reference domain
How to manage the reference domain?
Even though the reference data are non-transactional and are supposed to change
very rarely, there are some cases where a new value needs to be added in the
system.
In that case, the rights to perform the changes on reference data are attributed to
a limited specific users of the application, commonly called administrators of the
database, who can add a new line into the database or update some others.
H-SPIS Data modeling
5. Identify keys : primary keys
In relational database design, a unique key can uniquely identify each row
in a table. A unique key comprises a single column or a set of columns.
A unique key is generally an technical identifier, automatically
incremented for each new row added in the table and is disconnected
from any business signification
Example : in H-SPIS the family code for SKY follows the format :
OD Code | Health Center code | internal counter
For URC and HIP, they have other rules for the code generation. Thus, it was
absolutely mandatory to distinguish the family code from an internal
unique id in the table to avoid conflict between the codes of the different
operators, that could generate application crashes.
To avoid this, the recommendation is to manage one technical id as
the primary key, completely independent from the functional key
On top of the technical unique key, the distributed in architecture for H-SPIS
required to add in the primary key following colums:
The ID the Operational district to avoid any technical conflicts while
replicating the local database to the central database
H-SPIS Data modeling
5. Identify keys : primary keys
Difference between the functional and technical key
Functional Key
Technical Key
Represents a value
which is comprehensible
for the business and
users
Doesn’t mean any
signification for the
business or the users (just
internal key in the system)
Is supposed to define
in an unique way the
records in the table
Define (as a fact) in an
unique way the records in
the table by automatically
incrementing the
technical key of +1 for
each new row
Example : for a person, one’s national ID is the functional key
although a technical unique id is defined as primary key for the
table
H-SPIS Data modeling
5. Identify keys : primary keys
What happens in case we do not include the OD ID in the primary key of
each table?
Error for row in double in the
TBL_HOUSEHOLD_FAMILY
PK_
NUM_
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
FAMILY_ID
FAMILY_CODE
VAR_
LATIN_NAME
1
4343422
LONG
1
6666353
SALZE
Global Database
1
043GGXX
KHANG
(HIP, GRET, URC)
2
5354091
SONG
2
5204532240
GOURSAT
3
0135244
PRAK
3
0135112120
TOMG
34402
FSFR34
BOUN
….
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_
FAMILY_ID
NUM_
FAMILY_CODE
VAR_
LATIN_NAME
1
6666353
SALZE
2
5204532240
GOURSAT
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
….
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_
FAMILY_ID
NUM_
FAMILY_CODE
VAR_
LATIN_NAME
1
4343422
LONG
2
5354091
SONG
3
0135244
PRAK
Legend:
….
….
Replication
Replication process
process from
from local
local tt
oo centralized
centralized database
database
….
PK_
FAMILY_ID
NUM_
FAMILY_CODE
VAR_
LATIN_NAME
1
043GGXX
KHANG
34402
FSFR34
BOUN
3
0135112120
TOMG
….
….
….
H-SPIS Data modeling
5. Identify keys : primary keys
What happens if we add the OD ID in the primary key of each table?
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
No error during replication process since the
primary key is composed of the technical ID
plus the OD ID
PK_
FAMILY
_ID
PK_OD_ID
NUM_
FAMILY_C
ODE
VAR_
LATIN_NA
ME
1
OD1
4343422
LONG
1
OD2
6666353
SALZE
1
OD3
043GGXX
KHANG
Global Database
2
OD1
5354091
SONG
(HIP, GRET, URC)
2
OD2
520453224
0
GOURSA
T
3
OD1
0135244
PRAK
3
OD1
013511212
0
TOMG
34402
OD1
FSFR34
BOUN
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_
FAMILY
_ID
PK_OD_ID
NUM_
FAMILY_C
ODE
VAR_
LATIN_NA
ME
1
OD2
6666353
SALZE
2
OD2
520453224
0
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_
FAMILY
_ID
PK_OD_ID
NUM_
FAMILY_C
ODE
VAR_
LATIN_NA
ME
1
OD1
4343422
LONG
2
OD1
5354091
SONG
3
OD1
0135244
PRAK
….
….
….
….
….
GOURSA
T
TBL_HOUSEHOLD_FAMILY
TBL_HOUSEHOLD_FAMILY
PK_
FAMILY
_ID
PK_OD_ID
NUM_
FAMILY_C
ODE
VAR_
LATIN_NA
ME
1
OD3
043GGXX
KHANG
34402
OD3
FSFR34
BOUN
3
OD3
013511212
0
TOMG
….
….
….
H-SPIS Data modeling
5. Identify keys : foreign keys
The foreign key identifies a column or a set of columns in one (referencing)
table that refers to a set of columns in another (referenced) table. The
columns in the referencing table must be the primary key or other candidate
key in the referenced table. The values in one row of the referencing
columns must occur in a single row in the referenced table.
A primary key can be targeted by multiple foreign keys from other
tables.
But a primary key does not necessarily have to be the target of any
foreign keys.
Thus, a row in the referencing table cannot contain values that don't
exist in the referenced table (except potentially NULL). This way
references can be made to link information together and it is an
essential part of database normalization.
Multiple rows in the referencing table may refer to the same row in the
referenced table. Most of the time, it reflects the one (parent table or
referenced table) to many (child table, or referencing table) relationship.
A table may have multiple foreign keys, and each foreign key can have a
different referenced table.