Biomedical Engineering 632
EICA (Liu, 2004). Here, we apply the ICA algorithm to P_m^T, which lies in the reduced subspace spanned by the first m eigenvectors. To find the statistically independent basis images, each PCA basis image is treated as a row of input variables whose pixel values are the observations. Thus,

U = W_{ICA} P_m^T    (5)

where U contains the obtained basis images, composed of the coefficient matrix W_{ICA} and the eigenvectors P_m^T. Some of the basis images are shown in Fig. 4. The reconstructed image set X is then described as

X = V P_m^T = V W_{ICA}^{-1} U.    (6)

Therefore, the IC representation, whose rows form the feature matrix R, follows as

R = V W_{ICA}^{-1}.    (7)
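As an illustrative sketch of Eqs. (5)-(7), the toy pipeline below derives an IC basis from a PCA-reduced subspace. The data, the array shapes, and the placeholder mixing matrix `W_ica` (a random invertible matrix standing in for a real ICA solver such as FastICA or Infomax) are assumptions for illustration, not the authors' implementation.

```python
# Toy sketch of Eqs. (5)-(7): IC basis images from a PCA-reduced subspace.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))            # 40 images, 100 pixels each (toy data)
X = X - X.mean(axis=0)                    # center the data

# PCA: keep the first m eigenvectors P_m; V holds the PCA coefficients.
m = 10
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P_m = Vt[:m].T                            # (pixels, m), orthonormal columns
V = X @ P_m                               # (images, m)

W_ica = rng.normal(size=(m, m))           # placeholder for the ICA unmixing matrix
U = W_ica @ P_m.T                         # Eq. (5): IC basis images
R = V @ np.linalg.inv(W_ica)              # Eq. (7): IC representation
X_rec = R @ U                             # Eq. (6): X reconstructed as R U
assert np.allclose(X_rec, V @ P_m.T)      # R U = V P_m^T by construction
```

Since P_m has orthonormal columns, X_rec equals the rank-m PCA reconstruction of X for any invertible W_ica, which is exactly the consistency that Eqs. (5)-(7) express.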
For the final step of FICA, FLD is performed on the IC feature vectors in R. FLD exploits class-specific information by maximizing the ratio of the between-class scatter matrix to the within-class scatter matrix. The within-class scatter S_W and the between-class scatter S_B are defined as follows:

S_W = \sum_{i=1}^{c} \sum_{r_k \in C_i} (r_k - \tilde{r}_i)(r_k - \tilde{r}_i)^T,    (8)

S_B = \sum_{i=1}^{c} N_i (\tilde{r}_i - \tilde{r}_m)(\tilde{r}_i - \tilde{r}_m)^T    (9)

where c is the total number of classes, N_i the number of facial expression images in class C_i, r_k a feature vector from the feature matrix R, \tilde{r}_i the mean of class C_i, and \tilde{r}_m the mean of all feature vectors in R.
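For concreteness, the scatter matrices of Eqs. (8) and (9) can be assembled as below; a minimal numpy sketch in which the names `R` and `labels` are assumptions:

```python
import numpy as np

def scatter_matrices(R, labels):
    """Within-class (Eq. 8) and between-class (Eq. 9) scatter of row vectors."""
    n_feat = R.shape[1]
    r_m = R.mean(axis=0)                         # global mean of all features
    S_W = np.zeros((n_feat, n_feat))
    S_B = np.zeros((n_feat, n_feat))
    for c in np.unique(labels):
        Rc = R[labels == c]                      # samples of class C_i
        r_i = Rc.mean(axis=0)                    # class mean
        d = Rc - r_i
        S_W += d.T @ d                           # Eq. (8)
        diff = (r_i - r_m)[:, None]
        S_B += len(Rc) * (diff @ diff.T)         # Eq. (9), weighted by N_i
    return S_W, S_B
```

A useful sanity check is that S_W + S_B equals the total scatter of R about its global mean.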
The optimal projection W_d is chosen by maximizing the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of their within-class scatter matrix:

J(W_d) = |W_d^T S_B W_d| / |W_d^T S_W W_d|    (10)

where W_d is the set of discriminant vectors of S_B and S_W corresponding to the c - 1 largest generalized eigenvalues. The discriminant ratio is maximized by solving the generalized eigenvalue problem

S_B W_d = S_W W_d \Lambda    (11)

where \Lambda is the diagonal eigenvalue matrix. The discriminant vectors W_d form the basis of the (c - 1)-dimensional subspace for a c-class problem.
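One way to obtain W_d from Eq. (11) is to reduce the generalized problem to a standard one via S_W^{-1} S_B; a hedged sketch that assumes S_W is invertible, which holds after the PCA/ICA dimension reduction:

```python
import numpy as np

def fld_projection(S_B, S_W, c):
    """Solve S_B w = lambda S_W w (Eq. 11); keep the c-1 leading vectors."""
    # Equivalent standard eigenproblem: (S_W^{-1} S_B) w = lambda w
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]       # largest eigenvalues first
    return eigvecs[:, order[: c - 1]].real       # columns of W_d span the subspace
```

For example, with S_W = I and S_B = diag(1, 3, 2) for c = 3 classes, the returned columns are the coordinate axes of the two largest diagonal entries.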
Fig. 3. Facial expression representations in the reduced feature space using PCA. These are also known as eigenfaces.
Fig. 4. Sample IC basis images.
Finally, the feature matrix G for training images and the feature matrix G_test for testing images can be obtained by the criteria

G = R W_d,    (12)

G_test = R_test W_d = X_test P_m W_{ICA}^{-1} W_d.    (13)
As a result of FICA, feature vectors for each of the separated classes are obtained. As can be seen in Fig. 5, the feature vectors associated with a specific expression are concentrated in a separate region of the feature space, showing the gradual changes of that expression. The features of the neutral faces lie at the centre of the whole feature space, serving as the origin of the facial expressions, and the feature vectors of the target expressions are located in each
Human Facial Expression Recognition Using Fisher Independent Component Analysis and Hidden Markov Model 633
BiomedicalEngineering634
expression region; each expression's feature region contains the temporal variations of the facial features. As shown in Fig. 6, a test sequence of the sad expression is projected onto the sad feature region. The projections evolve over time from P(t_1) to P(t_8), describing the facial feature changes from the neutral state to the peak of the sad expression.
Fig. 5. Exemplar feature plot for four facial expressions.
(a)
(b)
Fig. 6. (a) Test sequences of sad expression and (b) their corresponding projections onto the
feature space.
2.3 Spatiotemporal Modelling and Recognition via HMM
The Hidden Markov Model (HMM) is a statistical method for modeling and recognizing sequential information. It has been utilized in many applications such as pattern recognition, speech recognition, and bio-signal analysis (Rabiner, 1989). Owing to this advantage in modeling and recognizing consecutive events, we adopt the HMM as a modeler and recognizer for facial expression recognition, where each expression evolves from a neutral state to the peak of that particular expression. To train each HMM, we first perform vector quantization on the training dataset of facial expression sequences to obtain sequential spatiotemporal signatures. These signatures are then used to train each HMM to learn each facial expression. More details are given in the following sections.
2.3.1 Code Generation
As an HMM is normally trained on symbols of sequential data, the feature vectors obtained from FICA must be symbolized. The symbolized feature vectors form a codebook, a set of symbolized spatiotemporal signatures of the sequential dataset, and the codebook is then used as a reference for recognizing expressions. To obtain the codebook, vector quantization is performed on the feature vectors from the training datasets. In our work, we utilize the Linde-Buzo-Gray (LBG) clustering algorithm for vector quantization (Linde et al., 1980). The LBG approach starts from an initial centroid of the whole dataset and repeatedly splits the centroids, continuing until the desired codeword size is reached.
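The splitting procedure can be sketched as follows; the perturbation scheme, the refinement count, and the power-of-two codebook size are assumptions rather than the exact variant used by the authors:

```python
import numpy as np

def lbg_codebook(data, size, eps=0.01, iters=20):
    """Grow a codebook by centroid splitting (after Linde et al., 1980)."""
    codebook = data.mean(axis=0, keepdims=True)   # single initial centroid
    delta = eps * (data.std(axis=0) + 1e-8)       # small split perturbation
    while len(codebook) < size:
        # split every centroid into a perturbed pair, then refine
        codebook = np.vstack([codebook + delta, codebook - delta])
        for _ in range(iters):                    # Lloyd-style refinement
            dist = np.linalg.norm(data[:, None] - codebook[None], axis=2)
            nearest = dist.argmin(axis=1)
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

def quantize(data, codebook):
    """Symbolize vectors as the index of their nearest codeword."""
    dist = np.linalg.norm(data[:, None] - codebook[None], axis=2)
    return dist.argmin(axis=1)
```

The returned indices play the role of the HMM observation symbols described in the next paragraph.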
After vector quantization is done, the index numbers are regarded as the symbols of the feature vectors to be modeled with HMMs. Fig. 7 shows the symbols of a codebook of size 32 as an example. The codeword index located at the center of the whole feature space indicates the neutral faces, while the other index numbers in each class's feature space represent a particular expression, reflecting the gradual changes of the expression over time.
Fig. 7. Exemplary symbols of the codebook in the feature space. Only four out of six
expressions are shown for clarity of presentation.
2.3.2 HMM and Training
The HMM used in this work is a left-to-right model, which is useful for modeling a sequential event in a system (Rabiner, 1989). Generally, the purpose of an HMM is to determine the model parameter \lambda with the highest likelihood Pr(O | \lambda) when observing the sequential data O = \{O_1, O_2, \ldots, O_T\}. An HMM is denoted as \lambda = \{A, B, \pi\}, and each element can be defined as follows (Zhu et al., 2002). Let us denote the states in the model by S = \{s_1, s_2, \ldots, s_N\} and the state sequence up to time t by Q = \{q_1, q_2, \ldots, q_t\}. Then the state transition probability A, the observation symbol probability B, and the initial state probability \pi are defined as

A = \{a_{ij}\}, \quad a_{ij} = Pr(q_{t+1} = S_j | q_t = S_i), \quad 1 \le i, j \le N,    (14)

B = \{b_j(O_t)\}, \quad b_j(O_t) = Pr(O_t | q_t = S_j), \quad 1 \le j \le N,    (15)

\pi = \{\pi_j\}, \quad \pi_j = Pr(q_1 = S_j).    (16)
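As a concrete illustration of the \lambda = \{A, B, \pi\} parameterization in Eqs. (14)-(16), the block below builds an untrained left-to-right model. The 4-state topology with forward jumps of up to two states and the 32-symbol alphabet mirror Figs. 8-9 and the codebook size used later, but the exact topology is an assumption:

```python
import numpy as np

N, M = 4, 32                          # states, codebook symbols
# A (Eq. 14): left-to-right; a state may stay or jump up to two states ahead.
A = np.zeros((N, N))
for i in range(N):
    reach = min(2, N - 1 - i)
    A[i, i:i + reach + 1] = 1.0 / (reach + 1)
# B (Eq. 15): uniform symbol emissions before training.
B = np.full((N, M), 1.0 / M)
# pi (Eq. 16): every sequence starts in the first (neutral-side) state.
pi = np.zeros(N)
pi[0] = 1.0
assert np.allclose(A.sum(axis=1), 1.0)    # each row of A is a distribution
```

Backward transitions stay at zero probability, so re-estimation (Eqs. 23-24 below) can never introduce them; this is what makes the model left-to-right.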
In the learning step, to re-estimate the model parameters we define the variable \xi_t(i, j), the probability of being in state q_i at time t and in state q_j at time t + 1, and the variable \gamma_t(i), the probability of being in state q_i at time t, as follows:

\xi_t(i, j) = \alpha_t(i) a_{ij} b_j(O_{t+1}) \beta_{t+1}(j) / Pr(O | \lambda),    (17)

\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)    (18)
where \alpha_t(i) is the forward variable and \beta_t(i) is the backward variable, such that

\alpha_1(i) = \pi_i b_i(O_1), \quad (1 \le i \le N)    (19)

\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(O_{t+1}), \quad (t = 1, 2, \ldots, T - 1)    (20)

\beta_T(i) = 1, \quad (1 \le i \le N)    (21)

\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(O_{t+1}) \beta_{t+1}(j). \quad (t = T - 1, T - 2, \ldots, 1)    (22)
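The recursions in Eqs. (19)-(22), together with the evaluation formula of Eq. (25) used later for recognition, can be sketched as below; a minimal discrete-HMM version in which O is a list of symbol indices:

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward variable alpha_t(i), Eqs. (19)-(20)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                           # Eq. (19)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]   # Eq. (20)
    return alpha

def backward(A, B, O, N):
    """Backward variable beta_t(i), Eqs. (21)-(22)."""
    T = len(O)
    beta = np.ones((T, N))                               # Eq. (21): beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])     # Eq. (22)
    return beta

def likelihood(alpha):
    """Pr(O | lambda) from the terminal forward variables, Eq. (25)."""
    return alpha[-1].sum()
```

A standard sanity check is that \sum_i \alpha_t(i) \beta_t(i) equals Pr(O | \lambda) at every t; for long sequences, log-domain or scaled variants avoid numerical underflow.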
Using the variables above, we can estimate the updated parameters \bar{A} and \bar{B} of the model \lambda via the estimated probabilities as follows:

\bar{a}_{ij} = \sum_{t=1}^{T-1} \xi_t(i, j) \Big/ \sum_{t=1}^{T-1} \gamma_t(i),    (23)

\bar{b}_j(k) = \sum_{t=1, O_t = v_k}^{T-1} \gamma_t(j) \Big/ \sum_{t=1}^{T-1} \gamma_t(j)    (24)

where \bar{a}_{ij} is the estimated transition probability from state i to state j, and \bar{b}_j(k) is the estimated observation probability of symbol k from state j.
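Given the forward and backward variables, one re-estimation pass over Eqs. (17), (18), (23), and (24) can be sketched as below; a minimal single-sequence version whose variable names are assumptions:

```python
import numpy as np

def reestimate(A, B, alpha, beta, O):
    """One Baum-Welch update of (A, B) from alpha/beta on one sequence."""
    T, N = alpha.shape
    p_obs = alpha[-1].sum()                   # Pr(O | lambda)
    xi = np.zeros((T - 1, N, N))              # Eq. (17)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1] / p_obs
    gamma = xi.sum(axis=2)                    # Eq. (18), for t = 1..T-1
    A_new = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]           # Eq. (23)
    B_new = np.zeros_like(B)
    obs = np.asarray(O[:-1])
    for k in range(B.shape[1]):
        # Eq. (24): mass of gamma at times when symbol k was observed
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new
```

By construction each row of A_new and B_new sums to one, since summing \xi_t(i, j) over j yields \gamma_t(i); in practice the pass is repeated until the likelihood converges.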
When training each HMM, a training sequence is projected onto the FICA feature space and symbolized using the LBG algorithm. The obtained symbols of the training sequence are compared with the codebook to form a proper symbol set for training the HMM. Table 1 shows example symbol sets for several expression sequences. The symbols in the first two frames reveal the neutral states, whose codewords lie at the center of the whole feature subspace, and a symbol is assigned to each subsequent frame as the expression gradually changes toward its target state.
After training the models, the observation sequence O = \{O_1, O_2, \ldots, O_T\} from a video dataset is evaluated and assigned to the proper model according to the likelihood Pr(O | \lambda). The likelihood of the observation O given the trained model \lambda can be determined via the forward variable in the form

Pr(O | \lambda) = \sum_{i=1}^{N} \alpha_T(i).    (25)
The criterion for recognition is the highest likelihood value among the models. Figs. 8 and 9 show the structure and transition probabilities for the anger case before and after training, with a codebook size of 32 as an example.
Expression   Frame 1   Frame 2   Frame 3   Frame 4   Frame 5   Frame 6   Frame 7   Frame 8
Joy          24        32        30        30        14        14        10        10
Sad          32        32        24        16        13        12        4         12
Angry        21        21        13        9         7         8         22        25
Surprise     23        34        34        26        19        19        27        27
Table 1. Example of codebook symbols of the training expression data.
Fig. 8. HMM structure and transition probabilities for anger before training.
Fig. 9. HMM structure and transition probabilities for anger after training.
3. Experimental Setups
To assess the performance of our FER system, a set of comparison experiments was performed with each feature extraction method, including PCA, generic ICA, PCA-LDA, EICA, and FICA, in combination with the same HMMs. We recognized six different yet commonly tested expressions: anger, joy, sadness, surprise, fear, and disgust. The following subsections provide more details.
3.1 Facial Expression Database
The facial expression database used in our experiment is the Cohn-Kanade AU-coded facial expression database, which consists of facial expression sequences evolving from a neutral expression to a target facial expression (Cohn et al., 1999). The image data display only the frontal view of the face, and each subset comprises several sequential frames of a specific expression. There are six universal expressions to be classified and recognized. The database includes 97 subjects, each with subsets of some of the expressions. For data preparation, 267 subsets from the 97 subjects, each containing 8 frames per expression sequence, were selected. A total of 25 sequences of anger, 35 of joy, 30 of sadness, 35 of surprise, 30 of fear, and 25 of disgust were used in training; for testing, 11 anger, 19 joy, 13 sadness, 20 surprise, 12 fear, and 12 disgust subsets were used.
3.2 Recognition Setups for RGB Images
From the database mentioned above, we selected 8 consecutive frames from each video sequence. The selected frames were then aligned and resized to 60 x 80 pixels. Afterwards, histogram equalization and delta-image generation were performed for feature extraction. A total of 180 sequences from all expressions were used to build the feature space.
To assess our FER system, we used a total of 180 image sequences for training and 87 for testing. Next, we performed experiments to empirically determine the optimal number of features and the size of the codebook. To do this, we tested a range of feature numbers selected in the PCA step. Once the optimal number of features was determined, the experiment for the codebook size was conducted. We tested the performance with different codebook sizes (2^n, n = 4, 5, 6) for vector quantization along with the HMM in order to determine the optimal settings.
Finally, we compared the different feature extraction methods under the same HMM structure. Previously, PCA and ICA have been extensively explored owing to their strong ability to build a feature space, and PCA-LDA has been one of the better feature extractors because the LDA step finds the best linear discrimination within the PCA subspace. In this regard, our FICA results were compared with the conventional feature extraction methods, namely PCA, generic ICA, EICA, and PCA-LDA, using the optimal number of features, the same codebook size, and the same HMM procedure.
3.3 Recognition Setups for Depth Images
RGB images are known to be highly affected by lighting conditions and colors, which can distort the facial shapes. One way of overcoming these limitations is the use of depth images, which generally reflect the 3-D information of facial expression changes. In our study, we performed preliminary tests of depth images and examined their performance for FER. Fig. 10 shows a set of facial expressions of surprise from a depth camera called Zcam (www.3dvsystems.com). We tested only four basic expressions in this study, namely anger, joy, sadness, and surprise, using the method presented in the previous section (Lee et al., 2008b).
Fig. 10. Depth facial expression images of joy.
4. Experimental Results
Before testing the presented FER system, two parameters must be set: the number of features and the size of the codebook. In our experiments, we tested numbers of eigenvectors in the range from 50 to 190 on the training data and empirically chose 120 as the optimal number of eigenvectors, since it provided the best overall recognition rate. As for the codebook, we tested sizes of 16, 32, and 64, and chose 32 as the optimal codebook size, since it provided the best overall recognition rate on the test data (Lee et al., 2008a).
4.1 Recognition via RGB Images
To compare recognition between FICA and the four conventional feature extraction methods, namely PCA, ICA, EICA, and PCA-LDA, all of the methods were implemented with the same HMMs for recognition of facial expressions. The results from each experiment represent the best recognition rate under the empirical settings for the number of features and the codebook size.
For the PCA case, we computed the eigenvectors of the whole dataset and selected 120 eigenvectors to train the HMMs. As shown in Table 2, the recognition rate using the PCA method was 54.76%, the lowest of all methods. Then, we employed ICA to extract the ICs from the dataset. Since ICA produces as many ICs as the original dimensionality of the dataset, we empirically selected 120 ICs for training the model, using the kurtosis value of each IC as the selection criterion. The result of the ICA method in Table 3 shows an improved recognition rate over PCA. We also evaluated the EICA method: we first chose the proper dimension in the PCA step and then applied ICA to the selected eigenvectors to extract the EICA basis. The results are presented in Table 4; the total mean recognition rate of the EICA representation of facial expression images was 65.47%, higher than the generic ICA and PCA rates. Moreover, the best conventional approach, PCA-LDA, was run as the last comparison and achieved a recognition rate of 82.72%, as shown in Table 5. Using the settings above, we conducted the experiment with the FICA method implemented with HMMs; it achieved a total mean recognition rate of 92.85%, and the expressions labeled surprise, joy, and sad were recognized with high accuracy, from 93.75% to 100%, as shown in Table 6.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 30 0 20 0 10 40
Joy 4 48 8 8 28 4
Sad 0 6.06 81.82 12.12 0 0
Surprise 0 0 0 68.75 12.50 18.75
Fear 0 8.33 50 8.33 33.33 0
Disgust 0 8.33 25 0 0 66.67
Average 54.76
Table 2. Person independent confusion matrix using PCA (unit: %).
Label Anger Joy Sadness Surprise Fear Disgust
Anger 30 0 10 30 10 20
Joy 4 60 0 0 36 0
Sad 0 6.06 87.88 6.06 0 0
Surprise 0 0 12.50 81.25 0 6.25
Fear 0 25 25 8.33 33.33 8.33
Disgust 0 8.33 25 0 0 66.67
Average 59.86
Table 3. Person independent confusion matrix using ICA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 60 0 0 0 20 20
Joy 4 72 8 4 12 0
Sad 0 6.06 87.88 6.06 0 0
Surprise 0 0 12.50 81.25 0 6.25
Fear 0 16.67 16.67 8.33 50 8.33
Disgust 25 8.33 25 0 0 41.67
Average 65.47
Table 4. Person independent confusion matrix using EICA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 60 0 10 0 0 30
Joy 0 88 0 0 8 4
Sad 0 6.06 87.88 6.06 0 0
Surprise 0 0 0 93.75 6.25 0
Fear 0 8.33 8.33 8.33 75 0
Disgust 0 0 0 0 8.33 91.67
Average 82.72
Table 5. Person independent confusion matrix using PCA-LDA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 80 0 0 0 0 20
Joy 0 96 0 0 4 0
Sad 0 0 93.75 0 6.25 0
Surprise 0 0 0 100 0 0
Fear 0 8.33 0 0 91.67 0
Disgust 0 0 0 0 8.33 91.67
Average 92.85
Table 6. Person independent confusion matrix using FICA.
As mentioned above, the conventional feature-extraction-based FER systems produced lower recognition rates than our method's 92.85%. Fig. 11 summarizes the recognition rates of the conventional methods compared against our FICA-based method.
4.2 Recognition via Depth Images
A total of 99 sequences were used, with 8 images in each sequence, displaying the frontal view of the faces. A total of 15 sequences per expression were used in training; for testing, 10 anger, 10 joy, 8 surprise, and 11 sadness subsets were used. We empirically selected 60 eigenvectors for dimension reduction and tested the performance with a codebook size of 32. On the data set of RGB and depth facial expressions of the
HumanFacialExpressionRecognition
UsingFisherIndependentComponentAnalysisandHiddenMarkovModel 641
4.1 Recognition via RGB Images
To compare recognition performance between FICA and four conventional feature
extraction methods (PCA, ICA, EICA, and PCA-LDA), all five methods were
implemented with the same HMMs for recognition of facial expressions. The result
of each experiment in this work represents the best recognition rate obtained
with empirically chosen settings for the number of features and the codebook size.
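Since every feature extraction method above feeds the same discrete HMMs, classification amounts to evaluating the likelihood of a quantized observation sequence under each expression's HMM and picking the maximum. Below is a minimal sketch of that decision rule using the forward algorithm (Rabiner, 1989); the two toy models and all their parameters are illustrative, not taken from this chapter:

```python
def forward_likelihood(obs, pi, A, B):
    """Probability of a discrete observation sequence under an HMM.

    obs: list of symbol indices; pi: initial state probabilities;
    A[i][j]: transition probabilities; B[i][k]: emission probabilities.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Toy 2-state, 2-symbol models standing in for per-expression HMMs.
hmms = {
    "joy":     ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]),
    "sadness": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.3, 0.7]]),
}
seq = [0, 0, 1, 0]
best = max(hmms, key=lambda k: forward_likelihood(seq, *hmms[k]))
```

The expression whose model assigns the observation sequence the highest likelihood is reported as the recognized label.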
For PCA, we computed the eigenvectors of the entire dataset and selected 120
eigenvectors to train the HMMs. As shown in Table 2, the recognition rate using
the PCA method was 54.76%, the lowest of all methods. We then employed ICA to
extract the ICs from the dataset. Since ICA produces as many ICs as there are
original dimensions in the dataset, we empirically selected 120 ICs, using the
kurtosis of each IC as the selection criterion, to train the model. Table 3
shows that the ICA method achieved a higher recognition rate than PCA. We also
evaluated the EICA method: we first chose a suitable dimension in the PCA step
and then applied ICA in the reduced subspace to extract the EICA basis. The
results are presented in Table 4; the mean recognition rate of the EICA
representation of facial expression images was 65.47%, higher than both the
generic ICA and PCA rates. The strongest conventional approach, PCA-LDA, was
evaluated last and achieved a recognition rate of 82.72%, as shown in Table 5.
Using the same settings, we conducted the experiment with the FICA method
implemented with HMMs; it achieved a mean recognition rate of 92.85%, and the
expressions labeled surprise, joy, and sadness were recognized with high
accuracy, from 93.75% to 100%, as shown in Table 6.
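The kurtosis-based selection of ICs mentioned above can be sketched as ranking components by the magnitude of their excess kurtosis (a measure of non-Gaussianity) and keeping the top m. This is a minimal pure-Python illustration, not the chapter's implementation:

```python
def excess_kurtosis(x):
    """Sample excess kurtosis: E[(x - mu)^4] / sigma^4 - 3 (zero for a Gaussian)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / var ** 2 - 3.0

def select_ics(components, m):
    """Keep the m components whose coefficient distributions are most
    non-Gaussian, i.e. have the largest |excess kurtosis|."""
    return sorted(components, key=lambda c: abs(excess_kurtosis(c)),
                  reverse=True)[:m]

# A heavy-tailed (super-Gaussian) component vs. a uniform-like one:
spiky = [0.0] * 96 + [10.0, -10.0, 10.0, -10.0]
flat = [(i - 50) / 50 for i in range(101)]
chosen = select_ics([flat, spiky], 1)   # keeps the heavy-tailed component
```

Super-Gaussian (spiky, heavy-tailed) components have large positive excess kurtosis, while a uniform distribution sits near -1.2, so the ranking favors the most structured, non-Gaussian ICs.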
Label Anger Joy Sadness Surprise Fear Disgust
Anger 30 0 20 0 10 40
Joy 4 48 8 8 28 4
Sadness 0 6.06 81.82 12.12 0 0
Surprise 0 0 0 68.75 12.50 18.75
Fear 0 8.33 50 8.33 33.33 0
Disgust 0 8.33 25 0 0 66.67
Average 54.76
Table 2. Person independent confusion matrix using PCA (unit: %).
Label Anger Joy Sadness Surprise Fear Disgust
Anger 30 0 10 30 10 20
Joy 4 60 0 0 36 0
Sadness 0 6.06 87.88 6.06 0 0
Surprise 0 0 12.50 81.25 0 6.25
Fear 0 25 25 8.33 33.33 8.33
Disgust 0 8.33 25 0 0 66.67
Average 59.86
Table 3. Person independent confusion matrix using ICA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 60 0 0 0 20 20
Joy 4 72 8 4 12 0
Sadness 0 6.06 87.88 6.06 0 0
Surprise 0 0 12.50 81.25 0 6.25
Fear 0 16.67 16.67 8.33 50 8.33
Disgust 25 8.33 25 0 0 41.67
Average 65.47
Table 4. Person independent confusion matrix using EICA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 60 0 10 0 0 30
Joy 0 88 0 0 8 4
Sadness 0 6.06 87.88 6.06 0 0
Surprise 0 0 0 93.75 6.25 0
Fear 0 8.33 8.33 8.33 75 0
Disgust 0 0 0 0 8.33 91.67
Average 82.72
Table 5. Person independent confusion matrix using PCA-LDA.
Label Anger Joy Sadness Surprise Fear Disgust
Anger 80 0 0 0 0 20
Joy 0 96 0 0 4 0
Sadness 0 0 93.75 0 6.25 0
Surprise 0 0 0 100 0 0
Fear 0 8.33 0 0 91.67 0
Disgust 0 0 0 0 8.33 91.67
Average 92.85
Table 6. Person independent confusion matrix using FICA.
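Each row of Tables 2-6 is the distribution of predicted labels for one true expression, expressed as a percentage of that class's test sequences. As an illustration (the per-class test counts are not stated in this section; a 33-sequence sadness set is inferred here, since 29/33 and 2/33 reproduce the recurring 87.88% and 6.06% entries), raw counts convert to the tabulated percentages like this:

```python
def to_percent(count_rows):
    """Row-normalize raw confusion counts to percentages (2 decimals)."""
    return [[round(100.0 * c / sum(row), 2) for c in row] for row in count_rows]

# Hypothetical raw counts for a 33-sequence sadness test set:
# 29 correct, 2 confused with joy, 2 with surprise.
sad_counts = [[0, 2, 29, 2, 0, 0]]
sad_percent = to_percent(sad_counts)
```

With these counts the output row is [0.0, 6.06, 87.88, 6.06, 0.0, 0.0], matching the sadness rows seen in several of the tables above.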
As noted above, the conventional feature-extraction-based FER systems produced
lower recognition rates than our method's 92.85%. Fig. 11 summarizes the
recognition rates of the conventional methods compared against our FICA-based method.
4.2 Recognition via Depth Images
A total of 99 sequences were used, with 8 images in each sequence displaying the
frontal view of the face. A total of 15 sequences per expression were used in
training; for testing, 10 anger, 10 joy, 8 surprise, and 11 sadness sequences
were used. We empirically selected 60 eigenvectors for dimension reduction and
tested the performance with a codebook size of 32. On datasets of RGB and depth facial expressions of the
same face, we applied the presented system to compare FER performance. Tables 7
and 8 show the recognition results for each case. More details are given in Lee et al. (2008b).
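The codebook referred to above turns continuous feature vectors into the discrete observation symbols the HMMs require, typically trained with the LBG algorithm (Linde et al., 1980). A minimal nearest-codeword quantizer sketch, using a toy 4-word codebook in place of the 32-word one used here:

```python
def nearest(codebook, v):
    """Index of the codeword closest (squared Euclidean distance) to vector v."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(c, v))
    return min(range(len(codebook)), key=lambda i: d2(codebook[i]))

def quantize(codebook, frames):
    """Map a sequence of feature vectors to HMM observation symbols."""
    return [nearest(codebook, f) for f in frames]

# Illustrative 4-word codebook in a 2-D feature space (the chapter uses
# 32 codewords over the FICA feature space).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
symbols = quantize(codebook, [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8)])
```

Each frame of a facial expression sequence is thus replaced by the index of its nearest codeword, producing the discrete symbol sequence consumed by the HMMs.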
Fig. 11. Recognition rate of facial expressions using the conventional feature extraction
methods and the presented FICA feature extraction method.
Label Anger Joy Sadness Surprise
Anger 100 0 20 0
Joy 10 90 0 0
Sadness 9.09 9.09 81.82 0
Surprise 12.5 12.5 0 75
Average 86.5
Table 7. Person independent confusion matrix using the sequential RGB images (unit: %).
Label Anger Joy Sadness Surprise
Anger 100 0 0 0
Joy 0 100 0 0
Sadness 0 0 100 0
Surprise 0 0 0 100
Average 100
Table 8. Person independent confusion matrix using the sequential depth images.
5. Conclusion
In this work, we have presented a novel FER system that utilizes FICA for facial
expression feature extraction and HMMs for recognition. In particular, within
the FICA-HMM framework, the sequential spatiotemporal feature information of
holistic facial expressions is modeled and used for FER. The performance of the
presented method has been investigated on sequential datasets of six facial
expressions. The results show that FICA can extract optimal features that are
well utilized by the HMMs, outperforming all other conventional feature
extraction methods. We have also applied the presented system to 3-D depth
facial expression images and showed its improved performance. We believe that
our FER system should be useful toward real-time recognition of facial
expressions, which could in turn benefit many other HCI applications.
6. Acknowledgement
This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under
the ITRC (Information Technology Research Center) support program supervised by the
IITA (Institute of Information Technology Advancement) (IITA-2009-(C1090-0902-0002)).
7. References
Aleksic, P. S. & Katsaggelos, A. K. (2006). Automatic facial expression recognition using
facial animation parameters and multistream HMMs, IEEE Trans. Information
Forensics and Security, Vol. 1, No. 1, pp. 3-11, ISSN. 1556-6013
Bartlett, M. S.; Donato, G. ; Movellan, J. R.; Hager, J. C.; Ekman, P. & Sejnowski, T. J. (1999).
Face Image Analysis for Expression Measurement and Detection of Deceit,
Proceedings of the 6th Joint Symposium on Neural Computation, pp. 8-15
Bartlett, M. S.; Movellan, J. R. & Sejnowski, T. J. (2002). Face Recognition by Independent
Component Analysis, IEEE Trans. Neural Networks, Vol. 13, No. 6, pp. 1450-1464,
ISSN. 1045-9227
Buciu, I.; Kotropoulos, C. & Pitas, I. (2003). ICA and Gabor Representation for Facial
Expression Recognition, Proceedings of the IEEE, pp. 855-858
Calder, A. J.; Young, A. J.; Keane, J. & Dean, M. (2000). Configural information in facial
expression perception, Journal of Experimental Psychology: Human Perception
and Performance, Vol. 26, No. 2, pp. 527-551
Calder, A. J.; Burton, A. M.; Miller, P.; Young, A. W. & Akamatsu, S. (2001). A principal
component analysis of facial expressions, Vision Research, Vol.41, pp. 1179-1208
Chen, F. & Kotani, K. (2008). Facial Expression Recognition by Supervised Independent
Component Analysis Using MAP Estimation, IEICE Trans. Inf. & Syst., Vol. E91-
D, No. 2, pp. 341-350, ISSN. 0916-8532
Chuang, C.-F. & Shih, F. Y. (2006). Recognizing Facial Action Units Using Independent
Component Analysis and Support Vector Machine, Pattern Recognition, Vol. 39,
No. 9, pp. 1795-1798, ISSN. 0031-3203
Cohen, I.; Sebe, N.; Garg, A; Chen, L. S. & Huang, T. S. (2003). Facial expression recognition
from video sequences: temporal and static modeling, Computer Vision and Image
Understanding, Vol. 91, ISSN. 1077-3142
Cohn, J. F.; Zlochower, A.; Lien, J. & Kanade, T. (1999). Automated face analysis by feature
point tracking has high concurrent validity with manual FACS coding, pp. 35-43,
Psychophysiology, Cambridge University Press
Donato, G.; Bartlett, M. S.; Hager, J. C.; Ekman, P. & Sejnowski, T. J. (1999). Classifying Facial
Actions, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10,
pp. 974-989
Dubuisson, S.; Davoine, F. & Masson, M. (2002). A solution for facial expression
representation and recognition, Signal Processing: Image Communication, Vol. 17, pp.
657-673
Lee, J. J.; Uddin, M. D. & Kim, T.-S. (2008a). Spatiotemporal human facial expression
recognition using fisher independent component analysis and Hidden Markov
Model, Proceedings of the IEEE Int. Conf. Engineering in Medicine and Biology Society,
pp. 2546-2549
Lee, J. J.; Uddin, M. D.; Truc, P. T. H. & Kim, T.-S. (2008b). Spatiotemporal Depth
Information-based Human Facial Expression Recognition Using FICA and HMM,
Int. Conf. Ubiquitous Healthcare, IEEE, Busan, Korea
Lyons, M.; Akamatsu, S.; Kamachi, M. & Gyoba, J. (1998). Coding facial expressions with
Gabor wavelets, Proceedings of the Third IEEE Int. Conf. Automatic Face and Gesture
Recognition, pp. 200-205
Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in
Speech Recognition, Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286
Linde, Y.; Buzo, A. & Gray, R. (1980). An Algorithm for Vector Quantizer Design, IEEE
Transaction on Communications, Vol. 28, No. 1, pp. 84–94, ISSN. 0090-6778
Liu, C. (2004). Enhanced independent component analysis and its application to content
based face image retrieval, IEEE Trans. Systems, Man, and Cybernetics, Vol. 34, No. 2,
pp. 1117-1127
Karklin, Y. & Lewicki, M. S. (2003). Learning higher-order structures in natural images,
Netw. Comput. Neural Syst., Vol. 14, pp. 483-499
Kwak, K. C. & Pedrycz, W. (2007). Face recognition using an enhanced independent
component analysis approach, IEEE Trans. Neural Network, Vol. 18, pp. 530-541,
ISSN. 1045-9227
Kotsia, I. & Pitas, I. (2007). Facial expression recognition in image sequences using geometric
deformation features and support vector machine, IEEE Trans. Image Processing, Vol.
16, pp. 172-187, ISSN. 1057-7149
Mitra, S. & Acharya, T. (2007). Gesture Recognition: A survey, IEEE Trans. Systems, Man, and
Cybernetics, Vol. 37, No. 3, pp. 311-324, ISSN. 1094-6977
Otsuka, T. & Ohya, J. (1997). Recognizing multiple persons' facial expressions using HMM
based on automatic extraction of significant frames from image sequences.
Proceedings of the IEEE Int. Conf. Image Processing, pp. 546-549
Padgett, C. & Cottrell, G. (1997). Representing face images for emotion classification,
Advances in Neural Information Processing Systems, vol. 9, Cambridge, MA, MIT Press
Tian, Y.-L.; Kanade, T. & Cohn, J. F. (2002). Evaluation of Gabor wavelet based facial action
unit recognition in image sequences of increasing complexity, Proceedings of the 5th
IEEE Int. Conf. Automatic Face and Gesture Recognition, pp. 229-234
Zhang, L. & Cottrell, G. W. (2004). When Holistic Processing is Not Enough: Local Features
Save the Day, Proceedings of the Twenty-sixth Annual Cognitive Science Society
Conference
Zhu, Y.; De Silva, L. C. & Ko, C. C. (2002). Using moment invariants and HMM in facial
expression recognition, Pattern Recognition Letters, Vol. 23, pp. 83-91, ISSN. 0167-
8655
Requirements and solutions for advanced
Telemedicine applications
George J. Mandellos, George V. Koutelakis, Theodor C. Panagiotakopoulos,
Michael N. Koukias and Dimitrios K. Lymberopoulos
Wire Communication Laboratory, Electrical & Computer Engineering Department,
University of Patras, GR 265 04 Panepistimioupoli - Rion
Greece
1. Introduction
Telemedicine, as the term implies, is the provision of medicine and the exchange of health-
care information at a distance. More specifically, it is the use of advanced
telecommunication and information technologies to transmit and exchange health
information and provide health care services across geographic, time, social and cultural
barriers [Vikas Singh, 2006]. It includes both clinical medicine (diagnosis, treatment,
medical records and prevention of disease) and academic medicine (research, continuing
education, knowledge exchange, evaluation and training). In general, telemedicine is not a
new technology but a term encompassing diverse information and communication
technologies that aim to offer health-related activities wherever and whenever these are
needed or requested.
Telemedicine is the only solution in several healthcare-provision situations. Two of the
major cases where there is no alternative to telemedicine are emergencies and the lack of
expert and/or experienced medical staff, either in rural areas of developing countries or
across the majority of the health care network of underdeveloped countries.
Many published studies [C. Weston et al., 1994] have shown that early and specialized
pre-hospital management of emergency cases contributes to the patient's survival.
Especially in cases of serious injuries, such as spinal cord or internal organ trauma, the way
injured persons are treated and transported, as well as the duration of their transportation,
is crucial. This is where telemedicine can contribute: to an initial diagnosis, to initial
treatment during the patient's transportation, and to proper preparation at the hospital's
premises for the patient's in-hospital treatment. Furthermore, in rural areas, and on some
occasions at health centres where primary care is provided, there is a lack either of medical
staff or of specific expertise among the existing medical staff. In such cases telemedicine is
of great importance, as it eliminates the need for transportation, which in some settings,
such as islands, is not easy. In addition, it offers cost savings and immediate treatment
administered by an expert physician based at a central hospital.
Despite the fact that telemedicine seems to be a necessity in several cases and has a huge
impact at both the personal and the social level, many obstacles must be overcome before
an effective, efficient and cost-effective telemedicine application can be realized.
Generally, the problems telemedicine providers and consumers have to deal with fall into
three major categories: juridical, financial and technological. At the same time,
telemedicine is not always easy to implement or support, because a long list of factors
affects its evolution.
This chapter will mainly focus on the technological problems, aiming to provide efficient
and effective solutions. Given that telemedicine is realized by the fusion of communication
and information technologies, the provided solutions will follow two directions: first,
toward the communication part, which forms the platform for any kind of e-health
application; and second, toward the information structure, offering adaptation to the
medical-treatment demands that each health-related incident sets.
Before a comprehensive analysis of the problems and requirements of telemedicine, as
well as of the technological solutions adopted by several telemedicine applications, we
consider it necessary to present some information regarding the general attributes of
telemedicine. For this reason, the next section reports on the types of telemedicine, the
reasons why telemedicine is not only being implemented but on several occasions seems to
be the only solution, the telemedicine applications, and the players having either a major
or a minor role at several levels, spanning from the design to the market analysis of
telemedicine systems and applications.
2. General attributes of telemedicine
2.1. Types of telemedicine
Telemedicine sessions can be distinguished according to the interaction taking place
between the clients and the expert and according to the type of information exchanged.
The two types of interaction are Real-time (synchronous) and Store-and-Forward
(asynchronous).
In Real-time telemedicine, a synchronous interaction between providers/patients/
healthcare professionals at distant locations is established, using some kind of
communication technology that provides audio/visual/data exchange over wired, wireless
or microwave links. This type of telemedicine often yields increased accuracy and offers a
better assessment of the patient's overall health condition, resulting, among other things,
in more satisfied patients; as many studies in the literature note, patients' satisfaction is a
key factor in integrating telemedicine into everyday life. The main disadvantage is
scheduling: real-time telemedicine usually involves two healthcare providers, who must
both be available at the same time. In Real-time telemedicine, apart from video-
conferencing, peripheral sensing devices (biosignal measurement devices) can also be
attached to the patient or to the equipment, in order to offer the ability of remote
interactive examination. This type of telemedicine is used most often in accidents,
psychiatry, internal medicine, rehabilitation, cardiology, paediatrics, obstetrics and
gynaecology, as well as neurology.
The Store-and-Forward or prerecorded (asynchronous) type involves the acquisition of
medical data (images, bio-signals) and its transmission by the referrer to a medical expert
for consultation, evaluation or other purposes at a convenient time. This type of
telemedicine does not require simultaneous communication between the referrer and the
healthcare professional. E-mail is a common example. Diagnostic accuracy is lower than in
real-time telemedicine, because the expert evaluates the data with a delay and does not
interact with the patient while care is provided; however, this type has advantages in
hardware and software complexity, cost and convenience. It is often used in radiology,
pathology and dermatology.
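The store-and-forward workflow just described can be sketched as a queue of referral records that the expert drains at a convenient time. All names here are illustrative, not a real telemedicine API:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Referral:
    """One store-and-forward case: acquired data awaiting expert review."""
    patient_id: str
    modality: str            # e.g. "dermatology image", "ECG trace"
    payload: bytes
    report: str = ""

inbox = deque()

# Referrer side: acquire and enqueue; no expert needs to be online now.
inbox.append(Referral("p-001", "dermatology image", b"<image bytes>"))

# Expert side, later: review each queued case and attach a report.
while inbox:
    case = inbox.popleft()
    case.report = f"evaluated {case.modality} for {case.patient_id}"
```

The point of the sketch is the decoupling: acquisition and evaluation happen at different times, which is exactly what trades diagnostic immediacy for lower cost and complexity.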
The information exchanged between the parties during a telemedicine session can
comprise data, audio, video or a combination of them. Data includes the patient's
demographic information, biosignal measurements acquired through sensors connected to
the patient, readings from peripheral devices, etc. Audio includes the conversation (voice
signals) between the two parties. Video includes still images and/or video pictures
concerning the medical incident.
2.2. Necessity of telemedicine
A significant percentage of emergency cases are due to car crashes and coronary artery
disease. Statistics on car accidents in the USA and Europe show that many thousands of
people lose their lives and many more drivers or passengers are severely injured. Studies
performed in Greece, a country with a very high death rate due to car crashes, showed that
most fatal injuries in accidents happened far away from any competent healthcare
institution, thus resulting in long response times [Mandellos et al., 2004]. The long
response time (ambulance arrival, transportation time, evaluation time and treatment
initiation) [Mandellos et al., 2004] leads a significant percentage of victims of accidents on
rural roads to die on the scene or during transportation [A. G. Heriot et al., 1993]. Some of
these victims would have had a 50% chance of survival had adequate pre-hospital care
existed.
Heart disease is another common example of high death rates in emergency cases, since
two thirds of all patients die before reaching the central hospital [T. Evans, 1998]. The
delay in administering the appropriate therapy [T. Kereiakes et al., 1990] comes either from
the patient's failure to recognize the seriousness of his symptoms and seek emergency
care, or from the pre-hospital evaluation and transport time, or from the time required for
diagnosis and initiation of treatment in the hospital.
The above show the great necessity of telemedicine in emergencies. The emergency cases
comprise a major case, among many others, where telemedicine can play an extremely vital
role. Airplanes, ships, rural areas and disaster areas constitute some other important cases
where telemedicine appears to be either the only or the most efficient and effective solution.
Moreover, the use of telemedicine offers several benefits to the traditional healthcare
network, as it:
- Increases the accessibility of and to professional caregivers
- Increases the quality, efficiency and continuity of healthcare to patients
- Increases the focus on preventive medicine through early intervention
- Reduces the overall cost of healthcare both for care providers and consumers
- Reduces the unnecessary transfer of patients to regional hospitals
- Enables education and training for both medical staff and citizens, and improves medical knowledge
- Provides services to remote areas in cases of natural calamities, disasters, and military and space operations
- Enables the patient's remote monitoring
- Reduces the time needed for diagnosis and patient treatment
- Leads to rapid response times in pre-hospital actions
2.3. Telemedicine applications
Because of the above benefits, telemedicine is utilized to provide
various services that span numerous specialties and can be broadly categorized as home-
based care, telepsychiatry, teleradiology, general telemedicine, telecardiology,
teleconsultation, telemedicine in disaster areas, teledermatology, ambulatory and emergency
care, telepathology, self-care, teledentistry and telesurgery.
In order to be applied to the above fields, telemedicine applications use the same basic
components: capturing infrastructure, communication media, and processing equipment to
display, process and manage the information and deliver feedback.
The infrastructure used to capture the necessary information includes a Biosignal
Acquisition Module, which acquires biosignals through sensors connected to the patient and
peripheral devices, and in some cases a Digital Camera for capturing digital images or video
of the patient. A processing module, such as a Personal Computer or Personal Digital
Assistant, on the patient's side integrates the acquired data (biosignals, demographics,
videos, geolocation, etc.) for transmission. On the expert's side it displays the
received data and gives the doctor the necessary tools to evaluate them. The
communication medium connecting the telemedicine parties makes use of various
technologies such as POTS networks (PSTN or ISDN), GSM cellular phone
networks, 2G, 2.5G and 3G networks, Bluetooth and RF technologies, satellite systems, LAN,
WLAN, WiMAX, Home/Personal/Body Area Networks, and Mobile Ad-hoc Networks
(MANETs).
Also, optional equipment (GPS receiver, microscope, etc.) can be used so that various
types of incidents can be managed through telemedicine.
2.4. The players involved
Telemedicine is a very complicated scheme. The implementation of a telemedicine
application requires cooperation among different types of players. There are directly and
indirectly involved players, who can be grouped into the following categories:
Healthcare consumers (especially patients)
Health professionals (experts, family doctors, nurses, paramedics, obstetricians, etc.)
Other professionals involved in the wider area of healthcare (directors, researchers,
epidemiologists, technicians, IT engineers, statisticians, etc.)
Hospitals and health centers
Communication companies
Complementary (non-medical) services suppliers
Infrastructure manufacturers and suppliers
Hardware and network manufacturers
Universities and research institutes
Insurance companies
Pharmaceutical companies
Health ministry
Health management organizations
Organizations for standardizing and licensing.
The above list shows one of the major intrinsic difficulties in establishing a telemedicine
application, as it requires a huge effort in management, administration and cooperation
among different types of participants.
3. Potential problems and requirements
Telemedicine is not always easy to implement or support. Our long experience as the
telemedicine R&D group of the University of Patras, gained through the design, development
and implementation of several projects since 1992 and through cooperation with the players
involved, has revealed a long list of factors affecting the evolution of telemedicine. Generally,
we can divide them into three main categories: juridical, financial and technological. Although
these categories do not have clear borders between them, the factors can be broadly classified
into the following groups:
3.1. Juridical problems
Medical malpractice liability. Medical malpractice liability is a major issue for healthcare
providers, carrying iniquitous costs because of penalties. The nature of telemedicine, with its
communication network interruptions and the errors caused by hardware failures during
transmission, makes it vulnerable to malpractice lawsuits. In many countries there is
significant uncertainty regarding whether malpractice insurance policies cover services
provided by telemedicine. Telemedicine networks that cross borders, for example in
countries like the US, create additional uncertainties regarding the state where a malpractice
lawsuit may be litigated and the law that will be applied. Is the applicable law that of the
state of the provider, of the patient, or of another state which the network covers?
Absence of legislative regulations on attribution of responsibilities. In some countries, like Greece,
there are no legislative regulations defining the operation of telemedicine services. A major
issue in this case is who is responsible for the evaluation of a telemedicine incident: the
local paramedic or the remote expert? Another issue is the financial compensation of the
people involved in a telemedicine session.
Absence of licensing and credentialing. This is a consequence of the above absence of a suitable
law for telemedicine services.
Absence of rights. Usually, ambulance staff do not have the required advanced theoretical
knowledge and experience to handle emergency situations. Moreover, they are not certified to
provide medical care without a medical expert's advice. On the other hand, expert
physicians such as cardiologists, neurosurgeons and orthopaedists cannot join
ambulance crews for financial or practical reasons.
3.2. Financial problems
Cost and reimbursement for telemedicine services. Telemedicine services are often established
during research projects, and most telemedicine projects around the world are still funded
by state or federal grants. Although communication costs and the cost of basic
equipment are generally falling, many public and private payers are reluctant to set
standards for payment or reimbursement because of the uncertainty inherent in telemedicine:
its evolving nature and the lack of conclusive evidence of its effectiveness and range
of applications.
Requirements and solutions for advanced Telemedicine applications 649
Telecommunication regulations and limitations. The limited competition for telecommunication
services in some areas, caused by country regulations, leads to a significantly decreased
number of available network types. The absence of competition thus keeps communication
costs high.
Equipment costs. The majority of the existing hardware and software components
designed for telemedicine purposes have a relatively high cost, which most people
cannot afford.
3.3. Technological problems
Closed systems. A significant number of telemedicine services are used by healthcare
providers. Many of them are provided as closed systems offering no interoperability
with other medical systems (such as Electronic Health Record systems).
Wide range of telemedicine. Telemedicine spans a wide range of medical activities,
resulting in redundancies, lack of interoperability, cost sharing and incompatibilities
between systems.
Communication networks. The limited competition for telecommunication services in rural
areas and on islands leads to a significantly decreased number of available network types
(such as GSM, 3G, etc.) in those areas. Moreover, the available signal strength there is
rather limited, and the frequent interruptions, or even the absence of signal in some areas
(caused by the geomorphology), during a telemedicine session render telemedicine
unavailable.
User resistance to new technologies. People are reluctant to involve new technologies in
their lives, especially when constrained by third parties (e.g. professional guidelines). They
are therefore negative towards the introduction of new equipment and cite legislative and
financial reasons in order to abstain.
Telemedicine resources. For a telemedicine system to be able to serve a wide range of
medical cases, it needs various interconnected medical devices. The issue here is whether
this equipment is sufficient to meet clinical standards and whether its complexity inhibits
healthcare professionals from using it. Sometimes the quality and resolution of the
transmitted radiographic images do not meet the standards. Another factor is the lack of the
know-how needed to make proper use of the equipment's features. Telemedicine, in most
cases, requires expensive infrastructure and a highly complex setup, which is a limiting
factor for small providers and for physician clinics or health centres established on small islands.
Information structuring. With respect to information structuring we highlight the
following:
Variability of incidents served by telemedicine. A telemedicine system may be required to serve
various incident types (patients suffering from heart diseases, pulmonary diseases, several
kinds of injuries, etc.). Every single incident requires a different collection of diverse vital
signs or related information in order to determine the condition of the transported patient.
The absence of a protocol able to handle all the selected data in order to allow interoperability
between Medical Information Systems. There are many protocols, but each handles only a
subset of the information.
The capability of scalable user access depending on the user's type/specialty. Telemedicine systems,
as they are applied in a wide area of activities, have to introduce access-control processes in
order to keep the information confidential.
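Such an access-control process can be sketched as a simple role-to-data mapping. The roles and data classes below are hypothetical illustrations, not those of any actual telemedicine system:

```python
# Illustrative role-based access sketch for the scalable-user-access
# requirement: each user type may view only the data classes relevant
# to its role. Roles and data classes are hypothetical examples.

ACCESS = {
    "paramedic":     {"vital_signs", "incident_description"},
    "cardiologist":  {"vital_signs", "medical_history", "diagnosis"},
    "administrator": {"demographics"},
}

def can_view(role: str, data_class: str) -> bool:
    """True if the given role may view the given class of data."""
    # Unknown roles get no access, keeping information confidential.
    return data_class in ACCESS.get(role, set())
```

In a real deployment the mapping would be enforced server-side and audited, but the principle of restricting each specialty to its own data classes is the same.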
4. Methods for structuring, processing, archiving and transmitting the
medical information
The successful design, implementation and utilization of a telemedicine system requires the
resolution of several of the problems listed in the previous section. Resolving these
problems is often a very difficult task demanding the cooperation of various social groups.
Being engineers, we will focus on providing solutions to the technological problems, not
addressing the other two groups (i.e. juridical and financial).
In order to overcome some of the technological limitations, we adopt the solutions
presented in the following paragraphs. These solutions have been successfully implemented
in a telemedicine system [Mandellos et al., 2008 (a)] that resulted from a project funded by the
Greek Information Society and the Greek Ministry of Defense. This telemedicine system (Fig. 1)
supports diverse types of endpoints, including moving transports (MT) (such as ambulances,
ships, planes, etc.), handheld devices and fixed units, using diverse communication
networks. The system targets improved pre-hospital patient
treatment. Although vital sign transmission takes priority over the other services provided
by the telemedicine system (videoconference, remote management, voice calls, etc.), a
predefined algorithm controls the provision, switching and QoS of the other potential services.
A distributed database system, controlled by a central server, manages patient
attributes, exams and incidents handled by several Telemedicine Coordination Centers
(TCCs). Doctors and other medical personnel are able to participate for observation purposes
during the incident evaluation, through workstations in TCCs or through Regional
Teleconference Rooms (RTRs).
4.1 Bandwidth limitations lead to the adoption of an adaptive protocol
In many medical incidents, the correct diagnosis depends on the amount and type of data
related to the patient's health state [Mandellos et al., 2008 (a)]. In some cases the referring
doctor needs to know more about the patient's state than the acquired measurements
provide. Such data include the patient's demographic data, mnemonic and medical history,
allergy state, ancestry, etc. These data affect the correct treatment (e.g. a pharmaceutical
allergy precludes some drugs) or the patient's handling (e.g. blood transfusion is prohibited
by some religions; a patient must be treated separately from other patients if he suffers from
an infectious disease or has a sensitive immune system). The doctor takes different
measurements of the patient's vital signs depending on the patient's symptoms: for cardiac
diseases an ElectroCardioGram (ECG), for pulmonary diseases measurements of blood
oxygen saturation (SpO2), while in other cases body temperature (TEMP), non-invasive
blood pressure (NiBP), etc. Consequently, the total amount of data collected during one
incident may differ in length and type from that of another incident.
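This symptom-dependent selection of vital signs can be sketched as a simple lookup. The incident categories and their groupings below are illustrative assumptions, not the actual system's configuration:

```python
# Minimal sketch: the set of vital signs acquired depends on the
# incident type, so the data collected per incident varies in size
# and content. Categories and groupings are illustrative only.

VITAL_SIGN_SETS = {
    "cardiac":   ["ECG", "HeartRate", "NiBP"],
    "pulmonary": ["SPO2", "HeartRate", "TEMP"],
    "trauma":    ["NiBP", "HeartRate", "TEMP", "SPO2"],
}

# Broad fallback set when the incident type is not recognized.
FULL_SET = ["ECG", "SPO2", "NiBP", "TEMP", "HeartRate", "CO2"]

def measurements_for(incident_type: str) -> list[str]:
    """Return the vital signs to acquire for a given incident type."""
    return VITAL_SIGN_SETS.get(incident_type, FULL_SET)
```

The practical consequence, developed in the next paragraphs, is that the record transmitted for one incident can be much smaller than for another, which motivates an adaptive protocol.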
[Figure: Mobile Units (1…l), i.e. ambulances (1…m), ships (1…k) and aerial transports (1…o),
and in-hospital units (1…n) connect over the communication network to a Central
Telemedicine Database Server (CDS) linked to the HIS/EHR, to Telemedicine Coordination
Centers (1…k) and to Regional Teleconference Rooms (1…l). Each TCC comprises a LAN
with a server (DB and telemedicine server application), a management workstation, a PABX,
a router (Cisco 2811), workstations (1…n) for vital sign presentation, mobile unit remote
control, telemedicine incident management and teleconference, PCs and a Local Database
Server (LDS). Each RTR hosts a collaboration system with data capabilities: vital sign
viewing, patient medical file presentation and connection with the HIS.]
Fig. 1. General structure of a Telemedicine System.
This diversity of data led to the necessity of an adaptive protocol, in order to minimize the
required transmission time, the recovery time and, finally, the storage space. The authors
have proposed extensions, based on the file format of the SCP-ECG protocol [Health
Informatics, 2002], which handles a basic set of demographic data and ECG measurements,
so that the new protocol can handle more data. The new protocol is referred to as the
e-SCP-ECG+ protocol [Mandellos et al., 2008 (b)]. The finally used protocol has significant
changes and constitutes a newer version of the aforementioned protocol; this version was
extended so that all types of data collected during an incident can be handled. The protocol
consists of a collection of sections, each of them handling a different type of data. A main
section is defined to control critical data about the other sections (e.g. existence, offset,
length, etc.). Thus, the protocol's structure permits the handling of only the data that exist.
The data handled by the latest protocol version cover the patient's demographic data,
ancestry data, profession data, mnemonic and medical history, a quick description of the
incident, a first diagnosis, the allergy state, and measurements of ECG, SpO2, NiBP, TEMP,
heart rate and CO2.
The user has the ability to manage the acquired data, such as user inputs and Medical
Monitor (MM) measurements, and to define the data set describing each incident via a PC
connected between the MM and the communication media. Afterwards, the formatted data
can be transmitted to the TCC.
The new expanded protocol can be used in a large variety of medical incidents. The
possibility of managing only the data that characterize each medical incident yields the
smallest possible volume of data, which results in high flexibility regarding their
management and in very fast transmission.
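The section-directory idea, where a main section records the existence, offset and length of only the sections actually present, can be sketched as follows. The byte layout here is an illustrative assumption, not the actual e-SCP-ECG+ specification:

```python
# Sketch of a section-based record in the spirit of e-SCP-ECG+: a
# directory (the "main section") records, for each data section that
# is present, its id, offset and length, so a record carries only the
# data that exist for the incident. Field sizes are illustrative.
import struct

def pack_record(sections: dict[int, bytes]) -> bytes:
    """Serialize only the present sections, preceded by a directory."""
    directory = []                       # (section id, offset, length)
    body = b""
    header_len = 2 + 10 * len(sections)  # 2-byte count + 10 bytes/entry
    for sec_id in sorted(sections):
        data = sections[sec_id]
        directory.append((sec_id, header_len + len(body), len(data)))
        body += data
    out = struct.pack("<H", len(directory))
    for sec_id, offset, length in directory:
        out += struct.pack("<HII", sec_id, offset, length)
    return out + body

def unpack_record(blob: bytes) -> dict[int, bytes]:
    """Recover the present sections by walking the directory."""
    (count,) = struct.unpack_from("<H", blob, 0)
    sections = {}
    for i in range(count):
        sec_id, offset, length = struct.unpack_from("<HII", blob, 2 + 10 * i)
        sections[sec_id] = blob[offset:offset + length]
    return sections
```

An incident with only demographics and SpO2 readings thus produces a much shorter record than one that also carries ECG and NiBP sections, which is exactly the adaptivity the text describes.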
4.2. Communication network instability or absence in many areas leads to the
adoption of an alternative communication media structure
Ensuring stable communication between Mobile Units (MUs) and TCCs, with the best
possible quality given the networks available at the various access points, is very
important. It is more than obvious that the transmission of vital signs to the TCC is required
by all means, so that the paramedical staff of the MT can offer crucial treatment as soon as
possible. For this purpose, communication capabilities using multiple networks over diverse
physical media were developed to guarantee successful vital sign transmission under all
circumstances (Fig. 1).
The development focused on mobile communications, due to the fact that a patient
transportation is taking place. GPRS offers wider geographical coverage but low
transmission rates. Mobile Asymmetric Digital Subscriber Line (M-ADSL) clearly offers
higher transmission rates at a lower cost, but it lacks geographical coverage. In the
cases where the system was installed on ships transporting patients from islands to
central hospitals on the mainland (University hospitals or other central hospitals), satellite
modems are used. The choice of the wireless network is based mainly on the offered Quality
of Service (QoS) and not only on its availability. Moreover, priorities are embedded
regarding the choice of the wireless network. The designation of the network to be used for
call transfers and transactions is made depending on these priorities and the available
networks. Thus, the result is a single network connection, based on a priority algorithm (as
mentioned above), that hosts (in contrast with the previous pilot operation of the project)
all the communications used to handle a single incident.
In any communication between MUs and TCCs, two different and independent calls (C1 and
C2) are made from the MU to the TCC. Call C1 carries the vital sign transfer, and call C2
carries either telephone communication via an ordinary telephone set (wired or wireless) or
communication via teleconference, where, depending on the available bandwidth, text,
voice and video transmission are possible in a respective hierarchical way. In all cases,
one or more connections within the same transmission medium are used to circulate all the
information (data, voice and video) regarding a single incident.
Moreover, priorities are designated regarding the kind of data that must be transferred each
time the available bandwidth is insufficient. Examples of the behaviour of this transmission
sub-system according to these priorities are the following:
In a GPRS network, in case of low bandwidth, only vital sign transmission is
accomplished; if bandwidth is sufficient, text is also transmitted.
In a wireless LAN (IEEE 802.11a/b/g) or over a satellite modem, when bandwidth
reaches low rates, high priority is given to vital sign transmission over
teleconference.
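The bandwidth-dependent prioritization in these examples can be sketched as a greedy allocation. The stream priorities follow the text (vital signs first, then text, then voice and video), but the bandwidth figures are illustrative assumptions:

```python
# Sketch of the bandwidth-dependent prioritization described above:
# vital signs come first; lower-priority streams (text, voice, video)
# are enabled only while the estimated bandwidth can accommodate
# them. The kbit/s thresholds are illustrative assumptions.

STREAMS = [             # (stream, required kbit/s), highest priority first
    ("vital_signs", 8),
    ("text", 4),
    ("voice", 16),
    ("video", 128),
]

def streams_to_enable(bandwidth_kbps: float) -> list[str]:
    """Enable streams in priority order while bandwidth remains."""
    enabled, remaining = [], bandwidth_kbps
    for name, need in STREAMS:
        if remaining >= need:
            enabled.append(name)
            remaining -= need
    return enabled
```

On a low-bandwidth GPRS link this yields vital signs only, while a wireless LAN or satellite link with ample bandwidth also admits text, voice and, eventually, video, mirroring the hierarchy described in the bullets above.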
RequirementsandsolutionsforadvancedTelemedicineapplications 653
HIS/EHR
Communication
Network
Central Telemedicine
Database Server (CDS)
In-Hospital units
(1… n)
Mobile Units
(1… l)
Ambulance (1 m)
Ship (1… k)
Aerial transport
(1… o)
Telemedicine Coordination Center (1 k)
LAN
Server (DB &
Server app
Telemedicine)
Management
Workstation
PABX
Router Cisco 2811
WorkStation 1 . . . n
Vital Sign Presentation
Mobile Unit Remote
Control
Telemedicine Incident
Management
Teleconference
PC 1
PC 2
Local Database Server (LDS)
Regional Teleconference Room
(1 l)
Collaboration
system with
data capabilities
- Vital Sign viewing
- Patient medical file
presentation
- Connection with HIS
Fig. 1. General structure of a Telemedicine System.
This diversity of data led to the need for an adaptive protocol that minimizes the required transmission time, the recovery time and, finally, the storage space. The authors have proposed extensions to the file format of the SCP-ECG protocol [Health Informatics, 2002], which handles a basic set of demographic data and ECG measurements, so that the new protocol can handle more data types; the new protocol is referred to as e-SCP-ECG+ [Mandellos et al., 2008 (b)]. The version finally used introduces significant changes and constitutes a newer revision of the aforementioned protocol, extended so that all types of data collected during an incident can be handled. The protocol consists of a collection of sections, each handling a different type of data. A main section controls critical information about the other sections (e.g. existence, offset, length). Thus, the protocol's structure permits handling only the data that actually exist.
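The main-section scheme just described can be sketched as follows. This is an illustrative layout, not the actual e-SCP-ECG+ encoding: section ids, the little-endian header format and the (id, offset, length) entry shape are all assumptions made for the example.

```python
import struct

def pack_record(sections):
    """sections: dict of {section_id: bytes}. Returns the packed record:
    a main section listing (id, offset, length) for every data section,
    followed by the section bodies, so absent sections simply never appear."""
    ids = sorted(sections)
    main_size = 4 + 12 * len(ids)          # uint32 count + 12-byte entries
    body, entries, offset = b"", [], main_size
    for sid in ids:
        data = sections[sid]
        entries.append(struct.pack("<III", sid, offset, len(data)))
        body += data
        offset += len(data)
    return struct.pack("<I", len(ids)) + b"".join(entries) + body

def unpack_record(blob):
    """Inverse of pack_record: rebuild the {section_id: bytes} dict by
    reading the main section first, then slicing out each data section."""
    (count,) = struct.unpack_from("<I", blob, 0)
    out = {}
    for i in range(count):
        sid, off, length = struct.unpack_from("<III", blob, 4 + 12 * i)
        out[sid] = blob[off:off + length]
    return out

record = pack_record({1: b"demographics", 7: b"ecg-samples"})
assert unpack_record(record) == {1: b"demographics", 7: b"ecg-samples"}
```

A reader only pays for the sections an incident actually produced, which is the space-saving property the text attributes to the protocol.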
The data handled by the latest protocol version cover patient demographics, ancestry, profession, mnemonic and medical history, a quick description of the incident, a first diagnosis, allergy status, and measurements of ECG, SpO2, NIBP, temperature, heart rate and CO2.
Via a PC connected between the Medical Monitor (MM) and the communication media, the user can manage the acquired data, i.e. user inputs and MM measurements, and define the data set describing each incident. The formatted data can then be transmitted to the TCC.
The expanded protocol can be used in a wide variety of medical incidents. The ability to manage only the data that characterize each incident keeps the data volume as small as possible, which results in high flexibility in data management and very fast transmission.
4.2. Communication network instability or absence in many areas leads to the
adoption of an alternative communication media structure
Ensuring stable communication between the Mobile Units (MUs) and the TCCs, with the best possible quality over the networks available at the various access points, is very important. Transmission of vital signs to the TCC is required by all means, so that the paramedical staff of the MU can offer crucial treatment as soon as possible. For this purpose, communication capabilities over multiple networks with diverse physical media were developed to guarantee successful vital-signs transmission under all circumstances (Fig. 1).
The development focused on mobile communications because patient transportation is involved. GPRS offers wide geographical coverage but low transmission rates. Mobile Asymmetric Digital Subscriber Line (M-ADSL) offers clearly higher transmission rates at lower cost, but its geographical coverage is limited. Where the system was installed on ships transporting patients from islands to central hospitals on the mainland (University hospitals or other central hospitals), satellite modems are used. The choice of wireless network is based mainly on the offered Quality of Service (QoS), not only on availability, and priorities are embedded to guide this choice. The network used for call transfers and transactions is designated according to these priorities and the available networks. The result is a single network connection, selected by a priority algorithm as mentioned above, that hosts all the communications used to handle a single incident (in contrast with the previous pilot operation of the project).
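The priority rule can be sketched as follows. This is an illustrative reading of the text, not the authors' actual algorithm: the network names, the static priority order and the QoS threshold are assumptions made for the example.

```python
# Static preference order, highest first: wide links before narrow ones.
# These values are assumptions for illustration only.
NETWORK_PRIORITY = {"wlan": 3, "m-adsl": 2, "satellite": 1, "gprs": 0}

def choose_network(available, qos):
    """available: iterable of network names at the current access point;
    qos: name -> measured QoS score in [0, 1]. Networks below a minimal
    QoS threshold are skipped, so availability alone is not enough."""
    usable = [n for n in available if qos.get(n, 0.0) >= 0.2]
    if not usable:
        return None
    # Prefer higher static priority; break ties on the measured QoS.
    return max(usable, key=lambda n: (NETWORK_PRIORITY.get(n, -1), qos[n]))

assert choose_network(["gprs", "wlan"], {"gprs": 0.9, "wlan": 0.8}) == "wlan"
assert choose_network(["gprs"], {"gprs": 0.05}) is None
```

The single connection for an incident would then be opened on whatever `choose_network` returns, falling back to the next candidate if the chosen network drops.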
In any communication between MUs and TCCs, two different and independent calls (C1 and C2) are made from the MU to the TCC. Call C1 carries the vital-signs transfer; call C2 carries either telephone communication via an ordinary telephone set (wired or wireless) or a teleconference in which, depending on the available bandwidth, text, voice and video transmission become possible in that hierarchical order. In all cases, one or more connections within the same transmission medium circulate all the information (data, voice and video) regarding a single incident.
Moreover, priorities are designated for the kinds of data that must be transferred when the available bandwidth is insufficient. Examples of the validations (demarcation of situations/behaviour) carried out by the transmission sub-system according to these priorities are the following:
- In a GPRS network with low bandwidth, only vital signs are transmitted; if bandwidth is sufficient, text is also transmitted.
- In a wireless LAN (IEEE 802.11 a/b/g) or over a satellite modem, when bandwidth drops to low rates, vital-signs transmission is given high priority over the teleconference.
As the examples above show, whenever bandwidth is insufficient, transmission of crucial medical information, i.e. vital signs, takes priority over any other service offered by the telemedicine system.
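One way to express this admission rule is the sketch below. It is illustrative only: the per-service bandwidth costs (in kbit/s) and the service names are assumptions, not values from the text; only the ordering (vital signs first, then text, voice, video) is taken from the examples above.

```python
# Services in priority order with assumed bandwidth costs in kbit/s.
SERVICE_COST = [("vital_signs", 0), ("text", 10), ("voice", 30), ("video", 200)]

def admitted_services(bandwidth_kbps):
    """Return the services to transmit, highest priority first. Vital signs
    are always admitted; each further service is admitted only while
    estimated bandwidth remains for it."""
    granted, remaining = [], bandwidth_kbps
    for name, cost in SERVICE_COST:
        if name == "vital_signs" or cost <= remaining:
            granted.append(name)
            remaining -= cost
    return granted

# A starved GPRS link carries only vital signs; text joins when bandwidth
# suffices; video needs a wide link such as a WLAN or satellite modem.
assert admitted_services(5) == ["vital_signs"]
assert admitted_services(50) == ["vital_signs", "text", "voice"]
assert admitted_services(300) == ["vital_signs", "text", "voice", "video"]
```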
Using the adaptive protocol described above, vital signs are transmitted as packets covering a fixed time period (e.g. 10 s) from the MU to the TCC. Each packet includes all the vital signs acquired from the patient during the respective period. The application developed allows this period, and therefore the size of each transmitted packet, to be adjusted. When low bandwidth is detected, a larger packet size is chosen in order to limit the transactions with the database and prevent flooding the network connection; this introduces a higher delay at the beginning of real-time transmission, because the first, large packet takes longer to process. When high bandwidth is detected, a smaller packet size is chosen, and the fast processing of the first, small packet keeps the initial delay low. The size of the vital-signs packet can thus be adjusted dynamically according to periodic measurements of the available bandwidth.
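The adaptive sizing above amounts to a mapping from measured bandwidth to collection period, for example as in this sketch. The bandwidth thresholds and the 30 s/5 s alternatives are illustrative assumptions; only the 10 s default comes from the text.

```python
def packet_period_sec(bandwidth_kbps):
    """Map a periodic bandwidth measurement to a vital-signs packet period:
    longer periods (larger packets, fewer database transactions) on a
    starved link, shorter periods (lower start-up delay) on a wide one."""
    if bandwidth_kbps < 20:       # starved link: batch aggressively
        return 30
    if bandwidth_kbps < 100:      # moderate link: the 10 s default
        return 10
    return 5                      # wide link: small packets, low delay

assert packet_period_sec(10) == 30
assert packet_period_sec(50) == 10
assert packet_period_sec(500) == 5
```

Re-evaluating this mapping on every bandwidth measurement gives exactly the dynamic packet-size adjustment the paragraph describes.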
4.3. A call management application controls the operation of the production telemedicine system
Each TCC hosts one Management Workstation (MW), three Treatment Workstations (TWs), and one Local Database Server (LDS) that stores all incident data. The MW in turn hosts a call management application, which is responsible for the control and management of incoming calls from the MUs to the TCCs and of the already established connections between them. When a call arrives at the TCC, optical and sound alerts on the MW announce the data arrival, and the MW selects a TW to handle the incident. The call management application enables the TCC's administrator to route an incoming call to a workstation, redirect an already established session from one workstation to another, connect or disconnect any endpoint in an active session, invite participants, etc.
Apart from these functionalities, the call management application provides control and management of the wider established telemedicine network outside the physical area of the TCC. It offers endpoint management (insertion, deletion, modification, etc.), teleconference administration, and user management (insertion, deletion, access-level designation, and modification of the access data of users authorized to access TWs, MWs, Regional Teleconference Rooms and MUs).
The last part for which the call management application is responsible is the maintenance of the telemedicine system. Through this application, the administrator can start/stop/restart the proxy servers that act as intermediaries in the communication between MUs and TWs. Furthermore, it can flag incidents that either were not properly handled (e.g. due to connection loss between TCCs and MUs) or lack information and need further identification. These incidents are then handled offline by any available TW, where the diagnostic reports are filled in and the overall incident information is sent to the LDS.
4.4. Multiple TCCs
To ensure that a call initiated by an MU will in all cases be served by a workstation in a TCC, we established three networked TCCs. There were many occasions when all the TWs of a specific TCC were engaged while calls were still arriving at it, and there were cases where the LDS of a TCC crashed or communications broke down for various reasons. This led us to establish multiple TCCs in different central hospitals, not only to allow simultaneous handling of a relatively high number of telemedicine incidents, but also to cope with such problems. If no workstation in the referral TCC can treat a telemedicine call for the reasons mentioned above, a predefined algorithm reroutes the call to another TCC with available workstations.
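The rerouting step might look like the sketch below. The actual predefined algorithm is not given in the text, so the fixed fallback order, the TCC identifiers and the free-workstation bookkeeping are all assumptions for illustration.

```python
TCC_ORDER = ["tcc_a", "tcc_b", "tcc_c"]   # hypothetical identifiers

def route_call(referral_tcc, free_workstations):
    """free_workstations: tcc -> number of available TWs (0 if the TCC is
    unreachable or its LDS is down). Try the referral TCC first, then the
    remaining TCCs in a fixed order; return the TCC that takes the call."""
    candidates = [referral_tcc] + [t for t in TCC_ORDER if t != referral_tcc]
    for tcc in candidates:
        if free_workstations.get(tcc, 0) > 0:
            return tcc
    return None   # no TCC can currently serve the call

assert route_call("tcc_a", {"tcc_a": 0, "tcc_b": 2, "tcc_c": 1}) == "tcc_b"
assert route_call("tcc_b", {"tcc_b": 3}) == "tcc_b"
```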
4.5. Distributed database system
Information management in the telemedicine applications follows a mainly hierarchical structure with three basic information entities: Patient – Incident – Exams (vital signs, still images and short patient videos). The archiving systems implemented by the different TCCs result in a distributed storage system, in which each TCC stores the telemedicine-related information it collects. A central database connects to the peripheral databases located at the TCCs and is responsible for the integration and coordination of the telemedicine databases.
Each Exam can be considered an integral part of an Incident. Consequently, Exams are related to exactly one TCC, and the attributes of an Incident are stored in that TCC's LDS. The different Incidents belonging to one Patient are often related to different TCCs. The Central Database Server (CDS) of the telemedicine system is therefore designed to keep records of the attributes of Patients and Incidents, while the full content of the Exams related to an Incident is stored in the LDS of the TCC that handled it. Inside the CDS, a reference connects the user to the proper LDS of a TCC, which keeps the Exams data.
Thus, Exams remain locally stored, yet they appear as reference links in the CDS of the telemedicine system. Every access to an LDS goes through the CDS, so any query related to Exams data is assigned to the proper LDS at the respective TCC; the proper LDS is chosen from the attributes of the TCC related to the requested Incident.
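The reference scheme can be sketched as below. All identifiers, record shapes and the in-memory stores are assumptions standing in for the CDS and the per-TCC LDS databases.

```python
# Central store: Patient/Incident attributes plus, per Incident, a
# reference to the TCC whose LDS actually holds the Exams content.
CDS = {
    "incidents": {"inc-42": {"patient": "pat-7", "tcc": "tcc_b"}},
}
LDS = {   # one store per TCC; only the handling TCC has the Exams
    "tcc_b": {"inc-42": ["ecg", "spo2", "still-image"]},
}

def fetch_exams(incident_id):
    """Resolve an Exams query: look up the Incident in the CDS, follow its
    TCC reference, and delegate the query to that TCC's LDS."""
    incident = CDS["incidents"][incident_id]
    return LDS[incident["tcc"]][incident_id]

assert fetch_exams("inc-42") == ["ecg", "spo2", "still-image"]
```

The Exams bytes never move to the centre; the CDS only brokers the lookup, which is the locality property the paragraph describes.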
The central telemedicine database should communicate with either the central Hospital Information System (HIS) or an Electronic Health Record (EHR) system in order to update the patient data corresponding to a telemedicine incident (Fig. 1). The central database also updates the HIS/EHR with possible changes to the data of already registered patients, or even registers new patients in the HIS/EHR whenever this is required during patient identification. Communication between the central telemedicine database and the HIS/EHR is accomplished by exchanging the appropriate Health Level 7 (HL7) messages as the occasion requires.
The distributed database can be implemented with robust grid database server software installed on a cluster of database servers. At the central telemedicine server, suitable proxy server software should be installed; this proxy server is responsible for the communication between the telemedicine system and the HIS/EHR, undertaking to send and receive the HL7 messages exchanged between the CDS and the HIS/EHR.
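For illustration, a patient-registration message of the kind such a proxy might exchange could be built as in this minimal sketch of an HL7 v2 ADT^A04 message with MSH and PID segments. The application names, field contents and the chosen HL7 version are assumptions; a real deployment would use a proper HL7 library rather than hand-rolled strings.

```python
def build_adt_a04(msg_id, patient_id, family, given, timestamp):
    """Assemble a minimal HL7 v2.5 ADT^A04 (register patient) message:
    pipe-delimited fields, segments separated by carriage returns."""
    msh = "|".join(["MSH", "^~\\&", "TELEMED_CDS", "TCC", "HIS", "HOSPITAL",
                    timestamp, "", "ADT^A04", msg_id, "P", "2.5"])
    pid = "|".join(["PID", "1", "", patient_id, "", f"{family}^{given}"])
    return "\r".join([msh, pid])   # HL7 v2 separates segments with <CR>

msg = build_adt_a04("0001", "pat-7", "PAPADOPOULOS", "N", "20090101120000")
assert msg.split("\r")[0].startswith("MSH|^~\\&|TELEMED_CDS")
assert "PID|1||pat-7||PAPADOPOULOS^N" in msg
```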