that increase (or decrease) as the number of clusters increases, we search for the values of n_c at
which a significant local change in the value of the index occurs. This change appears as a 'knee'
in the plot and is an indication of the number of clusters underlying the data set. Moreover,
the absence of a knee may be an indication that the data set possesses no clustering structure.
Below, some representative validity indices for crisp and fuzzy clustering are presented.
Crisp Clustering
Crisp clustering considers non-overlapping partitions, meaning that a data point either belongs
to a class or not. In this section we discuss validity indices suitable for crisp clustering.
The modified Hubert Γ statistic
The definition of the modified Hubert Γ statistic (Theodoridis and Koutroubas, 1999) is given
by the equation

Γ = (1/M) Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} P(i, j) · Q(i, j)    (31.16)

where N is the number of objects in the data set, M = N·(N−1)/2, P is the proximity matrix
of the data set and Q is an N×N matrix whose (i, j) element is equal to the distance between
the representative points (v_{c_i}, v_{c_j}) of the clusters to which the objects x_i and x_j belong.
Similarly, we can define the normalized Hubert Γ statistic, given by the equation

Γ̂ = [ (1/M) Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} (P(i, j) − μ_P)(Q(i, j) − μ_Q) ] / (σ_P · σ_Q)    (31.17)

where μ_P, μ_Q, σ_P, σ_Q are the respective means and variances of the P and Q matrices.
If d(v_{c_i}, v_{c_j}) is close to d(x_i, x_j) for i, j = 1, 2, …, N, then P and Q will be in close
agreement and the values of Γ and Γ̂ (normalized Γ) will be high. Thus, a high value of Γ (Γ̂)
indicates the existence of compact clusters. In the plot of normalized Γ versus n_c, we therefore
seek a significant knee that corresponds to a significant increase of normalized Γ. The number of
clusters at which the knee occurs is an indication of the number of clusters underlying the data.
We note that for n_c = 1 and n_c = N the index is not defined.
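As an illustration, the following NumPy sketch computes both forms of the statistic from a data matrix, an integer labeling (cluster indices 0, …, n_c−1) and the matrix of cluster representatives; the function name and its arguments are ours, not part of the cited work.

import numpy as np

def hubert_gamma(X, labels, centers, normalized=False):
    # Modified Hubert Gamma statistic (Eq. 31.16) and its normalized form (Eq. 31.17).
    # X: (N, d) data; labels: (N,) cluster index per point; centers: (n_c, d) representatives.
    N = len(X)
    M = N * (N - 1) / 2
    P = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)        # proximity matrix of the data
    R = centers[labels]                                               # representative of each point's cluster
    Q = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)        # distances between representatives
    iu = np.triu_indices(N, k=1)                                      # all pairs with i < j
    p, q = P[iu], Q[iu]
    if not normalized:
        return (p * q).sum() / M                                      # Eq. 31.16
    return ((p - p.mean()) * (q - q.mean())).sum() / (M * p.std() * q.std())   # Eq. 31.17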
Dunn family of indices
A cluster validity index for crisp clustering, proposed in (Dunn, 1974), aims at the identification
of 'compact and well separated clusters'. The index is defined by the following equation for a
specific number of clusters:

D_{n_c} = min_{i=1,…,n_c} { min_{j=i+1,…,n_c} [ d(c_i, c_j) / max_{k=1,…,n_c} diam(c_k) ] }    (31.18)
where d(c_i, c_j) is the dissimilarity function between two clusters c_i and c_j, defined as
d(c_i, c_j) = min_{x∈c_i, y∈c_j} d(x, y), and diam(c) is the diameter of a cluster, which may be
considered a measure of the cluster's dispersion. The diameter of a cluster C can be defined as
follows:

diam(C) = max_{x,y∈C} d(x, y)    (31.19)
If the data set contains compact and well-separated clusters, the distance between the
clusters is expected to be large and the diameter of the clusters is expected to be small. Thus,
based on Dunn's index definition, we may conclude that large values of the index indicate
the presence of compact and well-separated clusters.
The index D_{n_c} does not exhibit any trend with respect to the number of clusters. Thus, the
maximum in the plot of D_{n_c} versus the number of clusters can be an indication of the number
of clusters that fits the data.
The problems of the Dunn index are: i) its considerable time complexity, and ii) its sensitivity
to the presence of noise in data sets, since noise is likely to increase the values of diam(c)
(i.e. the denominator of Eq. 31.18).
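For concreteness, here is a minimal NumPy sketch of Eqs. 31.18 and 31.19, assuming Euclidean distances, integer labels and at least two clusters; the single-link d(c_i, c_j) and the diameters are computed by brute force, which also reflects the index's time complexity.

import numpy as np

def dunn_index(X, labels):
    # Dunn index (Eq. 31.18): minimum single-link inter-cluster distance
    # divided by the maximum cluster diameter (Eq. 31.19). Assumes n_c >= 2.
    clusters = [X[labels == k] for k in np.unique(labels)]
    diam = max(
        np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1).max()   # max pairwise distance inside c
        for c in clusters
    )
    d_min = min(
        np.linalg.norm(ci[:, None, :] - cj[None, :, :], axis=-1).min() # min distance between two clusters
        for a, ci in enumerate(clusters)
        for cj in clusters[a + 1:]
    )
    return d_min / diam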
Three indices are proposed in (Pal and Biswas, 1997) that are more robust to the presence
of noise. They are known as Dunn-like indices since they are based on the Dunn index. Moreover,
the three indices use for their definition the concepts of the minimum spanning tree (MST),
the relative neighborhood graph (RNG) and the Gabriel graph (GG), respectively (Theodoridis and
Koutroubas, 1999). Consider the index based on the MST. Let c_i be a cluster and G_i the complete
graph whose vertices correspond to the vectors of c_i. The weight, w_e, of an edge e of this graph
equals the distance between its two end points, x, y. Let E_i^MST be the set of edges of the MST
of the graph G_i, and e_i^MST the edge in E_i^MST with the maximum weight. Then the diameter of
c_i is defined as the weight of e_i^MST. The Dunn-like index based on the concept of the MST is
given by the equation
D_{n_c}^MST = min_{i=1,…,n_c} { min_{j=i+1,…,n_c} [ d(c_i, c_j) / max_{k=1,…,n_c} diam_k^MST ] }    (31.20)
The number of clusters at which D_{n_c}^MST takes its maximum value indicates the number
of clusters in the underlying data. Based on similar arguments we may define the Dunn-like
indices for the GG and RNG graphs.
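A small sketch of the MST-based diameter follows, assuming Euclidean distances; Prim's algorithm is used here only for illustration, and the function name is ours.

import numpy as np

def mst_diameter(points):
    # Weight of the heaviest edge of the Euclidean MST over one cluster's points,
    # used as the cluster 'diameter' in the Dunn-like index of Eq. 31.20.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()              # cheapest edge connecting each vertex to the growing tree
    heaviest = 0.0
    for _ in range(n - 1):          # Prim's algorithm: attach one vertex per iteration
        best[in_tree] = np.inf
        nxt = int(np.argmin(best))
        heaviest = max(heaviest, best[nxt])
        in_tree[nxt] = True
        best = np.minimum(best, D[nxt])
    return heaviest

Replacing diam(c_k) in the earlier Dunn sketch with mst_diameter(c_k) yields the Dunn-like variant of Eq. 31.20; the RNG- and GG-based variants follow the same pattern with different graphs.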
The Davies-Bouldin (DB) index
A similarity measure R_ij between the clusters C_i and C_j is defined based on a measure of
dispersion of a cluster C_i, denoted by s_i, and a dissimilarity measure between two clusters,
d_ij. The R_ij index is defined to satisfy the following conditions (Davies and Bouldin, 1979):
1. R_ij ≥ 0
2. R_ij = R_ji
3. if s_i = 0 and s_j = 0 then R_ij = 0
4. if s_j > s_k and d_ij = d_ik then R_ij > R_ik
5. if s_j = s_k and d_ij < d_ik then R_ij > R_ik.
These conditions state that R_ij is non-negative and symmetric. A simple choice for R_ij
that satisfies the above conditions is (Davies and Bouldin, 1979):

R_ij = (s_i + s_j) / d_ij    (31.21)
Then the DB index is defined as

DB_{n_c} = (1/n_c) · Σ_{i=1}^{n_c} R_i,   where R_i = max_{j=1,…,n_c, j≠i} {R_ij}, i = 1,…,n_c    (31.22)
It is clear from the above definition that DB_{n_c} is the average similarity between each cluster
c_i, i = 1,…,n_c, and its most similar one. It is desirable for the clusters to have the minimum
possible similarity to each other; therefore we seek partitionings that minimize DB_{n_c}. The
DB_{n_c} index exhibits no trend with respect to the number of clusters and thus we seek the
minimum value of DB_{n_c} in its plot versus the number of clusters.
Some alternative definitions of the dissimilarity between two clusters, as well as of the
dispersion of a cluster c_i, are given in (Davies and Bouldin, 1979).
Three variants of the DB_{n_c} index are proposed in (Pal and Biswas, 1997). They are based
on the MST, RNG and GG concepts, similar to the cases of the Dunn-like indices.
Other validity indices for crisp clustering have been proposed in (Dave, 1996) and (Milligan and
Cooper, 1985). The implementation of most of these indices is computationally very expensive,
especially when the number of clusters and objects in the data set grows very large (Xie and Beni,
1991). In (Milligan and Cooper, 1985), an evaluation study of 30 validity indices proposed in the
literature is presented. It is based on tiny data sets (about 50 points each) with well-separated
clusters. The results of this study (Milligan and Cooper, 1985) place Calinski and Harabasz (1974),
Je(2)/Je(1) (1984), the C-index (1976), Gamma and Beale among the six best indices. However, it is
noted that although the results concerning these methods are encouraging, they are likely to be
data dependent. Thus, the behavior of the indices may change if different data structures are used.
Also, some indices are based on a sample of clustering results. A representative example is
Je(2)/Je(1), whose computation is based only on the information provided by the items involved in
the last cluster merge.
RMSSTD, SPR, RS, CD
This family of validity indices is applicable in the cases that hierarchical algorithms are used
to cluster the data sets. Below we present the definitions of four validity indices, which have
to be used simultaneously to determine the number of clusters existing in the data set. These
four indices are applied to each step of a hierarchical clustering algorithm (Sharma, 1996).
• RMSSTD (root mean square standard deviation) of a new clustering scheme defined at
a level of a clustering hierarchy is the square root of the variance of all the variables
(attributes used in the clustering process). This index measures the homogeneity of the
formed clusters at each step of the hierarchical algorithm. Since the objective of cluster
analysis is to form homogeneous groups, the RMSSTD of a cluster should be as small as
possible. Where the values of RMSSTD are higher than those of the previous step, we
have an indication that the new clustering scheme is worse.
In the following definitions we shall use the term SS, which means Sum of Squares and
refers to the equation:
SS = Σ_{i=1}^{n} (X_i − X̄)²    (31.23)
Along with this we shall use some additional symbolism like:
1. SSw, referring to the within-group sum of squares,
2. SSb, referring to the between-groups sum of squares,
3. SSt, referring to the total sum of squares of the whole data set.
• SPR (Semi-Partial R-squared) for the new cluster is the difference between SSw of the
new cluster and the sum of the SSw’s values of clusters joined to obtain the new cluster
(loss of homogeneity), divided by the SSt for the whole data set. This index measures the
loss of homogeneity after merging the two clusters of a single algorithm step. If the index
value is zero then the new cluster is obtained by merging two perfectly homogeneous
clusters. If its value is high then the new cluster is obtained by merging two heterogeneous
clusters.
• RS (R-Squared) of the new cluster is the ratio of SSb over SSt. SSb is a measure of differ-
ence between groups. Since SSt = SSb + SSw, the greater the SSb the smaller the SSw and
vice versa. As a result, the greater the differences between groups, the more homogeneous
each group is and vice versa. Thus, RS may be considered as a measure of dissimilarity
between clusters. Furthermore, it measures the degree of homogeneity between groups.
The values of RS range between 0 and 1. Where the value of RS is zero, there is an indi-
cation that no difference exists among groups. On the other hand, when RS equals 1 there
is an indication of significant difference among groups.
• The CD index measures the distance between the two clusters that are merged in a given
step of the hierarchical clustering. This distance depends on the selected representatives
for the hierarchical clustering we perform. For instance, in the case of Centroid hierarchi-
cal clustering the representatives of the formed clusters are the centers of each cluster, so
CD is the distance between the centers of the clusters. Where we use single linkage, CD
measures the minimum Euclidean distance between all possible pairs of points. In case of
complete linkage, CD is the maximum Euclidean distance between all pairs of data points,
and so on.
Using these four indices, we determine the number of clusters that exist in a data set by
plotting a graph of all these indices' values for a number of different stages of the clustering
algorithm. In this graph we search for the steepest knee, or in other words, the greatest jump
of these indices' values from the higher to the smaller number of clusters.
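The following sketch illustrates how SS, RS, RMSSTD and SPR can be computed for a partition produced at some step of a hierarchical algorithm; the pooled form used for RMSSTD is one common reading of the definition above, and the helper names are ours.

import numpy as np

def ss(group):
    # Sum of squares about the mean (Eq. 31.23), summed over all variables.
    return float(((group - group.mean(axis=0)) ** 2).sum())

def rs_and_rmsstd(X, labels):
    # RS and RMSSTD of the current partition; labels give the cluster of each row of X.
    groups = [X[labels == k] for k in np.unique(labels)]
    ss_w = sum(ss(g) for g in groups)                  # pooled within-group sum of squares (SSw)
    ss_t = ss(X)                                       # total sum of squares (SSt)
    rs = (ss_t - ss_w) / ss_t                          # RS = SSb / SSt, since SSt = SSb + SSw
    n, d = X.shape
    rmsstd = np.sqrt(ss_w / (d * (n - len(groups))))   # pooled standard deviation over all attributes
    return rs, rmsstd

def spr(cluster_a, cluster_b, ss_t):
    # Semi-partial R-squared for merging two clusters: loss of homogeneity divided by SSt.
    merged = np.vstack([cluster_a, cluster_b])
    return (ss(merged) - ss(cluster_a) - ss(cluster_b)) / ss_t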
The SD validity index
Another clustering validity approach is proposed in (Halkidi et al., 2000). The SD validity
index definition is based on the concepts of average scattering for clusters and total
separation between clusters. Below, we give the fundamental definition for this index.
Average scattering for clusters. It evaluates the scattering of the points within the clusters,
comparing the variance of the considered clustering scheme with the variance of the whole
data set. The average scattering for clusters is defined as follows:
Scat(n_c) = (1/n_c) · Σ_{i=1}^{n_c} ||σ(v_i)|| / ||σ(S)||    (31.24)
The term σ(S) is the variance of the data set; its p-th dimension is defined as follows:

σ^p = (1/n) · Σ_{k=1}^{n} (x_k^p − x̄^p)²    (31.25)
where x̄^p is the p-th dimension of X̄ = (1/n) · Σ_{k=1}^{n} x_k, ∀ x_k ∈ S.
The term σ(v_i) is the variance of cluster c_i; its p-th dimension is given by the equation

σ(v_i)^p = Σ_{k=1}^{n_i} (x_k^p − v_i^p)² / n_i    (31.26)
Further, the term ||Y|| is defined as ||Y|| = (Y^T Y)^{1/2}, where Y = (y_1, …, y_k) is a vector
(e.g. σ(v_i)).
Total separation between clusters. The definition of total scattering (separation) between
clusters is given in the following equation:
Dis(n_c) = (D_max / D_min) · Σ_{k=1}^{n_c} ( Σ_{z=1}^{n_c} ||v_k − v_z|| )^{−1}    (31.27)
where D_max = max(||v_i − v_j||), ∀ i, j ∈ {1, 2, …, n_c}, is the maximum distance between cluster
centers and D_min = min(||v_i − v_j||), ∀ i, j ∈ {1, 2, …, n_c}, is the minimum distance between
cluster centers.
Now, we can define a validity index based on Eqs. 31.24 and 31.27 as follows:

SD(n_c) = a · Scat(n_c) + Dis(n_c)    (31.28)

where a is a weighting factor equal to Dis(c_max), and c_max is the maximum number of
input clusters.
The first term (i.e. Scat(n_c)), defined in Eq. 31.24, indicates the average compactness of the
clusters (i.e. intra-cluster distance). A small value for this term indicates compact clusters,
and as the scattering within clusters increases (i.e. they become less compact) the value of
Scat(n_c) also increases. The second term, Dis(n_c), indicates the total separation between the
n_c clusters (i.e. an indication of inter-cluster distance). Contrary to the first term, Dis(n_c)
is influenced by the geometry of the clusters and increases with the number of clusters. The two
terms of SD have different ranges, thus a weighting factor is needed in order to incorporate both
terms in a balanced way. The number of clusters, n_c, that minimizes the above index is an optimal
value. Also, the influence of the maximum number of clusters c_max, related to the weighting
factor, on the selection of the optimal clustering scheme is discussed in (Halkidi et al., 2000).
It is shown that SD proposes an optimal number of clusters almost irrespective of the c_max value.
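A compact sketch of Eqs. 31.24, 31.27 and 31.28 follows, assuming integer labels 0, …, n_c−1 and taking the weighting factor as an argument (set in the original proposal to Dis(c_max)).

import numpy as np

def sd_index(X, labels, centers, alpha):
    # SD validity index (Eq. 31.28): alpha * Scat(n_c) + Dis(n_c).
    sigma_S = X.var(axis=0)                                          # variance vector of the data set
    sigma_v = np.array([X[labels == i].var(axis=0)                   # variance vector of each cluster
                        for i in range(len(centers))])
    scat = np.mean(np.linalg.norm(sigma_v, axis=1)) / np.linalg.norm(sigma_S)   # Eq. 31.24
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    d_off = d + np.diag(np.full(len(centers), np.nan))               # mask the zero self-distances
    d_max, d_min = np.nanmax(d_off), np.nanmin(d_off)
    dis = (d_max / d_min) * np.sum(1.0 / d.sum(axis=1))              # Eq. 31.27
    return alpha * scat + dis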
The S_Dbw validity index
A more recent validity index is proposed in (Halkidi and Vazirgiannis, 2001a). It exploits the
inherent features of clusters to assess the validity of results and to select the optimal
partitioning for the data under concern. As with the SD index, its definition is based on the
compactness and separation of clusters. The average scattering for clusters is defined as above,
in Eq. 31.24.
Inter-cluster Density (ID). It evaluates the average density in the region among clusters in
relation to the density of the clusters themselves. The goal is for the density among clusters
to be significantly low in comparison with the density within the considered clusters. Then,
considering a partitioning of the data set into more than one cluster (i.e. n_c > 1), the
inter-cluster density is defined as follows:
Dens_bw(n_c) = [1 / (n_c · (n_c − 1))] · Σ_{i=1}^{n_c} Σ_{j=1, j≠i}^{n_c} density(u_ij) / max{density(v_i), density(v_j)},   n_c > 1    (31.29)
where v_i, v_j are the centers of clusters c_i, c_j respectively, and u_ij is the middle point of
the line segment defined by the clusters' centers v_i, v_j.
The term density(u) is defined in the following equation:
density(u) = Σ_{l=1}^{n_ij} f(x_l, u)    (31.30)
where x_l is a point of the data set S and n_ij is the number of points (tuples) that belong to
the clusters c_i and c_j, i.e. x_l ∈ c_i ∪ c_j ⊆ S. density(u) represents the number of points in
the neighborhood of u. In our work, the neighborhood of a data point u is defined to be a
hyper-sphere with center u and radius the average standard deviation of the clusters, stdev. The
standard deviation of the clusters is given by the following equation:
stdev = (1/n_c) · Σ_{i=1}^{n_c} ||σ(v_i)||

where n_c is the number of clusters and σ(v_i) is the variance of cluster c_i.
More specifically, the function f(x, u) is defined as:

f(x, u) = 0 if d(x, u) > stdev, and 1 otherwise.    (31.31)
It is obvious that a point belongs to the neighborhood of u if its distance from u is smaller
than the average standard deviation of the clusters. Here we assume that the data have been
scaled so that all dimensions (brought into comparable ranges) are considered equally important
during the process of finding the neighbors of a multidimensional point (Berry and Linoff, 1996).
Then the validity index S_Dbw is defined as:

S_Dbw(n_c) = Scat(n_c) + Dens_bw(n_c)    (31.32)
The above definitions refer to the case that the data set presents clustering tendency, i.e. it
can be partitioned into at least two clusters. The index is not defined for n_c = 1.
The definition of S_Dbw indicates that both criteria of 'good' clustering (i.e. compactness
and separation) are properly combined, enabling reliable evaluation of clustering results. Also,
the density variations among clusters are taken into account to achieve more reliable results.
The number of clusters, n_c, that minimizes the above index is an optimal value indicating the
number of clusters present in the data set.
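A rough sketch of S_Dbw under the same assumptions as the SD sketch above; guards against empty neighborhoods are omitted, and the neighborhood radius follows the stdev formula given earlier.

import numpy as np

def s_dbw(X, labels, centers):
    # S_Dbw (Eq. 31.32) = Scat(n_c) + Dens_bw(n_c); assumes n_c >= 2 and integer labels.
    n_c = len(centers)
    sigma_v = np.array([X[labels == i].var(axis=0) for i in range(n_c)])
    sigma_S = X.var(axis=0)
    scat = np.mean(np.linalg.norm(sigma_v, axis=1)) / np.linalg.norm(sigma_S)   # Eq. 31.24
    stdev = np.mean(np.linalg.norm(sigma_v, axis=1))        # average standard deviation of the clusters
    def density(u, i, j):
        # number of points of clusters i and j inside the hyper-sphere of radius stdev around u
        pts = X[(labels == i) | (labels == j)]
        return np.sum(np.linalg.norm(pts - u, axis=1) <= stdev)
    dens = 0.0
    for i in range(n_c):
        for j in range(n_c):
            if i == j:
                continue
            u_ij = (centers[i] + centers[j]) / 2             # midpoint of the two cluster centers
            dens += density(u_ij, i, j) / max(density(centers[i], i, j),
                                              density(centers[j], i, j))
    dens /= n_c * (n_c - 1)                                  # Eq. 31.29
    return scat + dens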
Moreover, an approach based on the S_Dbw index is proposed in (Halkidi and Vazirgiannis,
2001b). It evaluates the clustering schemes of a data set as defined by different clustering
algorithms and selects the algorithm resulting in the optimal partitioning of the data.
In general terms, S_Dbw enables the selection both of the algorithm and of its parameter
values for which the optimal partitioning of a data set is defined (assuming that the data set
presents clustering tendency). However, the index cannot properly handle arbitrarily shaped
clusters. The same applies to all the aforementioned indices.
There are a number of applications where it is important to identify non-convex clusters
such as medical or spatial data applications. An approach to handle arbitrarily shaped clusters
in the cluster validity process is presented in (Halkidi and Vazirgiannis, 2002).
31.4.5 Fuzzy Clustering
In this section, we present validity indices suitable for fuzzy clustering. The objective is to
seek clustering schemes where most of the vectors of the data set exhibit a high degree of
membership in one cluster. Fuzzy clustering is defined by a matrix U = [u_ij], where u_ij denotes
the degree of membership of the vector x_i in cluster j. Also, a set of cluster representatives is
defined. Similarly to the crisp clustering case, a validity index q is defined and we search for
the minimum or maximum in the plot of q versus n_c.
Also, where q exhibits a trend with respect to the number of clusters, we seek a significant
knee of decrease (or increase) in the plot of q.
Below, two categories of fuzzy validity indices are discussed. The first category uses only
the membership values, u_ij, of a fuzzy partition of the data. The second involves both the U
matrix and the data set itself.
Validity Indices involving only the membership values
Bezdek proposed in (Bezdeck et al., 1984) the partition coefficient, which is defined as

PC = (1/N) · Σ_{i=1}^{N} Σ_{j=1}^{n_c} u_ij²    (31.33)
The PC index values range in [1/n_c, 1], where n_c is the number of clusters. The closer the
index is to unity, the 'crisper' the clustering is. In case all membership values of a fuzzy
partition are equal, that is, u_ij = 1/n_c, PC obtains its lowest value. Thus, the closer the
value of PC is to 1/n_c, the fuzzier the clustering is. Furthermore, a value close to 1/n_c
indicates that there is no clustering tendency in the considered data set or that the clustering
algorithm failed to reveal it.
The partition entropy coefficient is another index of this category. It is defined as follows
PE = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{n_c} u_ij · log_a(u_ij)    (31.34)
where a is the base of the logarithm. The index is computed for values of n_c greater than 1 and
its value ranges in [0, log_a n_c]. The closer the value of PE is to 0, the 'crisper' the
clustering is. As in the previous case, index values close to the upper bound (i.e. log_a n_c)
indicate absence of any clustering structure in the data set or inability of the algorithm to
extract it.
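Both indices depend only on the membership matrix U; a minimal sketch follows, with a small epsilon guarding log(0) as an implementation detail of ours.

import numpy as np

def partition_coefficient(U):
    # PC (Eq. 31.33); U is the N x n_c fuzzy membership matrix.
    return float(np.sum(U ** 2) / U.shape[0])

def partition_entropy(U, a=np.e):
    # PE (Eq. 31.34), with logarithm base a (change-of-base from the natural log).
    eps = 1e-12
    return float(-np.sum(U * np.log(U + eps) / np.log(a)) / U.shape[0])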
The drawbacks of these indices are:
• their monotonous dependency on the number of clusters. Thus, we seek significant knees
of increase (for PC) or decrease (for PE) in the plots of the indices versus the number of
clusters,
• their sensitivity to the fuzzifier, m. More specifically, as m → 1 the indices give the same
values for all values of n_c. On the other hand, when m → ∞, both PC and PE exhibit a
significant knee at n_c = 2,
• the lack of direct connection to the geometry of the data (Dave, 1996), since they do not
use the data itself.
Indices involving the membership values and the data set
The Xie-Beni index (Xie and Beni, 1991), XB, also called the compactness and separation
validity function, is a representative index of this category. Consider a fuzzy partition of the
data set X = {x_j; j = 1, …, N}, with v_i (i = 1, …, n_c) the centers of the clusters and u_ij the
membership of the j-th data point in the i-th cluster. The fuzzy deviation of x_j from cluster i,
d_ij, is defined as the distance between x_j and the center of cluster i, weighted by the fuzzy
membership of data point j with regard to cluster i. It is given by the following equation:
d_ij = u_ij · ||x_j − v_i||    (31.35)
Also, for a cluster i, the sum of the squares of the fuzzy deviations of the data points in X,
denoted σ_i, is called the variation of cluster i.
The term π_i = (σ_i / n_i) is called the compactness of cluster i. Since n_i is the number of
points belonging to cluster i, π_i is the average variation in cluster i. Then the compactness of
a partitioning of n_c clusters is defined as the average compactness of the defined clusters,
given by the equation:
π = ( Σ_{i=1}^{n_c} π_i ) / n_c    (31.36)
Also, the separation of the fuzzy partitions is defined as the minimum distance between
cluster centers, that is
d_min = min_{i≠j} ||v_i − v_j||    (31.37)
Then the XB index is defined as

XB = π / (N · d_min²)    (31.38)
where N is the number of points in the data set.
It is clear that small values of XB are expected for compact and well-separated clusters. We
note, however, that XB decreases monotonically when the number of clusters n_c gets very large
and close to N. One way to eliminate this decreasing tendency of the index is to determine a
starting point, c_max, of the monotonic behavior and to search for the minimum value of XB in the
range [2, c_max]. Moreover, the values of the index XB depend on the fuzzifier: as m → ∞,
XB → ∞.
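A sketch of the commonly used form of the Xie-Beni index, which sums u_ij² · ||x_j − v_i||² over all points and clusters and divides by N · d_min²; it folds the π-based averaging described above into a single sum but captures the same compactness-over-separation ratio.

import numpy as np

def xie_beni(X, U, centers):
    # Commonly used Xie-Beni form: sum_{i,j} u_ij^2 * ||x_j - v_i||^2 / (N * d_min^2).
    # U: (N, n_c) membership matrix; centers: (n_c, d) cluster centers.
    dist2 = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) ** 2   # (N, n_c)
    compact = float(np.sum((U ** 2) * dist2))
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    d_min = d[np.triu_indices(len(centers), k=1)].min()                          # Eq. 31.37
    return compact / (len(X) * d_min ** 2)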
Another index of this category is the Fukuyama-Sugeno index, which is defined as
FS_m = Σ_{i=1}^{N} Σ_{j=1}^{n_c} u_ij^m ( ||x_i − v_j||²_A − ||v_j − v||²_A )    (31.39)
where v is the mean vector of X and A is an l×l positive definite, symmetric matrix. When A = I,
the above distance becomes the squared Euclidean distance. It is clear that for compact and
well-separated clusters we expect small values of FS_m. The first term in the brackets measures
the compactness of the clusters, while the second one measures the distances between the cluster
representatives.
Some other fuzzy validity indices are proposed in (Gath and Geva, 1989); they are based on the
concepts of hypervolume and density.
31.4.6 Other Approaches for Cluster Validity
Another approach for finding the optimal number of clusters of a data set was proposed in
(Smyth, 1996). It introduces a practical clustering algorithm based on Monte Carlo cross-
validation. More specifically, the algorithm consists of M cross-validation runs over M chosen
train/test partitions of a data set D. For each partition u, the EM algorithm is used to fit n_c
clusters to the training data, while n_c is varied from 1 to c_max. Then, the log-likelihood
L_c^u(D) is calculated for each model with n_c clusters. It is defined using the probability
density function of the data as
L_k(D) = Σ_{i=1}^{N} log f_k(x_i / Φ_k)    (31.40)
where f_k is the probability density function for the data and Φ_k denotes the parameters that
have been estimated from the data. This is repeated M times and the M cross-validated estimates
are averaged for each value of n_c. Based on these estimates we may define the posterior
probabilities for each value of the number of clusters n_c, p(n_c/D). If one of the p(n_c/D) is
near 1, there is strong evidence that the particular number of clusters is the best for our data
set. The evaluation approach proposed in (Smyth, 1996) is based on density functions considered
for the data set. Thus, it relies on concepts related to probabilistic models in order to estimate
the number of clusters that best fits a data set, and it does not use concepts directly related to
the data (i.e. inter-cluster and intra-cluster distances).
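As an illustration only, the following sketch pairs Monte Carlo train/test splits with a Gaussian mixture fitted by EM (via scikit-learn) and averages the test log-likelihoods over the runs; the posterior over n_c described above can then be derived from these averaged estimates.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def cv_loglikelihood(D, c_max=10, M=20, test_size=0.5, seed=0):
    # Average cross-validated test log-likelihood for n_c = 1..c_max, a rough sketch of the
    # Monte Carlo cross-validation procedure, with a Gaussian mixture standing in for the model.
    rng = np.random.RandomState(seed)
    scores = np.zeros(c_max)
    for _ in range(M):
        train, test = train_test_split(D, test_size=test_size,
                                       random_state=rng.randint(1 << 30))
        for n_c in range(1, c_max + 1):
            gm = GaussianMixture(n_components=n_c).fit(train)
            scores[n_c - 1] += gm.score(test) * len(test)    # total test log-likelihood (Eq. 31.40)
    return scores / M                                         # averaged over the M runs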
References
Athanasopoulos, D. (1991). Probabilistic Theory. Stamoulis, Piraeus.
Berry, M. and Linoff, G. (1996). Data Mining Techniques for Marketing, Sales and Customer
Support. John Wiley and Sons, Inc.
Bezdeck, J., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means clustering algorithm.
Computers and Geosciences.
Brin, S., Motwani, R., Ullman, J., and Tsur, S. (1997). Dynamic itemset counting and impli-
cation rules for market basket data. SIGMOD Record (ACM Special Interest Group on
Management of Data), 26(2).
Dave, R. (1996). Validating fuzzy partitions obtained through c-shells clustering. Pattern
Recognition Letters, 10:613–623.
Davies, D. and Bouldin, D. (1979). A cluster separation measure. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 1(2).
Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification
learning algorithms. Neural Computation, 10(7):6.
Dong, G. and Li, J. (1998). Research and development in knowledge discovery and
Data Mining. In Proc. 2nd Pacific-Asia Conf. Knowledge Discovery and Data Mining
(PAKDD).
Dunn, J. (1974). Well separated clusters and optimal fuzzy partitions. Cybernetics, 4:95–104.
Fayyad, M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996). Advances in
Knowledge Discovery and Data Mining. AAAI Press.
Gago, P. and Bentos, C. (1998). A metric for selection of the most promising rules. In
Proceedings of the 2nd European Conference on the Principles of Data Mining and
Knowledge Discovery (PKDD'98).
Gath, I. and Geva, A. (1989). Unsupervised optimal fuzzy clustering. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 11(7).
Gray, B. and Orlowka, M. (1998). CCAIIA: Clustering categorical attributes into interesting
association rules. In Proceedings of the 2nd Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD '98).
Guha, S., Rastogi, R., and Shim, K. (1999). Rock: A robust clustering algorithm for categor-
ical attributes. In Proceedings of the IEEE Conference on Data Engineering.
Halkidi, M. and Vazirgiannis, M. (2001a). Clustering validity assessment: Finding the opti-
mal partitioning of a data set. In Proceedings of ICDM. California, USA.
Halkidi, M. and Vazirgiannis, M. (2001b). A data set oriented approach for clustering algo-
rithm selection. In Proceedings of PKDD. Freiburg, Germany.
Halkidi, M. and Vazirgiannis, M. (2002). Clustering validity assessment: Finding the optimal
partitioning of a data set. In Poster paper in the Proceedings of SETN Conference. April,
Thessaloniki, Greece.
Halkidi, M., Vazirgiannis, M., and Batistakis, I. (2000). Quality scheme assessment in the
clustering process. In Proceedings of PKDD. Lyon, France.
Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann
Publishers.
Hilderman, R. and Hamilton, H. (1999). Knowledge discovery and interestingness measures:
A survey. In Technical Report CS 99-04. Department of Computer Science, University
of Regina.
Jain, A., Murty, M., and Flynn, P. (1999). Data clustering: A review. ACM Computing Surveys,
31(3).
31 Quality Assessment Approaches in Data Mining 639
Liu, H., Hsu, W., and Chen, S. (1997). Using general impressions to analyze discovered
classification rules. In Proceedings of the Third International Conference on Knowledge
Discovery and Data Mining (KDD’97). Newport Beach, California.
MAGOpus V1.1 software. G.I. Webb and Assoc. RuleQuest Research Pty Ltd, 30 Athena
Avenue, St Ives NSW 2075, Australia.
Milligan, G. and Cooper, M. (1985). An examination of procedures for determining the
number of clusters in a data set. Psychometrika, 50(3):159–179.
Pal, N. and Biswas, J. (1997). Cluster validation using graph theoretic concepts. Pattern
Recognition, 30(6).
Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L. (1999). Discovering frequent closed
itemsets for association rules. In Proceedings of the 7th International Conference on
Database Theory.
Pei, J., Han, J., and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent
closed itemsets. In Proceedings of the ACM-SIGMOD International Workshop on Data
Mining and Knowledge Discovery (DMKD'00).
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. Knowl-
edge Discovery in Databases, AAAI/MIT Press.
Roberto, J., Bayardo, J., Agrawal, R., and Gunopulos, D. (1999). Constraint-based rule
mining in large, dense databases. In Proceedings of the 15th ICDE.
Rokach, L., Averbuch, M., and Maimon, O. (2004). Information retrieval system for medical
narrative reports. Lecture Notes in Artificial Intelligence, 3055, pp. 217-228. Springer-
Verlag.
Sharma, S. (1996). Applied Multivariate Techniques. John Wiley and Sons.
Smyth, P. (1996). Clustering using monte carlo cross-validation. In Proceedings of KDD
Conference.
Smyth, P. and Goodman, R. (1991). Rule induction using information theory. Knowledge
Discovery in Databases, AAAI/MIT Press.
Snedecor, G. and Cochran, W. (1989). Statistical Methods. Iowa State University Press,
Ames, IA, 8th Edition.
Theodoridis, S. and Koutroubas, K. (1999). Pattern Recognition. Academic Press.
Xie, X. and Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 13(4).
Zaki, M. and Hsiao, C. (2002). Charm: An efficient algorithm for closed itemset mining. In
Proceedings of the 2nd SIAM International Conference on Data Mining.
Zhong, N., Yao, Y., and Ohsuga, S. (1999). Peculiarity-oriented multi-database mining.
In Proceedings of the 3rd European Conference on the Principles of Data Mining and
Knowledge Discovery.