72 CHAPTER 4 Requirements Analysis and Conceptual Data Modeling
At this point we have sufficient commonality between schemas to
attempt a merge. In schemas 1 and 2.2 we have two sets of common
entities, Department and Topic-area. Other entities do not overlap and
must appear intact in the superimposed, or merged, schema. The merged
schema, schema 3, is shown in Figure 4.7a. Because the common entities
are truly equivalent, there are no bad side effects of the merge due to
existing relationships involving those entities in one schema and not in
the other. (Such a relationship that remains intact exists in schema 1
between Topic-area and Report, for example.) If true equivalence cannot
be established, the merge may not be possible in the existing form.
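The superimposition step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Schema` class and `merge` function are hypothetical names, a schema is reduced to entity names plus relationship triples, and the pairing of relationship names with entities is read off the figure labels rather than stated in the text. Entities with the same name are assumed to be truly equivalent, which is exactly the condition the merge depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy ER schema: entity names and (relationship, entity, entity) triples."""
    entities: set = field(default_factory=set)
    relationships: set = field(default_factory=set)


def merge(a: Schema, b: Schema) -> Schema:
    """Superimpose two schemas, treating same-named entities as equivalent."""
    return Schema(a.entities | b.entities, a.relationships | b.relationships)


# Hypothetical encodings of schema 1 and schema 2.2 (pairings are assumed).
schema_1 = Schema(
    {"Department", "Report", "Topic-area", "Contractor"},
    {("publishes", "Department", "Report"),
     ("contains", "Report", "Topic-area"),
     ("written-for", "Report", "Contractor")},
)
schema_2_2 = Schema(
    {"Department", "Publication", "Topic-area"},
    {("has", "Department", "Publication"),
     ("includes", "Publication", "Topic-area")},
)

common = schema_1.entities & schema_2_2.entities  # the overlapping entities
schema_3 = merge(schema_1, schema_2_2)            # everything else stays intact
```

Because the merge is a plain union, a relationship that exists in only one schema, such as the one between Report and Topic-area, carries over unchanged.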
In Figure 4.7a, there is some redundancy between Publication and
Report in terms of the relationships with Department and Topic-area.
Such a redundancy can be eliminated if there is a supertype/subtype
relationship between Publication and Report, which does in fact occur
in this case because Publication is a generalization of Report. In schema
3.1 (Figure 4.7b) we see the introduction of this generalization from
Report to Publication. Then in schema 3.2 (Figure 4.7c) we see that the
Figure 4.7 View integration: the merged schema
(a) Schema 3, the result of merging schema 1 and schema 2.2
[Diagram not reproduced. Entities: Publication, Report, Contractor, Department, Topic-area. Relationships: includes, has, contains, publishes, written-for. Attributes shown: title, code, name, address, research-area.]
Teorey.book Page 72 Saturday, July 16, 2005 12:57 PM
4.4 View Integration 73
Figure 4.7 (continued)
(b) Schema 3.1, new generalization
[Diagram not reproduced. Same entities and relationships as schema 3, with Report shown as a subtype of Publication.]
(c) Schema 3.2, elimination of redundant relationships
[Diagram not reproduced. Relationships remaining: includes, has, written-for.]
redundant relationships between Report and Department and Topic-area
have been dropped. The attribute “title” has been eliminated as an
attribute of Report in Figure 4.7c because “title” already appears as an
attribute of Publication at a higher level of abstraction; “title” is
inherited by the subtype Report.
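The step from schema 3 to schema 3.2 can be approximated mechanically. The sketch below is a minimal illustration, not the book's procedure: the function names are hypothetical, the pairing of relationship names with entities is assumed from the figure labels, and the redundancy test is purely structural (a subtype's relationship is dropped when its supertype is already related to the same entity). In practice a designer must still confirm that the two relationships mean the same thing.

```python
# Relationships of the merged schema as (name, entity, entity) triples.
# The pairing of names with entities is assumed for illustration.
schema_3 = {
    ("publishes", "Department", "Report"),
    ("contains", "Report", "Topic-area"),
    ("written-for", "Report", "Contractor"),
    ("has", "Department", "Publication"),
    ("includes", "Publication", "Topic-area"),
}
supertype_of = {"Report": "Publication"}  # Report is a subtype of Publication


def partners(relationships, entity):
    """Entities directly related to `entity`."""
    found = set()
    for _, a, b in relationships:
        if a == entity:
            found.add(b)
        elif b == entity:
            found.add(a)
    return found


def drop_redundant(relationships, supertype_of):
    """Keep a subtype's relationship only if its supertype lacks a
    relationship with the same partner entity (structural test only)."""
    kept = set()
    for name, a, b in relationships:
        redundant = any(
            sub in supertype_of
            and other in partners(relationships, supertype_of[sub])
            for sub, other in ((a, b), (b, a))
        )
        if not redundant:
            kept.add((name, a, b))
    return kept


def own_attributes(attributes, supertype_of):
    """Remove from each subtype the attributes it inherits from its supertype."""
    return {
        entity: attrs - attributes.get(supertype_of.get(entity), set())
        for entity, attrs in attributes.items()
    }


schema_3_2 = drop_redundant(schema_3, supertype_of)
# Only "title" is modeled here; the other attributes are omitted.
attrs = own_attributes({"Publication": {"title"}, "Report": {"title"}}, supertype_of)
```

Under these assumptions, "publishes" and "contains" are dropped while "written-for" survives, because Publication has no relationship with Contractor; "title" remains only on Publication.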
The final schema, in Figure 4.7c, expresses completeness because all
the original concepts (report, publication, topic area, department, and
contractor) are kept intact. It expresses minimality because of the
transformation of “dept-name” from attribute in schema 1 to entity
and attribute in schema 2.2, and the merger between schema 1 and
schema 2.2 to form schema 3, and because of the elimination of “title”
as an attribute of Report and of Report relationships with Topic-area
and Department. Finally, it expresses understandability in that the
final schema actually has more meaning than the individual original
schemas.
The view integration process is one of continual refinement and
reevaluation. Note also that a minimal schema is not always the most
efficient one. If, for example, eliminating the redundant relationships
“publishes” and/or “contains” in going from schema 3.1 to 3.2 makes
certain queries excessively slow, it may be better from a performance
viewpoint to leave them in. This decision could be made during the
analysis of the transactions on the database or during the testing phase
of the fully implemented database.
4.5 Entity Clustering for ER Models
This section presents the concept of entity clustering, which abstracts
the ER schema to such a degree that the entire schema can appear on a
single sheet of paper or a single computer screen. This benefits both the
end user and the database designer: it helps them develop a mutual
understanding of the database contents and formally document the
conceptual model.
An entity cluster is the result of a grouping operation on a collection
of entities and relationships. Entity clustering is potentially useful for
designing large databases. When the scale of a database or information
structure is large and includes a large number of interconnections
among its different components, it may be very difficult to understand
the semantics of such a structure and to manage it, especially for the end
users or managers. In an ER diagram with 1,000 entities, the overall
structure will probably not be very clear, even to a well-trained database
analyst. Clustering is therefore important because it provides a method
to organize a conceptual database schema into layers of abstraction, and
it supports the different views of a variety of end users.
4.5.1 Clustering Concepts
One should think of grouping as an operation that combines entities
and their relationships to form a higher-level construct. The result of a
grouping operation on simple entities is called an entity cluster. A
grouping operation on entity clusters, or on combinations of elementary
entities and entity clusters, results in a higher-level entity cluster. The
highest-level entity cluster, representing the entire database conceptual
schema, is called the root entity cluster.
Figure 4.8a illustrates the concept of entity clustering in a simple
case where (elementary) entities R-sec (report section), R-abbr (report
abbreviation), and Author are naturally bound to (dominated by) the
entity Report; and entities Department, Contractor, and Project are not
dominated. (Note that to avoid unnecessary detail, we do not include
the attributes of entities in the diagrams.) In Figure 4.8b, the
dark-bordered box around the entity Report and the entities it dominates
defines the entity cluster Report. The dark-bordered box is called the EC
box to represent the idea of an entity cluster. In general, the name of the
entity cluster need not be the same as the name of any internal entity;
however, when there is a single dominant entity, the names are often the
same. The EC box number in the lower-right corner is a clustering-level
number used to keep track of the sequence in which clustering is done.
The number 2.1 signifies that the entity cluster Report is the first entity
cluster at level 2. Note that all the original entities are considered to be
at level 1.
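The level numbering can be illustrated with a small data structure. This is a sketch under assumptions: the class names are hypothetical, and the rule that a cluster sits one level above its highest-level member is an interpretation of the text, which states only that original entities are at level 1 and that the Report cluster is the first cluster at level 2. The name "Root" for the top-level cluster is likewise invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    level: int = 1  # all original (elementary) entities are at level 1


@dataclass
class EntityCluster:
    name: str
    members: list      # Entity and/or EntityCluster objects
    sequence: int = 1  # position among the clusters formed at the same level

    @property
    def level(self) -> int:
        # Assumed rule: one level above the highest-level member.
        return 1 + max(member.level for member in self.members)

    @property
    def label(self) -> str:
        """EC box number, e.g. '2.1' for the first cluster at level 2."""
        return f"{self.level}.{self.sequence}"


# Report dominates R-sec, R-abbr, and Author, so the four are grouped.
report = EntityCluster(
    "Report",
    [Entity("Report"), Entity("R-sec"), Entity("R-abbr"), Entity("Author")],
)

# Grouping a cluster with elementary entities yields a higher-level cluster.
root = EntityCluster(
    "Root",
    [report, Entity("Department"), Entity("Contractor"), Entity("Project")],
)
```

Here `report.label` evaluates to "2.1", matching the EC box number described for Figure 4.8b, and `root` lands at level 3 under the assumed rule.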
The higher-level abstraction, the entity cluster, must maintain the
same relationships between entities inside and outside the entity cluster
as occur between the same entities in the lower-level diagram. Thus, the
entity names inside the entity cluster should appear just outside the EC
box along the path of their direct relationship to the appropriately
related entities outside the box, maintaining consistent interfaces
(relationships) as shown in Figure 4.8b. For simplicity, we modify this
rule slightly: If the relationship is between an external entity and the
dominant internal entity (for which the entity cluster is named), the
entity cluster name need not be repeated outside the EC box. Thus, in
Figure 4.8b, we could drop the name Report in both places it occurs
outside the
Report box, but we must retain the name Author, which is not the name
of the entity cluster.
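The interface rule can be expressed as a small function. The sketch below is illustrative only: `external_interfaces` is a hypothetical name, relationships are reduced to unnamed entity pairs, and the particular pairings between internal and external entities are assumptions chosen to match the text (two external relationships on Report, one on Author), not a transcription of Figure 4.8.

```python
def external_interfaces(relationships, members, dominant):
    """For each relationship crossing the EC box, return a pair
    (outside_entity, label). The label is the internal entity's name,
    or None when that entity is the dominant one and the name may be dropped."""
    result = set()
    for a, b in relationships:
        if (a in members) == (b in members):
            continue  # wholly inside or wholly outside the EC box
        inside, outside = (a, b) if a in members else (b, a)
        label = None if inside == dominant else inside
        result.add((outside, label))
    return result


# Hypothetical pairings for the lower-level diagram.
relationships = [
    ("Report", "R-sec"),
    ("Report", "R-abbr"),
    ("Report", "Author"),
    ("Department", "Report"),
    ("Contractor", "Report"),
    ("Project", "Author"),
]
members = {"Report", "R-sec", "R-abbr", "Author"}

interfaces = external_interfaces(relationships, members, dominant="Report")
```

With these inputs the three internal relationships disappear into the EC box, the two relationships on Report need no label, and the relationship on Author keeps its label so the reader can still see which internal entity it attaches to.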
4.5.2 Grouping Operations
Grouping operations are the fundamental components of the entity
clustering technique. They define which collections of entities and
relationships make up the higher-level objects, the entity clusters. The
operations are heuristic in nature and include (see Figure 4.9):
Figure 4.8 Entity clustering concepts
(a) ER model before clustering
[Diagram not reproduced. Entities: Report, Author, Project, Department, Contractor, R-abbr, R-sec.]
(b) ER model after clustering
[Diagram not reproduced. Entity cluster Report (EC box number 2.1) with external entities Project, Department, and Contractor; the names Report and Author label relationships outside the EC box.]