Semantic Knowledge Transparency in E-Business Processes
Semantic Interoperability. Semantic interoperability is "a dynamic enterprise capability derived from the application of special software technologies (such as reasoners, inference engines, ontologies, and models) that infer, relate, and classify the implicit meanings of digital content without human involvement—which in turn drive adaptive business processes, enterprise knowledge, business rules, and software application interoperability" (Pollock & Hodgson, 2004, p. 6).
Semantic Knowledge Transparency. Semantic knowledge transparency is defined as the dynamic, on-demand, and seamless flow of relevant and unambiguous, machine-interpretable knowledge resources within organizations and across inter-organizational systems of business partners engaged in collaborative processes.
TBox. A TBox contains intensional knowledge in the form of a terminology and is built through declarations that describe general properties of concepts (Baader et al., 2003; Gomez-Perez et al., 2004).
This work was previously published in Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation, edited by A. Salam and J. Stevens, pp. 255-286, copyright 2007 by IGI Publishing (an imprint of IGI Global).
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Chapter 8.7
Enhancing E-Business on the
Semantic Web through
Automatic Multimedia
Representation
Manjeet Rege
Wayne State University, USA
Ming Dong
Wayne State University, USA
Farshad Fotouhi
Wayne State University, USA
ABSTRACT
With the evolution of the next generation Web—
the Semantic Web—e-business can be expected
to grow into a more collaborative effort in which
businesses compete with each other by collabo-
rating to provide the best product to a customer.
Electronic collaboration involves data interchange
with multimedia data being one of them. Digital
multimedia data in various formats have increased
tremendously in recent years on the Internet. An
automated process that can represent multimedia
data in a meaningful way for the Semantic Web
is highly desired. In this chapter, we propose an
automatic multimedia representation system for
the Semantic Web. The proposed system learns
a statistical model based on the domain-specific training data and performs automatic semantic annotation of multimedia data using eXtensible Markup Language (XML) techniques. We demonstrate the advantage of annotating multimedia data using XML over traditional keyword-based approaches and discuss how it can help e-business.
INTRODUCTION
An Internet user typically conducts separate in-
dividual e-business transactions to accomplish a
certain task. A tourist visiting New York might
purchase airfare tickets and tickets to a concert
in New York separately. With the evolution of the
Semantic Web, as shown in Figure 1, the user can
conduct one collaborative e-business transaction
for the two purchases. Moreover, he/she can also
take a virtual tour of New York City online, which
actually might be a collection of all videos, images,
and songs on New York appearing anywhere on
the World Wide Web. With the continuing growth
and reach of the Web, the multimedia data avail-
able on it continue to grow on a daily basis. For
a successful collaborative e-business, in addition
to other kinds of data, it is important to be able
to organize and search the multimedia data for
the Semantic Web.
With the Semantic Web being the future of today's World Wide Web, there has to be an efficient way to represent multimedia data automatically for it. Multimedia data pose a great challenge to document indexing and retrieval because they are highly unstructured and their semantics are implicit in the content. Moreover, most multimedia content appearing on the Web has no description available with it in terms of
keywords or captions. From the Semantic Web
point of view, this information is crucial because
it describes the content of multimedia data and
would help represent it in a semantically meaning-
ful way. Manual annotation is feasible on a small
set of multimedia documents but is not scalable as
the number of multimedia documents increases.
Hence, performing manual annotation of all Web multimedia data while "moving" them to the Semantic Web domain is an impossible task. This, we believe, is a major challenge in transforming today's Web multimedia data into tomorrow's Semantic Web data.
In this chapter, we propose a generic auto-
matic multimedia representation solution for the
Semantic Web—an XML-based (Bray, Paoli, &
Sperberg-McQueen, 1998) automatic multimedia
representation system. The proposed system is
implemented using images as an example and performs domain-specific annotation using XML. Specifically, our system "learns" from a set of domain-specific training images made available to it a priori. Upon receiving a new image from the Web that belongs to one of the semantic categories the system has learned, the system generates appropriate XML-based annotation for the new image, making it "ready" for the Semantic Web.
Although the proposed system has been described
from the perspective of images, in general it is
Figure 1. Collaborative e-business scenario on the Semantic Web
applicable to many kinds of multimedia data
available on the Web today. To the best of our knowledge, there has been no work done on automatic multimedia representation for the Semantic Web using the semantics of XML. The proposed system is the first work in this direction.
BACKGROUND
The term e-business in general refers to online
transactions conducted on the Internet. These are
mainly classified into two categories: business-to-consumer (B2C) and business-to-business
(B2B). One of the main differences between
these two kinds of e-businesses is that B2C, as
the name suggests, applies to companies that sell
their products or offer services to consumers over
the Internet. B2B transactions, on the other hand, are conducted between two companies.
From its initial introduction in the late 1990s, e-business has grown to include services such as car
rentals, health services, movie rentals, and online
banking. The Web site CIO.com (2006) reports that North American consumers spent $172 billion shopping online in 2005, up from $38.8 billion in 2000. Moreover, e-business is expected to grow even more in the coming years; by 2010, consumers are expected to spend $329 billion online each year. We expect the evolving Semantic Web to play a significant role in enhancing the way e-business is done today. However, as mentioned in the earlier section, there is a need to represent the multimedia data on the Semantic Web in an efficient way. In the following section, we review
some of the related work done on the topic.
Ontology/Schema-Based
Approaches
Ontology-based approaches have been frequently
used for multimedia annotation and retrieval.
Hyvonen, Styrman, and Saarela (2002) proposed
ontology-based image retrieval and annotation
of graduation ceremony images by creating hierarchical annotation. They used Protégé (n.d.) as the ontology editor for defining the ontology and annotating images. Schreiber, Dubbeldam, Wielemaker, and Wielinga (2001) also performed ontology-based annotation of ape photographs, in which they use the same ontology-defining and annotation tool and use the Resource Description Framework (RDF) Schema as the output
language. Nagao, Shirai, and Squire (2001) have
developed a method for associating external an-
notations to multimedia data appearing over the
Web. Particularly, they discuss video annotation
by performing automatic segmentation of video,
semiautomatic linking of video segments, and
interactive naming of people and objects in video
frames. More recently, Rege, Dong, Fotouhi, Sia-
dat, and Zamorano (2005) proposed to annotate
human brain images using XML by following
the MPEG-7 (Manjunath, 2002) multimedia
standard. The advantages of using XML to store
meta-information (such as patient name, surgery
location, etc.), as well as brain anatomical infor-
mation, has been demonstrated in a neurosurgical
domain. The major drawback of the approaches mentioned previously is that the image annotation is performed manually; extra effort is needed from the user's side in creating the ontology and performing the detailed annotation. It is
highly desirable to have a system that performs
automatic semantic annotation of multimedia
data on the Internet.
Keyword-Based Annotations
Automatic image annotation using keywords
has recently received extensive attention in the
research community. Mori, Takahashi, and Oka
(1999) developed a co-occurrence model, in which
they looked at the co-occurrence of keywords with
image regions. Duygulu, Barnard, Freitas, and
Forsyth (2002) proposed a method to describe
images using a vocabulary of blobs. First, regions
are created using a segmentation algorithm. For
each region, features are computed and then blobs
are generated by clustering the image features for
these regions across images. Finally, a translation
model translates the set of blobs of an image to
a set of keywords. Jeon, Lavrenko, and Man-
matha (2003) introduced a cross-media relevance
model that learns the joint distribution of a set of
regions and a set of keywords rather than the cor-
respondence between a single region and a single
keyword. Feng, Manmatha, and Lavrenko (2004)
proposed a method of automatic annotation by
partitioning each image into a set of rectangular
regions. The joint distribution of the keyword
annotations and low-level features is computed
from the training set and used to annotate test-
ing images. High annotation accuracy has been
reported. The readers are referred to Barnard, Duygulu, Freitas, and Forsyth (2003) for a comprehensive review on this topic. As we point out in the section "XML-Based Annotation," keyword annotations do not fully express the semantic meaning embedded in multimedia data. In this chapter, we propose an automatic multimedia representation system for the Semantic Web using the semantics of XML, which enables efficient multimedia annotation and retrieval based on domain knowledge. The proposed work is the first attempt in this direction.
PROPOSED FRAMEWORK
In order to represent multimedia data for the
Semantic Web, we propose to perform automatic
multimedia annotation using XML techniques.
Though the proposed framework is applicable to
multimedia data in general, we provide details
about the framework using image annotations
as a case study.
XML-Based Annotation
Annotations are domain-specific semantic information assigned with the help of a domain expert to semantically enrich the data. The traditional approach practiced by image repository librarians is to annotate each image manually with keywords or captions and then search on those captions or keywords using a conventional text search engine. The rationale here is that the keywords capture the semantic content of the image and help in retrieving the images. This technique is also used by television news organizations to retrieve file footage from their videos. Such techniques allow text queries and are successful in finding the relevant pictures. The main disadvantage of manual annotation is the cost and difficulty of scaling it to large numbers of images.
MPEG-7 (Manjunath, 2002, p. 8) describes the content, "the bits about the bits," of a multimedia file such as an image or a video clip. The MPEG-7 standard has been developed after many rounds of careful discussion, and it is expected that the standard will be used in searching and retrieving all types of media objects. It proposes to store low-level image features, annotations, and other meta-information in one XML file that contains a reference to the location of the corresponding image file. XML has brought great features and promising prospects to the future of the Semantic Web and will continue to play an important role in its development. XML keeps content, structure, and representation apart and is thus a well-suited means for knowledge representation. It can represent semantic properties through its syntactic structure, that is, by the nesting or sequential ordering relationships among elements (XML tags). The advantage of annotating multimedia using XML can best be explained with the help of an example. Suppose we have a New York image (shown in Figure 2) with the keyword annotation Statue of Liberty, Sea, Clouds, Sky. Instead of simply using keywords as annotation for this image, consider now that the same image is represented in an XML format.
Note that the XML representation of the im-
age can conform to any domain-specific XML
schema. For the sake of illustration, consider
the XML schema and the corresponding XML representation of the image shown in Figure 3. This XML schema stores foreground and background object information, along with other meta-information, with keywords along various paths of the XML file. Compared with keyword-based approaches, the XML paths from the root node to the keywords are able to fully express the semantic meaning of the multimedia data. In the case of the New York image, semantically meaningful XML annotations would be "image/semantic/foreground/object = Statue of Liberty, image/semantic/foreground/object = Sea, image/semantic/background/object = Sky, image/semantic/background/object = Clouds". The semantics in the XML paths provide an added advantage by differentiating objects in the foreground and background, giving more meaningful annotation.
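The contrast between flat keywords and root-to-keyword paths can be sketched in a few lines of Python. The XML document below is a hypothetical rendering of the New York example above; the element names follow the illustrative foreground/background schema and are assumptions, not the chapter's exact schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML representation of the New York image, following the
# illustrative foreground/background schema (element names are assumed).
doc = ET.fromstring("""
<image>
  <semantic>
    <foreground><object>Statue of Liberty</object><object>Sea</object></foreground>
    <background><object>Sky</object><object>Clouds</object></background>
  </semantic>
</image>
""")

def annotation_paths(elem, prefix=""):
    """Collect root-to-keyword annotation paths of the form 'path = keyword'."""
    paths = []
    for child in elem:
        p = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child) == 0:          # leaf element: its text is the keyword
            paths.append(f"{p} = {child.text}")
        else:                        # interior element: recurse into children
            paths.extend(annotation_paths(child, p))
    return paths

for path in annotation_paths(doc, "image"):
    print(path)
# e.g. image/semantic/foreground/object = Statue of Liberty
```

Unlike the flat keyword list, each extracted path carries the foreground/background distinction along with the keyword itself.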
We emphasize that the annotation performed
using our approach is domain-specific knowledge. The same image can have different annotations
under a different XML schema that highlights
certain semantic characteristics of importance
pertaining to that domain knowledge. We simply
use the schema of Figure 3 that presents image
foreground and background object information
as a running example.
Overview of System Architecture
The goal of the proposed system is to represent
multimedia data obtained from the Web in a
meaningful XML format. Consequently, this data can be "moved" to the Semantic Web in an automatic and efficient way. For example, as shown in Figure 4, the system first receives an image from the Web. The image could be received by a Web image provider, an independent module outside of the system that simply fetches domain-specific images from the Web and passes them on to our system. The Web image provider could also be a "Web spider" that "crawls" among domain-specific Web data sources and procures relevant images. The image is then preprocessed
by two other modules, namely, image divider and
feature extractor. An image usually contains sev-
eral regions. Extracting low-level features from
different image regions is typically the first step
of automatic image annotation since regions may
have different contents and represent different
semantic meaning. The image regions could be
determined through either image segmentation
(Shi & Malik, 1997) or image cutting in the image
divider. For low-level feature extraction, we used
some of the features standardized by MPEG-7.
The low-level features extracted from all the
regions are passed on to the automatic annotator.
This module learns a statistical model that links
image regions and XML annotation paths from a
set of domain-specific training images. The training image database can contain images belonging
to various semantic categories represented and
annotated in XML format. The annotator learns
to annotate new images that belong to at least
one of the many semantic categories that the
Figure 2. Comparison of keyword annotation and XML-path-based annotation
Figure 3. An example of an XML schema and the corresponding XML representation of an image
Figure 4. System architecture
annotator has been trained on. The output of the
automatic annotator is an XML representation
of the image.
Statistical Model for Automatic
Annotation
In general, image segmentation is a computationally expensive and error-prone task (Feng et al., 2004). As a simple alternative, we have the image divider partition each image into a set of rectangular regions of equal size.
The feature extractor extracts low-level features
from each rectangular region of every image and
constructs a feature vector. By learning the joint
probability distribution of XML annotation paths
and low-level image features, we perform the
automatic annotation of a new image.
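The divider-plus-extractor step can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the image is a NumPy array, and the mean RGB color of each region stands in for the MPEG-7 descriptors actually used.

```python
import numpy as np

def divide_and_describe(image, rows, cols):
    """Partition an image (H x W x 3 array) into rows*cols equal rectangular
    regions and return one toy feature vector (mean RGB) per region.
    Mean color is a stand-in for the MPEG-7 low-level descriptors."""
    h, w = image.shape[0] // rows, image.shape[1] // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            region = image[r * h:(r + 1) * h, c * w:(c + 1) * w]
            feats.append(region.reshape(-1, 3).mean(axis=0))
    return np.array(feats)            # shape: (rows*cols, 3)

img = np.random.rand(120, 160, 3)     # stand-in for a Web image
f_q = divide_and_describe(img, 4, 6)  # n = 24 rectangular regions
print(f_q.shape)                      # (24, 3)
```

A real feature extractor would emit a longer vector per region (color, texture, and shape descriptors), but the grid partition itself is exactly this simple.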
Let X denote the set of XML annotation paths, T denote the domain-specific training images in XML format, and let t be an image belonging to T. Let x_t be the subset of X containing the annotation paths for t. Also, assume that each image is divided into n rectangular regions of equal size.

Consider a new image q not in the training set. Let f_q = {f_q1, f_q2, ..., f_qn} denote the feature vector for q. In order to perform automatic annotation of q, we model the joint probability of f_q and any arbitrary annotation path subset x of X as follows:

P(x, f_q) = P(x, f_q1, f_q2, ..., f_qn)    (1)
We use the training set T of annotated images to estimate the joint probability of observing x and {f_q1, f_q2, ..., f_qn} by computing the expectation over all the images in the training set:

P(x, f_q1, f_q2, ..., f_qn) = Σ_{t ∈ T} P(t) P(x, f_q1, f_q2, ..., f_qn | t)    (2)
We assume that the events of observing x and f_q1, f_q2, ..., f_qn are mutually independent of each other and express the joint probability in terms of P_A, P_B, and P_C as follows:

P(x, f_q1, f_q2, ..., f_qn) = Σ_{t ∈ T} { P_A(t) Π_{a=1..n} P_B(f_qa | t) Π_{path ∈ x} P_C(path | t) Π_{path ∉ x} (1 − P_C(path | t)) }    (3)

where P_A is the prior probability of selecting each training image, P_B is the density function responsible for modeling the feature vectors, and P_C is a multiple-Bernoulli distribution for modeling the XML annotation paths.
In the absence of any prior knowledge of the training set, we assume that P_A follows a uniform prior and can be expressed as:

P_A(t) = 1 / ||T||    (4)
where ||T|| is the size of the training set. For the distribution P_B, we use a nonparametric, kernel-based density estimate:

P_B(f | t) = (1/n) Σ_{i=1..n} exp{ −(f − f_i)^T Σ^{−1} (f − f_i) } / sqrt( 2^k π^k |Σ| )    (5)

where f_i belongs to {f_1, f_2, ..., f_n}, the set of all low-level features computed for each rectangular region of image t, k is the dimension of the feature vectors, and Σ is the diagonal covariance matrix, which is constructed empirically for best annotation performance.
In the XML representation of images, every annotation path either occurs or does not occur at all for an image. Moreover, since we annotate images based on object presence, not prominence, an annotation path, if it occurs, occurs at most once in the XML representation of the image. As a result, it is reasonable to assume that the density function P_C follows a multiple-Bernoulli distribution:

P_C(path | t) = ( μ δ_{path,t} + N_path ) / ( μ + ||T|| )    (6)

where μ is a smoothing parameter, δ_{path,t} = 1 if the path occurs in the annotation of image t and zero otherwise, and
N_path is the total number of training images that contain this path in their annotation.
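Equations (2) through (6) can be sketched compactly in Python. This is an illustrative reading of the model, not the authors' code; it assumes training images arrive as (region-feature matrix, annotation-path set) pairs, a fixed diagonal covariance, and a hand-picked smoothing constant μ.

```python
import numpy as np

def joint_probability(x, f_q, training, all_paths, sigma_diag, mu):
    """P(x, f_q) per Eqs. (2)-(6): an expectation over training images t of
    P_A(t) * prod_a P_B(f_qa|t) * prod_{p in x} P_C(p|t) * prod_{p not in x} (1 - P_C(p|t)).
    training: list of (feats, paths) where feats is an (n, k) array of region
    features for image t and paths is the set of its annotation paths."""
    T = len(training)
    k = len(sigma_diag)
    # N_path: number of training images whose annotation contains each path
    n_path = {p: sum(p in paths for _, paths in training) for p in all_paths}
    norm = np.sqrt((2 * np.pi) ** k * np.prod(sigma_diag))
    total = 0.0
    for feats, paths in training:
        p_a = 1.0 / T                              # Eq. (4): uniform prior
        p_b = 1.0                                  # Eq. (5): kernel density
        for f in f_q:
            d = feats - f
            p_b *= np.mean(np.exp(-(d * d / sigma_diag).sum(axis=1))) / norm
        p_c = 1.0                                  # Eq. (6): smoothed Bernoulli
        for p in all_paths:
            bern = (mu * (p in paths) + n_path[p]) / (mu + T)
            p_c *= bern if p in x else (1.0 - bern)
        total += p_a * p_b * p_c                   # Eqs. (2)/(3): sum over t
    return total
```

To annotate a new image, one would evaluate this quantity for candidate path subsets and keep the top-ranked paths, as described in the experiments below.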
EXPERIMENTAL RESULTS
Our image database contains 1,500 images ob-
tained from the Corel data set, comprising 15
image categories with 100 images in each cat-
egory. The Corel image data set contains images
from different semantic categories with keyword
annotations performed by Corel employees. In
order to conduct our experiments, we require a training image database representing images in XML format. Each XML file should contain annotation, low-level features, and other meta-information stored along different XML paths. In the absence of such publicly available data, we manually converted each image in the database to an XML format conforming to the schema shown in Figure 3. We performed our experiments on five randomly selected image categories. Each image category represents a distinct semantic concept. In the experiments, 70% of the data were randomly selected as the training set while the remaining 30% were used for testing.
Automatic Annotation Results
Given a test image, we calculate the joint prob-
ability of the low-level feature vector and the XML
annotation paths in the training set. We select the
top four paths with the highest joint probability
as the annotation for the image. Compared with
other approaches in image annotation (Duygulu
et al., 2002; Feng et al., 2004), our annotation
results provide more meaningful description of
a given image.
Figure 5 shows some examples of our anno-
tation results. We can clearly see that the XML-
path-based annotation contains richer semantic
meaning than the original keyword provided by
Corel. We evaluate the image annotation perfor-
mance in terms of recall and precision. The recall
and precision for every annotation path in the test set are computed as follows:

recall = q / r,    precision = q / s
where q is the number of images correctly an-
notated by an annotation path, r is the number
Figure 5. Examples of top annotation in comparison with Corel keyword annotation
of images having that annotation path in the test
set, and s is the number of images annotated by
the same path. In Table 1 we report the results for
all the 148 paths in the test set as well as the 23
best paths as in Duygulu et al. (2002) and Feng
et al. (2004).
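The per-path metrics above reduce to a few lines once ground truth and system output are represented as image-ID sets. The path string and image IDs below are hypothetical illustrations.

```python
def path_metrics(relevant, retrieved):
    """Per-path recall and precision as defined above:
    q = images correctly annotated with the path,
    r = images having the path in the test set (|relevant|),
    s = images the system annotated with the path (|retrieved|)."""
    q = len(relevant & retrieved)
    recall = q / len(relevant) if relevant else 0.0
    precision = q / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical path "image/semantic/background/object = sky":
truth = {"img1", "img2", "img3", "img4"}   # r = 4
system = {"img1", "img2", "img5"}          # s = 3, of which q = 2 are correct
print(path_metrics(truth, system))         # recall 0.5, precision ~0.67
```

Averaging these two numbers over all 148 paths (or the best 23) yields the mean per-path figures reported in Table 1.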
Retrieval Results
Given specific query criteria, XML representation helps in efficient retrieval of images over the Semantic Web. Suppose a user wants to find images that have an airplane in the background and people in the foreground. State-of-the-art search engines require the user to supply individual keywords such as "airplane," "people," and so forth, or any combination of keywords as a query. The union of the retrieved images of all possible combinations of the aforementioned query keywords is sure to have images satisfying the user-specified criteria.
However, a typical search engine user searching for images is unlikely to view beyond the first
Table 1. Annotation results (number of paths with recall > 0: 50)

                              All 148 paths    Top 23 paths
    Mean per-path recall          0.22             0.83
    Mean per-path precision       0.21             0.73
Figure 6. Ranked retrieval for the query image/semantic/background/object =“sky”
15-20 retrievals, which may be irrelevant in this case. As a result, the user query in this scenario is unanswered in spite of images satisfying the specified criteria being present on the Web. With the proposed framework, the query can be answered in an efficient way. Since all the images on the Semantic Web are represented in an XML format, we can use XML querying technologies such as XQuery (Chamberlin, Florescu, Robie, Simeon, & Stefanascu, 2001) and XPath (Clark & DeRose, 1999) to retrieve images for the query "image/semantic/background/object = plane, image/semantic/foreground/object = people". This is unachievable with keyword-based queries and hence is a major contribution of the proposed work.
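A structured query of this kind can be sketched with the XPath subset built into Python's standard library, in place of a full XQuery engine. The two image records, their IDs, and the element names follow the illustrative schema used earlier and are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

# Two hypothetical image records as they might appear on the Semantic Web.
collection = ET.fromstring("""
<images>
  <image id="a">
    <semantic>
      <foreground><object>people</object></foreground>
      <background><object>plane</object></background>
    </semantic>
  </image>
  <image id="b">
    <semantic>
      <foreground><object>car</object></foreground>
      <background><object>sky</object></background>
    </semantic>
  </image>
</images>
""")

# Structured query: plane in the background AND people in the foreground.
# The [tag='text'] predicate constrains each path step by the keyword value.
hits = [img.get("id") for img in collection.findall("image")
        if img.find("semantic/background[object='plane']") is not None
        and img.find("semantic/foreground[object='people']") is not None]
print(hits)  # ['a']
```

Because the foreground/background distinction lives in the path itself, image "b" is correctly excluded, something a bag-of-keywords query over {plane, people, sky} cannot guarantee.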
Figure 6 shows some examples of the retrieval
results. In Table 2, we also report the mean aver-
age precision obtained for ranked retrieval as in
Feng et al. (2004).
Since the proposed work is the first one of its kind to automatically annotate images using XML paths, we were unable to make a direct comparison