Multimedia Information Retrieval
CHANDOS
INFORMATION PROFESSIONAL SERIES
Series Editor: Ruth Rikowski
Chandos’ new series of books is aimed at the busy information professional. They have
been specially commissioned to provide the reader with an authoritative view of current
thinking. They are designed to provide easy-to-read and (most importantly) practical
coverage of topics that are of interest to librarians and other information professionals.
If you would like a full listing of current and forthcoming titles, please visit
www.chandospublishing.com or telephone
+44(0) 1223 499140.
New authors: we are always pleased to receive ideas for new titles; if you would like to write
a book for Chandos, please contact Dr Glyn Jones
by telephone on +44 (0) 1993 848726.
Bulk orders: some organisations buy a number of copies of our books. If you are
interested in doing this, we would be pleased to discuss a discount. Please
telephone +44 (0) 1223 499140.
Multimedia Information
Retrieval
Theory and techniques
ROBERTO RAIELI
Oxford Cambridge New Delhi
Chandos Publishing
Hexagon House
Avenue 4
Station Lane
Witney
Oxford OX28 4BN
UK
Tel: +44(0) 1993 848726
www.chandospublishing.com
www.chandospublishingonline.com
Chandos Publishing is an imprint of Woodhead Publishing Limited
Woodhead Publishing Limited
80 High Street
Sawston
Cambridge CB22 3HJ
UK
Tel: +44(0) 1223 499140
Fax: +44(0) 1223 832819
www.woodheadpublishing.com
First published in 2013
ISBN: 978-1-84334-722-4 (print)
ISBN: 978-1-78063-388-6 (online)
Chandos Information Professional Series ISSN: 2052-210X (print) and ISSN: 2052-2118 (online)
Library of Congress Control Number: 2013941270
© R. Raieli, 2013
British Library Cataloguing-in-Publication Data.
A catalogue record for this book is available from the British Library.
All rights reserved. No part of this publication may be reproduced, stored in or introduced into a retrieval
system, or transmitted, in any form, or by any means (electronic, mechanical, photocopying, recording or
otherwise) without the prior written permission of the publisher. This publication may not be lent, resold,
hired out or otherwise disposed of by way of trade in any form of binding or cover other than that in which
it is published without the prior consent of the publisher. Any person who does any unauthorised act in
relation to this publication may be liable to criminal prosecution and civil claims for damages.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this publication and cannot accept any legal responsibility or liability for any errors or
omissions.
The material contained in this publication constitutes general guidelines only and does not represent to be
advice on any particular matter. No reader or purchaser should act on the basis of material contained in this
publication without first taking professional advice appropriate to their particular circumstances. All
screenshots in this publication are the copyright of the website owner(s), unless indicated otherwise.
This book was originally published in Italian by Editrice Bibliografica s.r.l., Milan, Italy, with the title
Nuovi metodi di gestione dei documenti multimediali.
Revised English edition
Translated by Giles Smith
Typeset by Domex e-Data Pvt. Ltd., India.
Printed in the UK and USA.
List of figures and tables
Figures
1.1 Set of the film 1900, by Bernardo Bertolucci
1.2 a) Comparisons of the shapes of various pipes for visual searches of these objects b) Images of the Churchwarden pipe
1.3 D-R head
2.1 Drawing representing the famous Magritte painting
3.1 a) Terminological search attempts applied to a painting by Roberto Sicilia b) Content-based search founded on concrete, figurative data on the same painting by Roberto Sicilia
4.1 Example from Grosky: multimedia content-based indexing
4.2 Another example from Grosky: content-based multimedia search
4.3 Hierarchy of possible representative levels in a document
4.4 Example of ‘collaborative filtering’ from Amazon’s website
5.1 The organic MIR system
5.2 Example of a textual-textual search
5.3 Example of a textual-visual search
5.4 Example of a visual-textual search
5.5 Example of a visual-visual search
5.6 Example of term-based IR
5.7 Example of CBIR
5.8 Example of TR
5.9 Example of MIR
5.10 a) Formal comparison between search example and b) archived model
5.11 Selection of a colour range as an example for searching
5.12 Analysis of the constitutive elements of a video
5.13 Video-browsing modes a) Slide show b) Storyboard
5.14 A scheme of representative image selection
5.15 A recapitulative image of the different video processing constraints in relation to their computability
5.16 ‘Talk to Me’ interface, didactic system of Automatic Speech Recognition
5.17 ‘AudioFex’. AR module of the MUVIS system
5.18 ‘Beat Histogram’ of different styles of music
5.19 ‘Similarity matrix’ of a Bach Prelude
6.1 Example of the Scheda F on the Album di Romana site
6.2 ‘Collage summary’ built starting from the descriptive metadata in a video produced by the Informedia II system
6.3 Model of the possible applications of MPEG-7 in information processes
7.1 JACOB’s start-up interface demo
7.2 Start page of the ECHO site
7.3 The MILOS system home page
7.4 Search phases in QuickLook. a) Browsing and image model choice. b) Definition of textual data. c) System answer and indications of ‘relevance/non-relevance’. d) Final system response
7.5 AESS home page
7.6 Scanner realized during the VASARI project
7.7 Demo of a visual search by VIPER
7.8 The ‘PicToSeek’ search screen. a) The system’s selection interface. b) Upload from the Web of a search image. c) Individuation of a more precise model. d) New response by the system
7.9 Demo of the ‘Sphere browser’ of the MediaMill system
7.10 Publicity screen for the Shazam system
7.11 Application of AudioID via a cell phone
7.12 QBIC search interface. a. Search through colour range. b. Search through structural data
7.13 Example of a colour search of the Hermitage DC. a. Definition of the colour range. b. Search results
7.14 Example of a colour-formal search of the Hermitage DC. a. Definition of the colours and forms. b. Search results
7.15 Search interface based on colour histograms on the WebSEEK browser
7.16 WebSEEK module for defining the search histogram
7.17 Example of a search through sketches using Retrievr
7.18 Interface of the Virage VS LiveMedia system, developed using VideoLogger technology
7.19 Operating phases of the Informedia II system a. Analysis of a documentary video b. Relationship model of the analyzed elements
7.20 Phases in the demo driven by Sound Fisher a. Similarity search b. Filter of the search with specific varied data c. Addition of music tracks to the system d. Content-based rearrangement of the archive
7.21 Video Mail Retrieval system model a. Browser b. Search interface
7.22 Screen of Bing Visual Search
7.23 Example of Google Goggles’ functionality
7.24 Module for radiology content-based analysis from a system designed at the National Library of Medicine
7.25 Visual analysis and search screen of the GIS Web Enterprise Suite system
8.1 Example of content-related structural analysis of a visual document. a. Segmentation of an image into blocks b. Grey scale calculation for each block c. Complete light-dark histogram of the structure
8.2 Automatic analysis model of a multimedia object a. 3D model b. Structural definition c. Calculation of the representative vectors
8.3 Low-level characteristics in a document a. Original VO b. Form and skeleton c. Extremities and ‘Centre of Gravity’
8.4 Definitions of the ‘meanings’ of VO using low-level characteristics: models of bowling, ski slalom, golf, baseball and ski-jumping
8.5 Figurative example of the square-triangles
8.6 Example of similarity match in a musical search a. Search model b. Bach fugue with elements similar to the model
8.7 Definition and omission of a search sample a. Search via model design b. Modification of retrieved object c. New search using the modified sample
8.8 Searching in the ‘photographs’ archive of PicToSeek via the ‘colour invariant’ parameter
8.9 Search in the ‘graphics’ archive of PicToSeek via the ‘shape invariant’ parameter
8.10 Demo page of the Video Content Description and Exploration tool (ViCoDE), one of the products of the Viper Group
8.11 Viper Group website
8.12 Model of fingerprint treatment according to the MPEG-7 module
8.13 AudioID’s promotional website
9.1 Noise example as a consequence of a formal search using Retrievr
9.2 Example of information loss in a colour search using the WebSEEK system
9.3 Model of an integrated content-based and ‘concept-based’ system developed during the Sculpteur project, promoted by the European Union
9.4 MAVIS 2 system architecture
9.5 Demo of colour-formal query using the QBIC system interface, as applied to the Hermitage Digital Collection
9.6 Example of a search conducted with contentual and textual parameters, run using QBIC’s interface developed for the Sculpteur project
9.7 Relationship model between VR, OPAC and a Virtual Library System
Tables
4.1 Example of relationships between different image and user types
4.2 ‘Concept-based’ and content-based search models
5.1 Comparison between IR and MIR systems
5.2 The MIR system
5.3 Database and index creation
5.4 Search and retrieval process
5.5 Advanced functions
5.6 Query and search model presented by Enser
8.1 Steps required for the content-based treatment of materials and the creation of a database
8.2 Operational phases during document search and retrieval
9.1 Stages of implementation of the VDR project via the MILOS operating system
Acknowledgments
I acknowledge with thanks Maria Teresa Biagetti, for supporting the
MIR project during my PhD course, and Giovanni Solimine for following
the publication of the book in Italian. A special thank you to Luisa
Marquardt for helping me plan the English version of the book, and to
Michele Costa, head of Editrice Bibliografica, for granting translation
rights of the original edition Nuovi metodi di gestione dei documenti
multimediali (Milano, Bibliografica, 2010).
Main list of abbreviations
AACR (Anglo-American Cataloging Rules)
AIB (Associazione Italiana Biblioteche)
AIDA (Associazione Italiana Documentazione Avanzata)
AR (Audio Retrieval)
CBIR (Content Based Information Retrieval)
IFLA (International Federation of Library Associations and Institutions)
IR (Information Retrieval)
ISBD (International Standard Bibliographic Description)
JPEG (Joint Photographic Experts Group)
LIS (Library and Information Science)
MARC (Machine Readable Cataloging)
MIR (Multimedia Information Retrieval)
MPEG (Moving Picture Experts Group)
NLP (Natural Language Processing)
OPAC (Online Public Access Catalogue)
TR (Text Retrieval)
TREC (Text Retrieval Conference)
TRECVID (TREC Video Retrieval)
VDR (Video Retrieval)
VR (Visual Retrieval)
W3C (World Wide Web Consortium)
OWL (Web Ontology Language)
XML (Extensible Mark-up Language)
Preface to the English edition
Multimedia Information Retrieval
Towards an improved user access and
satisfaction
The production of multimedia works and their increasing availability on
the Internet raise the question of how to search for them and retrieve
them efficiently and effectively.
Information Retrieval (IR) has usually been considered a mainly
library-related issue: in terms of information analysis and processing
by librarians (conceptual analysis, content description, indexing,
development and application of thesauri etc.) and, from the user’s
viewpoint, in terms of searching for information and retrieving it
through library catalogues, bibliographic databases etc. In brief, text
retrieval has been the main way to retrieve information, understood as
textual information or information described textually. In the second
part of the twentieth century, the diffusion of information in electronic
form and, since the mid-1990s, the wealth and availability of non-print
media such as digital objects, music, images, pictures and videos, have
emphasized the user’s independence from the library. This
is the so-called disintermediation era, where the intermediary, the
‘middleman’ (Cobo, 2011), is cut out from the production and distribution
of knowledge.
The increasing production (both born-digital and digitized) on the one
hand, and the need for non-print content on the other, underline
a growing interaction and integration between humans and
technology. Studies of this close relationship, and of the convergence of
different fields of science and technology, show an emerging
eco-info-bio-nano-cogno era, with interesting implications in educational terms
(Cobo and Moravec, 2011) in the field of digital literacy or media and
information literacy education. They are encompassed by the so-called
NBIC paradigm, where the nano,1 bio, info and cognitive (NBIC) areas
and technologies converge and sometimes merge. These four areas were
identified as key ones in the National Science Foundation Report
(NSF, 2003). Creativity and the production of creative works will also
benefit from the development of the NBIC as an integrated field
(Bainbridge et al., 2003). In a futurist and transhumanist view, by
2020 ‘Engineers, artists, architects, and designers will experience
tremendously expanded creative abilities, both with a variety of new
tools and through improved understanding of the wellsprings of human
creativity’ (Orca, 2012). Transformative technologies will help to create
new forms of artistic expression.2 New forms of creative works will emerge, and
they will not be confined to current art forms. For instance,
pictures, images and the production of content where images are
fundamental (as in many applied sciences, like medicine, engineering
etc.) are expected to increase significantly. New technological solutions
are flourishing and spreading, like the application of nanotechnologies in
the production and application of nanofibrous media. For instance,
interest in quantum dot applications is ever increasing both in
research and in the corporate sector: quantum dots3 are particularly
useful in the STEM sector, e.g., for drug discovery (Rosenthal et al., 2011),
or in sectors where images of high quality and definition are required
and specific technological solutions are needed (for instance, aiming at
better quality pictures, as developed by InVisage).4
The interesting trends in a closer integration of media, with a
consequent increasing convergence (Jenkins, 2008) of media, technology
and humans, remind us that factors – like users’ perspective and
behaviour – have to be taken into account rather more than in the past,
especially when designing tools and planning services that aim at assisting
in retrieving multimedia information. Digital natives (Prensky, 2001a,
2001b) very often use technology in a ‘bricolage’ (tinkering) way
(Oblinger and Oblinger, 2005), and show a clear preference for online
information, available in digital form and accessible 24/7, rather than a
printed version.5 This also affects the way they search for information,
process and use it throughout their academic life. They prefer information
that can be accessed very easily (Ucak, 2007). They multi-task; actively
participate in social media; produce multimedia content; and often have
a need for retrieving music and pictures. They need to find media,
resources, and multimedia information that are relevant to them.
Visuality: visual information needs and visual skills are not exclusive
features of young people today; they are relevant in many professions
(e.g., surgeons). Furthermore, visual queries have proved to be more
efficient and effective in cross-lingual searching. Images, pictures, music etc.
are usually described and indexed in a textual way (their content is
forced into a textual form) and are retrieved using text (keywords,
descriptors etc.) in traditional IR. The conversion of a text or single
words into an effective image would facilitate the search for information
in a multilingual or cross-lingual context (Lin, Chang and Chen, 2007).
Multimedia Information Retrieval (MIR) can be crucial to finding media
other than in a textual form, so that the user’s multimedia information
needs can be accurately addressed (Ren and Bracewell, 2009).
In terms of user perspective (and user satisfaction), improved user
access to multimedia content is discussed in many meetings6 and is the
aim of research and projects. There are many useful examples in the
corporate field: for example, Shazam started as ‘a simple service designed
to connect people in the UK with music they heard but didn’t know’.
Improved user access has also been the
overall goal of PetaMedia (Peer-To-Peer Tagged Media), a network of
excellence, comprising four national networks from the Netherlands,
Switzerland, the UK and Germany, funded by the EU 7th Framework
Programme and active from March 2008 to September 2011. It aimed at
building the foundations of ‘a European virtual centre of excellence’,
where multimedia content can be accessed using user-generated annotations
and the structures of peer-to-peer and social networks. Among the
research projects developed within PetaMedia, one is particularly in tune
with the aim of Raieli’s work: Off The Beaten Track (OTBT) is based on
a triple synergy: user-user relationships (i.e., a social network);
user-media interactions (i.e., user-contributed annotations); and a multimedia
collection (i.e., material for multimedia analysis). On this basis, an
interesting prototype, Near2Me, an outdoor tourist guide, was developed.
It incorporated the following PetaMedia technologies:
1. “Geotag-based location recommendation;
2. Place naming based on a geotag and textual tags;
3. Retrieval of diversified images for a location, using image properties
and textual tags;
4. Determination of subject-related authority based on comments made
by peers on the user’s uploaded content;
5. Tag clustering and cluster naming;
6. UGC/tag propagation using object duplicate detection”.
Many research challenges were faced while developing the prototype at
different levels (interface, technology integration, evaluation) to get useful
information and significant feedback from user-perspective testing. Near2Me
functions as a tourist guide that helps the tourist to explore an area and find
interesting places to visit, according to his/her (geotagged) location. An
animated video also provides the user with an audio-visual overview of
attractions, landmarks, cultural places etc. Many field trials, involving over
1,000 users, were carried out in order to test and validate the integration of
the triple synergy and the user perspective. Locations, topics and experts
were the most appreciated perspectives by the participants in the study. The
trial then resulted in a balanced combination between the two goals – the
former, technology-oriented, and the latter, user-oriented (PetaMedia,
2012: 12–15). Other projects are also exploring and developing image
query and recognition, users’ interaction etc.7
The shift from the technological dimension to the social, interconnected
and interactive dimension of media and communication shows how
McLuhan’s ideas (the global village and “the medium is the message”) have
been realized over the last few years. The traditional separation
between cold and hot media has been overtaken by the predominance of
software over hardware, which is now shifting to an increasing range
of tools and media. These are characterized by different levels of
integration, flexibility and interactivity; features that make them more
(or less) relevant and useful to a user. They also carry and transmit
lifestyles and values. Furthermore, content is key. Analysts define
four models of content and related scenarios, with different levels of
privacy, data protection and exchange: Premium content (with a low
level of interactivity and pay-per-view consumption); Interactive immersion
(e.g., multimedia content); Social media (peer-to-peer, interactive and
social construction and aggregation of content); and the guide’s scenario,
where content is aggregated by users who cannot modify it (Valori,
2009: 224). On the one hand, the way content is created, aggregated, used
etc. is affected by the functionality and the features of the platform(s)
where it is made available. On the other hand, the user’s competences in
retrieving and processing media and information make a difference.
Those competences are defined in many ways: MIL or Media and
Information Literacy (UNESCO, s.a.), trans-literacy, multiple literacies,
new literacies, cyber-culture etc. Despite the different emphasis on one or
another aspect, they are undoubtedly crucial, and not only in personal or
individual terms of retrieving multimedia information. They are relevant
in educational terms, where user education means both building up
the cultural competences of understanding and producing information
and media (UNESCO, 2009; 2011) and raising an active and creative
member of society, as recently discussed at ENS (École Normale
Supérieure), Cachan, Paris (Frau-Meigs, Bruillard and Delamotte, 2012),
and in terms of providing librarians (and library and multimedia
software developers) with the vital feedback needed to enhance
MIR.
As briefly described above, MIR seems to hold many perspectives and
great potential for development, as mentioned and taken into account by
Roberto Raieli here. Technological solutions and experiences are also
explored in his book, even though they are not the main aim of this
work, the technological and practical field being a fast growing and
changing one: it is honestly hard to keep pace, especially in a book,
with its continuous development. Even though Raieli’s work published
here is mainly a translation of the Italian edition, it is a carefully
revised edition, with substantial adaptation to the international context.8
In this general and dynamic scenario, Raieli’s work is definitely a
welcome and useful contribution that provides the international library
and information community with foundational knowledge on MIR.
The ongoing development of complex multimedia systems for effective
web-mining (Ordóñez de Pablos et al., 2013) makes MIR an interesting
field for further research, development and enhancement.
Luisa Marquardt, Roma Tre University, Rome
References
ACM (2005). Proceedings of the 7th ACM SIGMM International Workshop on
Multimedia Information Retrieval. New York: ACM.
Bainbridge et al. (2003) ‘Expanding Human Cognition and Communication’ in
Roco, M.C. and Bainbridge, W.S. (eds), ‘Converging Technologies for Improving
Human Performance’. Nanotechnology, Biotechnology, Information Technology
and Cognitive Sciences, Arlington: NSF and Springer.
Caballero, L. (2010) ‘Near2Me: Design and Evaluation of a Personalized
Recommender and Explorer for Off-the-beaten-track Travel Destinations’.
S.l.: Stan Ackermans Institute.
Cobo, C., Moravec J. W. (2011) ‘Aprendizaje Invisible. Hacia una Nueva
Ecología de la Educación’. Barcelona (Spain): Universitat de Barcelona.
Online version available in PDF format.
Cobo, C., Scolari, C. and Pardo Kuklinski, H. (2011) ‘Knowledge Production
and Distribution in the Disintermediation Era’. Available at SSRN:
http://ssrn.com/abstract=1920766
Frau-Meigs, D., Bruillard, É. and Delamotte, É. (eds) (2012) ‘Le e-Dossiers de
l’Audiovisuel: L’Éducation aux Cultures de l’Information. Support de
Réflexion au Colloque Translittératies’. Enjeux de Citoyenneté et de Créativité.
ENS-Cachan et Université Sorbonne nouvelle 7–9 Novembre 2012.
S.l.: Cachan, Paris: INA. Online version available.
Jenkins, H. (2008) ‘Convergence Culture. Where Old and New Media Collide’.
New York: NYU Press.
Lin, W. C., Chang Y. C. and Chen H. H. (2007) ‘Integrating Textual and Visual
Information for Cross-language Image Retrieval: a Trans-media Dictionary
Approach’. Information Processing & Management, 43 (2) (March): 488–502.
Available in PDF format.
‘New Learners, New Literacies, New Libraries’. (2008) School Libraries
Worldwide, 14 (2) (January). Available in PDF format.
Orca S. (2012) ‘Nano-Bio-Info-Cogno: Paradigm for the Future’. H+,
(12 February), available at URL: nano-bio-info-cogno-paradigm-future/
Ordóñez de Pablos, P. et al. (2013) ‘Advancing Information Management
through Semantic Web Concepts and Ontologies’. IGI Global, 1-433. Web.
(27 November 2012). doi: 10.4018/978-1-4666-2494-8
PetaMedia. (2008–2011). Research Projects.
PetaMedia. (2012) ‘Project Final Report’. (February 2012). Available in PDF
format.
Ren, F., Bracewell, D. B. (2009) ‘Advanced Information Retrieval’. Electronic
Notes in Theoretical Computer Science, 225 (8) (2 January): 303–317.
Rosenthal, S. J. et al. (2011). ‘Biocompatible Quantum Dots for Biological
Applications’. Chemistry & Biology, 18 (1) (28 January): 10–24.
Tamine-Lechani, L., Boughanem, M., Daoud, M. (2010) ‘Evaluation of Contextual
Information Retrieval Effectiveness: Overview of Issues and Research’.
Knowledge and Information Systems, 24 (1): 134.
Uçak, Nazan Özenç (2007) ‘Internet Use Habits of Students of the Department
of Information Management, Hacettepe University, Ankara’. The Journal of
Academic Librarianship, 33 (6): 697–707. Available in PDF format.
UNESCO (s.a.). ‘Media and Information Literacy’. (webpages) at URL:
http://portal.unesco.org/ci/en/ev.php-URL_ID=15886&URL_DO=DO_TOPIC&URL_SECTION=201.html
UNESCO (2009) ‘Mapping Media Education Policies in the World: Visions,
Programmes and Challenges’. Edited by Divina Frau-Meigs and Jordi Torrent.
New York, USA; Huelva, Spain: The United Nations – Alliance of Civilizations
in collaboration with Grupo Comunicar. Online version available in PDF
format.
UNESCO (2011) ‘Media and Information Literacy Curriculum for Teachers’.
Edited by Alton Grizzle and Carolyn Wilson, Paris: UNESCO. Online version
available in PDF format at URL: 001929/192971e.pdf
Valori, G.E. (2009) ‘Il Futuro è già qui: gli Scenari che Determineranno le
Vicende del nostro Pianeta’. Milano: Rizzoli.
Notes
1. Nano technologies have a high financial potential too, and are seen by venture
capitalists (although some are still reluctant in investing in them) ‘as the next
“big thing” after the dotcom crash’. See, e.g., Siemon, C. (2010). ‘Financing
6th Kondratieff’s Start ups: A Schumpeterian Problem Reconsidered from an
Evolutionary Perspective […]’. Bremen: University of Applied Sciences (SME
Working Papers: 2: 25, f1/forschung/kmu/002-sme_working_papers_siemon.pdf).
Nevertheless, an
interesting upward trend from emerging economies (such as the BRICS
countries and their institutions) shows how they have been investing an
increasing amount of money over the last few years. See: Roco, M.C. – Mirkin
C.A. – Hersam M. C. (2010) ‘Nanotechnology Research Directions for
Societal Needs in 2020: Retrospective and Outlook’. NSF, WTEC report.
Berlin and Boston: Springer, available in PDF format at
nano2/Nanotechnology_Research_Directions_to_2020/ The content is also
available both in hard copy and e-book via the Springer website.
2. See, for example: the (still discussed) Transhuman Art Gallery that ‘features
a select cast of international artists focused on examining the transformative
technologies of today. The work presented transcends a multitude of media,
including innovations such as 3D printing and virtual reality. The Transhuman
Art Gallery is a virtual collection of vanguard artwork, attempting to evoke
anticipation for the future. The challenge of defining a transhumanist
aesthetic is concerned with an attempt to find new forms of representation’.
3. Defined by Rosenthal et al. (2011) as ‘a nanometer-sized crystal of inorganic
semiconductor, or semiconductor nanocrystal’.
4.
5. See, for example, the contributions in the thematic issue: ‘New Learners, New
Literacies, New Libraries’. School Libraries Worldwide, 14(2) (January 2008).
6. See, for example, the series of ACM Multimedia Systems Conference
http://www.mmsys.org/?q=node/68 and the contributions in the annual conference
proceedings.
7. Like the indoor mobile museum guide application developed for the Olympic
Museum of Lausanne. The application provides audio-visual information
concerning the exhibits of the museum and its goal is to make the visit to the
museum more interactive and enjoyable, as shown in a video.
8. The more philosophical and conceptual parts have been reduced; terminology
and references to facts now obsolete have been updated; the structure of the
book has been changed as well, and now shows a different articulation of
chapters and paragraphs. The reference list and the illustrations have been
updated and radically modified.
Preface
Now, as never before, it can be said that the purpose of librarianship as a
profession is at a turning point.
Traditionally, we have dealt with managing physical objects (documents)
and only marginally did our attention wander to their content; it was
enough to describe them and make them available. From this point of
view, Robert Musil drew a fine, almost caricatured profile of the
librarian: ‘the secret of a good librarian’ – he writes in The Man Without
Qualities – ‘is that he never reads any of the literature in his charge other
than the title and the table of contents. Anyone who lets himself go and
starts reading a book is lost as a librarian!’ However, in the digital
universe, distinctions between the content and the document that
conveys it are becoming increasingly subtle and, while continuing to
exist, sometimes lose their meaning. From document handlers we turn
into – and have already partly become – content handlers.
Is it possible for us not to deal with content when we handle intangible
documents, often not even described as objects? Is it possible for us not to
deal with content in the age of FRBR (Functional Requirements for
Bibliographic Records), that doesn’t just describe documents, but aims to put
them in relation to each other, to go beyond the materiality of the documents,
to take care of the works, their expressions, their manifestations?
The horizon of reference is moving from catalogued mediation, to
information mediation, to documentation mediation.
And if that were not enough, added to this is another equally
significant transformation. I refer to the fact that digital libraries are
storing increasingly heterogeneous objects, in content, purpose and
format: the results of digitization in text or image format, born-digital
documents, moving and still images, audio files and other audio-visual
material, teaching aids, hypertexts, 3D paths; and this is without
mentioning other types of materials that are built up by the contribution
of users and come as a result of using resources, such as the so-called User
Generated Content (UGC).
One of the consequences of this multifaceted reality can be seen in the
need for libraries and librarians to lay out methods and tools to organize
and research content according to criteria consistent with the nature of
each specific document. A digital library, wanting firstly to be a library
and wishing to offer a high quality mediation service, cannot confine
itself to collections of documents, without setting itself the objective of
making available to users adequate instruments for accessing the
richness, complexity and diversity of their collections.
Conceptually, this is not a novelty in an absolute sense, and in
traditional librarianship we find many consolidated precedents that take
their cue from the same requirement. To mention only one of the most
classic examples of the discipline of cataloguing, we can think of how
catalogues of antique books were prepared, for which there is usually an
analytical description that takes into account the primary interest of those
who consult these tools. This interest is usually not the textual content
but the book itself as an artefact, the circumstances of its publication and
the vicissitudes of its circulation. That is why great emphasis is given to the
watermark or binding, or the typefaces used for printing, or to persons
other than the author equally related to the works, editions and specimens:
translators, editors, commentators, critics, annotators, prefacers, printers,
publishers, booksellers, illustrators, holders and donors, dedicators and
dedicatees, and so on. In this case, librarianship uses a mediating language
tailored to the particular characteristics of the antique book, even though
it always concerns itself with the container and not the content.
In collections of electronic documents, the ability to develop new
languages of mediation and descriptions is enhanced by advanced
navigation and search systems, allowing us to develop working tools
calibrated to the specific needs of the recipients of a specific project. Even
in this case, some examples may be helpful. Within the limits of digital
libraries dealing with literary texts, if we consult Biblioteca Italiana
(BibIt) or LIZ (Letteratura Italiana Zanichelli), we find a very rich
search functionality (free search for terms or elements; proximity search;
previous word search; footnote search, phrase search, character search,
caption search, paragraph search, references search; search for terms in
a foreign language; concordances; and indices, to name just a few). This
enables scholars to work effectively, locating words in a text or a corpus
of texts, generating the contexts in which forms are linked together. It is
no coincidence that this type of project is successful when it manages to
satisfactorily blend three types of competence: that of librarianship and
mediation, that of technology and informatics, and the specialist
competence brought by content experts.
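Two of the search functions listed above, concordances and proximity search, can be illustrated with a minimal sketch. This is not how BibIt or LIZ is implemented; the function names `concordance` and `near`, the tokenization by word characters, and the sample line are assumptions made purely for illustration.

```python
import re

def tokenize(text):
    """Split a text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())

def concordance(text, term, width=3):
    """Keyword-in-context: return (left context, term, right context) per occurrence."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

def near(text, a, b, distance=5):
    """Proximity search: True if `a` and `b` occur within `distance` tokens of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Hypothetical sample text, used only to exercise the two functions.
sample = "Nel mezzo del cammin di nostra vita mi ritrovai per una selva oscura"
contexts = concordance(sample, "selva", width=2)
```

Here `contexts` holds the words surrounding each occurrence of the search term, which is the kind of context generation a scholar relies on when studying how word forms are linked together in a corpus.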
It is no surprise, then, that when textual documents join together with
other document types, the need is felt to develop multimedia search
methodologies capable of holding together Text Retrieval (TR)
methodologies, based on textual information, for the treatment and
search of textual documents; Visual Retrieval (VR), for searching
visual documents; Video Retrieval (VDR), for the treatment of audio-visual
documents; and Audio Retrieval (AR) methodologies, based on sound data, for
processing and searching audio.
MIR systems aim to analyze, process and search the objective
content of documents with a content-based approach, which seeks to
overcome traditional term-based search and analysis, that is, systems
based on textual equivalents describing the content of a document. Pursuing
this path is vital to making concrete a course of repeatedly enunciated
actions that so far has been little more than sloganeering: here we refer to
the integration between libraries, archives and museums which the
European Union has promoted.
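The content-based idea described above can be made concrete with a small sketch: instead of matching a textual description, the query is itself a visual sample, and documents are ranked by the similarity of a feature computed from their own content. The choice of a coarse colour histogram as the feature, the histogram-intersection measure, the function names and the toy collection are all assumptions made for illustration, not a method prescribed by this text.

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Normalized histogram over coarse RGB bins (one simple content feature)."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {key: count / total for key, count in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())

def rank_by_content(query_pixels, collection):
    """Order document ids by how closely their colours match the query sample."""
    q = colour_histogram(query_pixels)
    scores = {doc_id: intersection(q, colour_histogram(px))
              for doc_id, px in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy collection: each "image" is a short list of RGB pixels.
collection = {
    "sunset": [(250, 120, 30)] * 6 + [(200, 60, 20)] * 2,
    "forest": [(20, 140, 40)] * 7 + [(60, 90, 30)],
    "sea": [(10, 60, 200)] * 8,
}
# The query is a visual sample, not a keyword.
query = [(240, 110, 25)] * 4 + [(210, 50, 10)] * 4
ranking = rank_by_content(query, collection)
```

No caption or subject heading is consulted at any point: the ranking depends only on the pixel data, which is what distinguishes this approach from a term-based system.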
Roberto Raieli is no stranger to studies on this issue, and has already
demonstrated mastery of the subject. I remember, along with other minor
contributions, a collection of essays edited in 2004 along with Perla
Innocenti,1 for AIDA; and the volume that presented the proceedings of the
seminar promoted by AIB in December of the same year at the Roma Tre
University.2 Years and years of analysis and study, broadened by three years
of work done as part of a PhD, have now ripened and, thanks to a timely
and rigorous description of the state of the art, what begins to emerge is an
organic treatment strategy for documents and information that aims to
resolve a fascinating matter fraught with difficulties: to give users a
research methodology based on the ‘objective’ informative content of
documents, which refers to their contents and their forms of expression.
Having clearly identified the direction in which to move does not mean
to say the search is over. This book shows the prospects of study that must
continue to be investigated, enriching the results of theoretical research
with the results of experiments extended to a significant body of documents,
with contributions coming from users and observations of their behaviour.
This volume describes the sectors and contexts in which these methodologies
can find a profitable application and recounts the most interesting
experiences that have already been made at an international level.
Thanks to the possibilities offered by technology, the revolutionary
MIR system can allow document search by applying storage and retrieval
techniques that operate directly on the content of digital objects within
databases, to search for images, audio-visual material and sound, as well as texts,
exploiting the specific language characterizing each type of document.
The scope of this deeply innovative methodology has the eyes of the
world on it, with everyone waiting to see it achieved and the
fascinating proposals illustrated in this book finally realized and fully
operational.
‘For every document there is its retrieval’, Ranganathan would say.
Giovanni Solimine, Sapienza University, Rome
Notes
1. Roberto Raieli, Perla Innocenti (eds.) (2004) ‘Multimedia Information Retrieval’.
Roma: AIDA.
2. Roberto Raieli (ed.) (2005) ‘L’informazione multimediale dal presente al
futuro’. Roma: AIB Lazio.
About the author
Roberto Raieli is a librarian in the Roma Tre University Arts Library,
Italy. Roberto has collaborated with both scientific and humanities
libraries, and has been involved in studies on digital libraries and
multimedia information, on which he has published. Roberto is on the
editorial staff of the Italian LIS journal AIB Studi (formerly Bollettino AIB),
and is a member of groups dealing with electronic resources, virtual
libraries, and open archives. Roberto has expertise in film direction; he
has directed various theatre plays and short films; and he has been
published on a wide range of subjects, also founding and directing the
Italian literary journal Línfera. Roberto holds a degree in Philosophy,
and a degree and a PhD in Library and Information Science.
Important precedents of Roberto’s publications regarding Multimedia
Information Retrieval are the international book MultiMedia Information
Retrieval: metodologie ed esperienze internazionali di content-based
retrieval per l’informazione e la documentazione (Multimedia
Information Retrieval: Methodologies and International Experiences of
Content-based Retrieval for the Information and the Documentation),
edited by Roberto Raieli and Perla Innocenti (Roma, AIDA, 2004); the
book L’informazione multimediale dal presente al futuro: le prospettive
del MultiMedia Information Retrieval (The Multimedia Information
from the Present to the Future: the Perspectives of the Multimedia
Information Retrieval), edited by Roberto Raieli (Roma, AIB Lazio,
2005); and the book Nuovi metodi di gestione dei documenti
multimediali: principi e pratica del MultiMedia Information Retrieval
(New Methods of Management of Multimedia Documents: Principles
and Practice of Multimedia Information Retrieval) (Milano, Editrice
Bibliografica, 2010). Moreover, he has a series of articles published in
Knowledge Organization, Bollettino AIB, AIDA Informazioni, and
other periodicals and books on different subjects.