Lecture Notes in Computer Science 4478
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Joan Martí José Miguel Benedí
Ana Maria Mendonça Joan Serrat (Eds.)
Pattern Recognition
and Image Analysis
Third Iberian Conference, IbPRIA 2007
Girona, Spain, June 6-8, 2007
Proceedings, Part II
Volume Editors
Joan Martí
University of Girona
Campus Montilivi, s/n., 17071 Girona, Spain
E-mail:
José Miguel Benedí
Polytechnical University of Valencia
Camino de Vera, s/n., 46022 Valencia, Spain
E-mail:
Ana Maria Mendonça
University of Porto
Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal
E-mail:
Joan Serrat
Centre de Visió per Computador-UAB
Campus UAB, 08193 Bellaterra (Cerdanyola), Barcelona, Spain
E-mail:
Library of Congress Control Number: 2007927717
CR Subject Classification (1998): I.4, I.5, I.7, I.2.7, I.2.10
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition,
and Graphics
ISSN
0302-9743
ISBN-10
3-540-72848-1 Springer Berlin Heidelberg New York
ISBN-13
978-3-540-72848-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12070374 06/3180 543210
Preface
We welcome you to the 3rd Iberian Conference on Pattern Recognition and Image
Analysis (IbPRIA 2007), jointly promoted by AERFAI (Asociación Española
de Reconocimiento de Formas y Análisis de Imágenes) and APRP (Associação
Portuguesa de Reconhecimento de Padrões). This year, IbPRIA was held in
Girona, Spain, June 6–8, 2007, and was hosted by the Institute of Informatics
and Applications of the University of Girona. It followed the two successful
previous editions hosted by the Universitat de les Illes Balears (2003) and the
Institute for Systems and Robotics and the Geo-systems Center of the Instituto
Superior Técnico (2005).
A record number of 328 full paper submissions from 27 countries were re-
ceived. Each of these submissions was reviewed in a blind process by two re-
viewers. The review assignments were determined by the four General Chairs,
and the final decisions were made after the Chairs meeting in Girona, giving an
overall acceptance rate of 47.5%. Because of the limited size of the conference,
we regret that some worthy papers were probably rejected.
In keeping with the IbPRIA tradition of having a single track of oral
presentations, the number of oral papers remained in line with the previous
IbPRIA editions, with a total of 48 papers. The number of poster papers was
set at 108.
We were also very honored to have as invited speakers such internationally
recognized researchers as Chris Williams from the University of Edinburgh, UK,
Michal Irani from The Weizmann Institute of Science, Israel, and Andrew Davison
from Imperial College London, UK.
For the first time, some related events were scheduled in parallel with
the IbPRIA main conference, following the Call for Tutorials and Workshops:
Antonio Torralba from MIT, USA, and Aleix Martínez from Ohio State
University, USA, taught tutorials on object recognition and statistical
pattern recognition, respectively, while the “Supervised and Unsupervised
Ensemble Methods and Their Applications” workshop and the first edition of the
“Spanish Workshop on Biometrics” were successfully held.
We would like to thank all the authors for submitting their papers and thus
making these proceedings possible. We address special thanks to the members of
the Program Committee and the additional reviewers for their great work which
contributed to the high quality of these proceedings.
We are also grateful to the Local Organizing Committee for their substantial
contribution of time and effort.
Finally, our thanks go to IAPR for support in sponsoring the Best Paper
Prize at IbPRIA 2007.
The next edition of IbPRIA will be held in Portugal in 2009.
June 2007 Joan Martí
Ana Maria Mendonça
José Miguel Benedí
Joan Serrat
Organization
IbPRIA 2007 was organized by AERFAI (Asociación Española de Reconocimiento
de Formas y Análisis de Imágenes) and APRP (Associação Portuguesa de
Reconhecimento de Padrões), with the Computer Vision and Robotics Group,
Institute of Informatics and Applications, University of Girona (UdG), as the
local organizer of this edition.
General Conference Co-chairs
Joan Martí University of Girona, Spain
José Miguel Benedí Polytechnical University of Valencia, Spain
Ana Maria Mendonça University of Porto, Portugal
Joan Serrat Universitat Autònoma de Barcelona, Spain
Invited Speakers
Chris Williams University of Edinburgh, UK
Michal Irani The Weizmann Institute of Science, Israel
Andrew Davison Imperial College London, UK
National Organizing Committee
Marc Carreras
Xavier Cufí
Jordi Freixenet
Rafael García
Xavier Lladó
Robert Martí
Marta Peracaula
Pere Ridao
Joaquim Salvi
Marcel Alofra
Elisabet Batlle
Anna Bosch
François Chung
Andrés El-Fakdi
Jordi Ferrer
Emili Hernández
Maryna Kudzinava
Arnau Oliver
Jordi Palau
Ricard Prados
Narcís Palomeras
David Raba
David Ribas
Miquel Villanueva
Program Committee
Lourdes Agapito Queen Mary University of London, UK
Helder Araújo University of Coimbra, Portugal
Hervé Bourlard EPFL, Switzerland
Patrick Bouthemy IRISA, France
Joachim Buhmann ETH Zurich, Switzerland
Horst Bunke University of Bern, Switzerland
Hans Burkhard University of Freiburg, Germany
Francisco Casacuberta Polytechnical University of Valencia, Spain
Vicent Caselles Universitat Pompeu Fabra, Spain
Aurélio Campilho University of Porto, Portugal
Luís Corte-Real University of Porto, Portugal
Hervé Delinguette INRIA, France
Pierre Dupont Université catholique de Louvain, Belgium
Marcello Federico ITC-irst Trento, Italy
Marco Ferreti University of Pavia, Italy
Ana Fred Technical University of Lisbon, Portugal
Andrew Gee University of Cambridge, UK
Vito di Gesú University of Palermo, Italy
Edwin R. Hancock University of York, UK
Francisco Mario Hernández Tejera Universidad de Las Palmas, Spain
Laurent Heutte Université de Rouen, France
José Manuel Iñesta Quereda Universidad de Alicante, Spain
Jorge Marques Technical University of Lisbon, Portugal
Hermann Ney University of Aachen, Germany
Wiro Niessen University of Utrecht, The Netherlands
Francisco José Perales Universitat de les Illes Balears, Spain
Nicolás Pérez de la Blanca University of Granada, Spain
Fernando Pérez Cruz Universidad Carlos III, Spain
Maria Petrou Imperial College, UK
Pedro Pina Technical University of Lisbon, Portugal
Armando Pinho University of Aveiro, Portugal
Ioannis Pitas University of Thessaloniki, Greece
Filiberto Pla University Jaume I, Spain
Alberto Sanfeliu Polytechnical University of Catalonia, Spain
Gabriella Sanniti di Baja Istituto di Cibernetica CNR, Italy
Pierre Soille Joint Research Centre, Italy
Karl Tombre LORIA, France
M. Inés Torres University of the Basque Country, Spain
Jordi Vitrià Universitat Autònoma de Barcelona, Spain
Joachim Weickert Saarland University, Germany
Reyer Zwiggelaar University of Wales, Aberystwyth, UK
Reviewers
Maria José Abasolo University of the Balearic Islands, Spain
Antonio Adán Universidad de Castilla La Mancha, Spain
Francisco Javier López Aligué University of Extremadura, Spain
René Alquézar UPC, Spain
Joachim Buhmann ETH Zurich, Switzerland
Juan Carlos Amengual UJI-LPI, Spain
Hans Burkhard University of Freiburg, Germany
Ramon Baldrich Computer Vision Center, Spain
Jorge Pereira Batista ISR Coimbra, Portugal
Luis Baumela UPM, Spain
Alexandre Bernardino Instituto Superior Técnico, Portugal
Lilian Blot University of East Anglia, UK
Imma Boada University of Girona, Spain
Marcello Federico ITC-irst Trento, Italy
Michael Breuss Saarland University, Germany
Jaime Santos Cardoso INESC Porto, Portugal
Modesto Castrillón Universidad de Las Palmas de Gran Canaria, Spain
Miguel Velhote Correia Instituto de Engenharia Biomédica, Portugal
Xevi Cufí University of Girona, Spain
Jorge Alves da Silva FEUB-INEB, Portugal
Hans du Buf University of Algarve, Portugal
Óscar Deniz Universidad de Las Palmas de Gran Canaria, Spain
Daniel Hernández-Sosa Universidad de Las Palmas de Gran Canaria, Spain
Olga Duran Imperial College, UK
Claudio Eccher ITC-irst Trento, Italy
Arturo De la Escalera Universidad Carlos III de Madrid, Spain
Miquel Feixas Universitat de Girona, Spain
Francesc J. Ferri Universitat de València, Spain
David Fofi Le2i UMR CNRS 5158, France
Jordi Freixenet University of Girona, Spain
Maria Frucci Institute of Cybernetics “E. Caianiello”, Italy
Cesare Furlanello ITC-irst Trento, Italy
Miguel Ángel García Universidad Autónoma de Madrid, Spain
Rafael García University of Girona, Spain
Yolanda González Universidad de las Islas Baleares, Spain
Manuel González Universitat de les Illes Balears, Spain
Nuno Gracias University of Girona, Spain
Antoni Grau UPC, Spain
Nicolás Guil University of Malaga, Spain
Alfons Juan Universitat Politècnica de València, Spain
Frédéric Labrosse University of Wales, Aberystwyth, UK
Bart Lamiroy Nancy Université-LORIA-INPL, France
Xavier Lladó University of Girona, Spain
Paulo Lobato Correia IT - IST, Portugal
Ángeles López Universitat Jaume I, Spain
Javier Lorenzo Universidad de Las Palmas de Gran Canaria, Spain
Manuel Lucena Universidad de Jaén, Spain
Enric Martí Universitat Autònoma de Barcelona, Spain
Robert Martí Universitat de Girona, Spain
Elisa Martínez Enginyeria La Salle, Universitat Ramon Llull, Spain
Carlos Martínez Hinarejos Universidad Politécnica de Valencia, Spain
Fabrice Meriaudeau Le2i UMR CNRS 5158, France
Maria Luisa Micó Universidad de Alicante, Spain
Birgit Möller Martin Luther University Halle-Wittenberg, Germany
Ramón Mollineda Universidad Jaume I, Spain
Jacinto Nascimento Instituto de Sistemas e Robótica, Portugal
Shahriar Negahdaripour University of Miami, USA
Paulo Oliveira IST-ISR, Portugal
Gabriel A. Oliver-Codina University of the Balearic Islands, Spain
José Oncina Universidad de Alicante, Spain
Thierry Paquet LITIS, France
Roberto Paredes UPV, Spain
Joao Paulo Costeira Instituto de Sistemas e Robótica, Portugal
Antonio Miguel Peinado Universidad de Granada, Spain
Caroline Petitjean Université de Rouen, France
André Teixeira Puga Universidade do Porto, Portugal
Petia Radeva Computer Vision Center-UAB, Spain
Joao Miguel Raposo Sanches Instituto Superior Técnico, Portugal
Pere Ridao University of Girona, Spain
Antonio Rubio Universidad de Granada, Spain
José Ruiz Shulcloper Advanced Technologies Application Center, Cuba
J. Salvador Sánchez Universitat Jaume I, Spain
Joaquim Salvi University of Girona, Spain
Joan Andreu Sánchez Universitat Politècnica de València, Spain
Elena Sánchez Nielsen Universidad de La Laguna, Spain
Joao Silva Sequeira Instituto Superior Técnico, Portugal
Margarida Silveira Instituto Superior Técnico, Portugal
Joao Manuel R.S. Tavares Universidade do Porto, Portugal
Antonio Teixeira Universidade de Aveiro, Portugal
Javier Traver Universitat Jaume I, Spain
Maria Vanrell Computer Vision Center, Spain
Javier Varona Universitat de les Illes Balears, Spain
Martin Welk Saarland University, Germany
Laurent Wendling LORIA, France
Michele Zanin ITC-irst Trento, Italy
Sponsoring Institutions
MEC (Ministerio de Educación y Ciencia, Spanish Government)
AGAUR (Agència de Gestió d’Ajuts Universitaris i de Recerca, Catalan
Government)
IAPR (International Association for Pattern Recognition)
Vicerectorat de Recerca en Ciència i Tecnologia, Universitat de Girona
Table of Contents – Part II
Robust Automatic Speech Recognition Using PD-MEEMLIN 1
Igmar Hernández, Paola García, Juan Nolazco, Luis Buera, and
Eduardo Lleida
Shadow Resistant Road Segmentation from a Mobile Monocular
System 9
José Manuel Álvarez, Antonio M. López, and Ramon Baldrich
Mosaicking Cluttered Ground Planes Based on Stereo Vision 17
José Gaspar, Miguel Realpe, Boris Vintimilla, and
José Santos-Victor
Fast Central Catadioptric Line Extraction 25
Jean Charles Bazin, Cédric Demonceaux, and Pascal Vasseur
Similarity-Based Object Retrieval Using Appearance and Geometric
Feature Combination 33
Agnés Borràs and Josep Lladós
Real-Time Facial Expression Recognition for Natural Interaction 40
Eva Cerezo, Isabelle Hupont, Cristina Manresa-Yee, Javier Varona,
Sandra Baldassarri, Francisco J. Perales, and Francisco J. Seron
A Simple But Effective Approach to Speaker Tracking in Broadcast
News 48
Luis Javier Rodríguez, Mikel Peñagarikano, and Germán Bordel
Region-Based Pose Tracking 56
Christian Schmaltz, Bodo Rosenhahn, Thomas Brox,
Daniel Cremers, Joachim Weickert, Lennart Wietzke, and
Gerald Sommer
Testing Geodesic Active Contours 64
A. Caro, T. Alonso, P.G. Rodríguez, M.L. Durán, and M.M. Ávila
Rate Control Algorithm for MPEG-2 to H.264/AVC Transcoding 72
Gao Chen, Shouxun Lin, and Yongdong Zhang
3-D Motion Estimation for Positioning from 2-D Acoustic Video
Imagery 80
H. Sekkati and S. Negahdaripour
Progressive Compression of Geometry Information with Smooth
Intermediate Meshes 89
Taejung Park, Haeyoung Lee, and Chang-hun Kim
Rejection Strategies Involving Classifier Combination for Handwriting
Recognition 97
Jose A. Rodríguez, Gemma Sánchez, and Josep Lladós
Summarizing Image/Surface Registration for 6DOF Robot/Camera
Pose Estimation 105
Elisabet Batlle, Carles Matabosch, and Joaquim Salvi
Robust Complex Salient Regions 113
Sergio Escalera, Oriol Pujol, and Petia Radeva
Improving Piecewise-Linear Registration Through Mesh
Optimization 122
Vicente Arévalo and Javier González
Registration-Based Segmentation Using the Information Bottleneck
Method 130
Anton Bardera, Miquel Feixas, Imma Boada, Jaume Rigau, and
Mateu Sbert
Dominant Points Detection Using Phase Congruence 138
Francisco José Madrid-Cuevas, Rafel Medina-Carnicer,
Ángel Carmona-Poyato, and Nicolás Luis Fernández-García
Exploiting Information Theory for Filtering the Kadir Scale-Saliency
Detector 146
Pablo Suau and Francisco Escolano
False Positive Reduction in Breast Mass Detection Using
Two-Dimensional PCA 154
Arnau Oliver, Xavier Lladó, Joan Martí, Robert Martí, and
Jordi Freixenet
A Fast and Robust Iris Segmentation Method 162
Noé Otero-Mateo, Miguel Ángel Vega-Rodríguez,
Juan Antonio Gómez-Pulido, and
Juan Manuel Sánchez-Pérez
Detection of Lung Nodule Candidates in Chest Radiographs 170
Carlos S. Pereira, Hugo Fernandes, Ana Maria Mendonça, and
Aurélio Campilho
A Snake for Retinal Vessel Segmentation 178
L. Espona, M.J. Carreira, M. Ortega, and M.G. Penedo
Risk Classification of Mammograms Using Anatomical Linear Structure
and Density Information 186
Edward M. Hadley, Erika R.E. Denton, Josep Pont,
Elsa Pérez, and Reyer Zwiggelaar
A New Method for Robust and Efficient Occupancy Grid-Map
Matching 194
Jose-Luis Blanco, Javier Gonzalez, and
Juan-Antonio Fernandez-Madrigal
Vote-Based Classifier Selection for Biomedical NER Using Genetic
Algorithms 202
Nazife Dimililer, Ekrem Varoğlu, and Hakan Altınçay
Boundary Shape Recognition Using Accumulated Length and Angle
Information 210
Marçal Rusiñol, Philippe Dosch, and Josep Lladós
Extracting Average Shapes from Occluded Non-rigid Motion 218
Alessio Del Bue
Automatic Topological Active Net Division in a Genetic-Greedy Hybrid
Approach 226
N. Barreira, M.G. Penedo, O. Ibáñez, and J. Santos
Using Graphics Hardware for Enhancing Edge and Circle Detection 234
Antonio Ruiz, Manuel Ujaldón, and Nicolás Guil
Optimally Discriminant Moments for Speckle Detection in Real B-Scan
Images 242
Robert Martí, Joan Martí, Jordi Freixenet,
Joan Carles Vilanova, and Joaquim Barceló
Influence of Resampling and Weighting on Diversity and Accuracy of
Classifier Ensembles 250
R.M. Valdovinos, J.S. Sánchez, and E. Gasca
A Hierarchical Approach for Multi-task Logistic Regression 258
Àgata Lapedriza, David Masip, and Jordi Vitrià
Modelling of Magnetic Resonance Spectra Using Mixtures for Binned
and Truncated Data 266
Juan M. Garcia-Gomez, Montserrat Robles, Sabine Van Huffel, and
Alfons Juan-Císcar
Atmospheric Turbulence Effects Removal on Infrared Sequences
Degraded by Local Isoplanatism 274
Magali Lemaitre, Olivier Laligant, Jacques Blanc-Talon, and
Fabrice Mériaudeau
Inference of Stochastic Finite-State Transducers Using N-Gram
Mixtures 282
Vicente Alabau, Francisco Casacuberta, Enrique Vidal, and
Alfons Juan
Word Spotting in Archive Documents Using Shape Contexts 290
Josep Lladós, Partha Pratim-Roy, José A. Rodríguez, and
Gemma Sánchez
Fuzzy Rule Based Edge-Sensitive Line Average Algorithm in Interlaced
HDTV Sequences 298
Gwanggil Jeon, Jungjun Kim, Jongmin You, and Jechang Jeong
A Tabular Pruning Rule in Tree-Based Fast Nearest Neighbor Search
Algorithms 306
Jose Oncina, Franck Thollard, Eva Gómez-Ballester,
Luisa Micó, and Francisco Moreno-Seco
A General Framework to Deal with the Scaling Problem in
Phrase-Based Statistical Machine Translation 314
Daniel Ortiz, Ismael García Varea, and Francisco Casacuberta
Recognizing Individual Typing Patterns 323
Michal Choraś and Piotr Mroczkowski
Residual Filter for Improving Coding Performance of Noisy Video
Sequences 331
Won Seon Song, Seong Soo Lee, and Min-Cheol Hong
Cyclic Viterbi Score for Linear Hidden Markov Models 339
Vicente Palazón and Andrés Marzal
Non Parametric Classification of Human Interaction 347
Scott Blunsden, Ernesto Andrade, and Robert Fisher
A Density-Based Data Reduction Algorithm for Robust Estimators 355
L. Ferraz, R. Felip, B. Martínez, and X. Binefa
Robust Estimation of Reflectance Functions from Polarization 363
Gary A. Atkinson and Edwin R. Hancock
Estimation of Multiple Objects at Unknown Locations with Active
Contours 372
Margarida Silveira and Jorge S. Marques
Analytic Reconstruction of Transparent and Opaque Surfaces from
Texture Images 380
Mohamad Ivan Fanany and Itsuo Kumazawa
Sedimentological Analysis of Sands 388
Cristina Lira and Pedro Pina
Catadioptric Camera Calibration by Polarization Imaging 396
O. Morel, R. Seulin, and D. Fofi
Stochastic Local Search for Omnidirectional Catadioptric Stereovision
Design 404
G. Dequen, L. Devendeville, and E. Mouaddib
Dimensionless Monocular SLAM 412
Javier Civera, Andrew J. Davison, and J.M.M. Montiel
Improved Camera Calibration Method Based on a Two-Dimensional
Template 420
Carlos Ricolfe-Viala and Antonio-Jose Sanchez-Salmeron
Relative Pose Estimation of Surgical Tools in Assisted Minimally
Invasive Surgery 428
Agustin Navarro, Edgar Villarraga, and Joan Aranda
Efficiently Downdating, Composing and Splitting Singular Value
Decompositions Preserving the Mean Information 436
Javier Melenchón and Elisa Martínez
On-Line Classification of Human Activities 444
J.C. Nascimento, M.A.T. Figueiredo, and J.S. Marques
Data-Driven Jacobian Adaptation in a Multi-model Structure for Noisy
Speech Recognition 452
Yong-Joo Chung and Keun-Sung Bae
Development of a Computer Vision System for the Automatic Quality
Grading of Mandarin Segments 460
José Blasco, Sergio Cubero, Raúl Arias, Juan Gómez,
Florentino Juste, and Enrique Moltó
Mathematical Morphology in the HSI Colour Space 467
M.C. Tobar, C. Platero, P.M. González, and G. Asensio
Improving Background Subtraction Based on a Casuistry of
Colour-Motion Segmentation Problems 475
I. Huerta, D. Rowe, M. Mozerov, and J. Gonzàlez
Random Forest for Gene Expression Based Cancer Classification:
Overlooked Issues 483
Oleg Okun and Helen Priisalu
Bounding the Size of the Median Graph 491
Miquel Ferrer, Ernest Valveny, and Francesc Serratosa
When Overlapping Unexpectedly Alters the Class Imbalance Effects 499
V. García, R.A. Mollineda, J.S. Sánchez, R. Alejo, and J.M. Sotoca
A Kernel Matching Pursuit Approach to Man-Made Objects Detection
in Aerial Images 507
Wei Wang, Xin Yang, and Shoushui Chen
Anisotropic Continuous-Scale Morphology 515
Michael Breuß, Bernhard Burgeth, and Joachim Weickert
Three-Dimensional Ultrasonic Assessment of Atherosclerotic Plaques 523
José Seabra, João Sanches, Luís M. Pedro, and
J. Fernandes e Fernandes
Measuring the Applicability of Self-organization Maps in a Case-Based
Reasoning System 532
A. Fornells, E. Golobardes, J.M. Martorell, J.M. Garrell,
E. Bernadó, and N. Macià
Algebraic-Distance Minimization of Lines and Ellipses for Traffic Sign
Shape Localization 540
Pedro Gil-Jiménez, Saturnino Maldonado-Bascón,
Hilario Gómez-Moreno, Sergio Lafuente-Arroyo, and
Javier Acevedo-Rodríguez
Modeling Aceto-White Temporal Patterns to Segment Colposcopic
Images 548
Héctor-Gabriel Acosta-Mesa, Nicandro Cruz-Ramírez,
Rodolfo Hernández-Jiménez, and
Daniel-Alejandro García-López
Speech/Music Classification Based on Distributed Evolutionary Fuzzy
Logic for Intelligent Audio Coding 556
J.E. Muñoz Expósito, N. Ruiz Reyes, S. Garcia Galán, and
P. Vera Candeas
Breast Skin-Line Segmentation Using Contour Growing 564
Robert Martí, Arnau Oliver, David Raba, and Jordi Freixenet
New Measure for Shape Elongation 572
Miloš Stojmenović and Joviša Žunić
Evaluation of Spectral-Based Methods for Median Graph
Computation 580
Miquel Ferrer, Francesc Serratosa, and Ernest Valveny
Feasible Application of Shape-Based Classification 588
A. Caro, P.G. Rodríguez, T. Antequera, and R. Palacios
3D Shape Recovery with Registration Assisted Stereo Matching 596
Huei-Yung Lin, Sung-Chung Liang, and Jing-Ren Wu
Blind Estimation of Motion Blur Parameters for Image
Deconvolution 604
João P. Oliveira, Mário A.T. Figueiredo, and José M. Bioucas-Dias
Dependent Component Analysis: A Hyperspectral Unmixing
Algorithm 612
José M.P. Nascimento and José M. Bioucas-Dias
Synchronization of Video Sequences from Free-Moving Cameras 620
Joan Serrat, Ferran Diego, Felipe Lumbreras, and
José Manuel Álvarez
Tracking the Left Ventricle in Ultrasound Images Based on Total
Variation Denoising 628
Jacinto C. Nascimento, João M. Sanches, and Jorge S. Marques
Bayesian Oil Spill Segmentation of SAR Images Via Graph Cuts 637
Sónia Pelizzari and José M. Bioucas-Dias
Unidimensional Multiscale Local Features for Object Detection Under
Rotation and Mild Occlusions 645
Michael Villamizar, Alberto Sanfeliu, and Juan Andrade Cetto
Author Index 653
Robust Automatic Speech Recognition Using PD-MEEMLIN
Igmar Hernández¹, Paola García¹, Juan Nolazco¹, Luis Buera², and Eduardo Lleida²
¹ Computer Science Department, Tecnológico de Monterrey,
Campus Monterrey, México
² Communications Technology Group (GTC), I3A, University of Zaragoza, Spain
{A00778595,paola.garcia,jnolazco,}@itesm.mx, {lbuera,lleida}@unizar.es
Abstract. This work presents a robust normalization technique that
cascades a speech enhancement method with a feature vector
normalization algorithm. Speech enhancement is provided by the Spectral
Subtraction (SS) algorithm, which reduces the effect of additive noise
by subtracting an estimate of the noise spectrum from the complete
speech spectrum. In addition, an empirical feature vector normalization
technique known as PD-MEMLIN (Phoneme-Dependent Multi-Environment
Models based LInear Normalization) has also been shown to be effective.
PD-MEMLIN models the clean and noisy spaces with Gaussian Mixture
Models (GMMs) and estimates a set of linear compensation
transformations that are used to clean the signal. The proper
integration of both approaches is studied, and the final design,
PD-MEEMLIN (Phoneme-Dependent Multi-Environment Enhanced Models
based LInear Normalization), confirms and improves the effectiveness of
both approaches. The results show that, for highly degraded speech,
PD-MEEMLIN outperforms SS by between 11.4% and 34.5%, and PD-MEMLIN
by between 11.7% and 24.84%. Furthermore, at moderate SNR, i.e. 15 or
20 dB, PD-MEEMLIN is as good as the PD-MEMLIN and SS techniques.
1 Introduction
The robust speech recognition field plays a key role in real-environment
applications. Noise can degrade speech signals, with harmful effects on
Automatic Speech Recognition (ASR) tasks. Even though there have been great
advances in the area, robustness remains an issue. To address this problem,
several techniques have been developed over the years, for instance the
Spectral Subtraction algorithm (SS) [1] and, in the last decade, SPLICE
(Stereo-based Piecewise Linear Compensation for Environments) [2], PMC
(Parallel Model Combination) [3], RATZ (multivariate Gaussian based cepstral
normalization) [4] and RASTA (the RelAtive SpecTrAl Technique) [5].
Subsequent research has sought a proper combination of algorithms in order to
reduce the effects of noise. A good example is described in [6], where the
core scheme is composed of a Continuous SS (CSS) and PMC.
J. Martí et al. (Eds.): IbPRIA 2007, Part II, LNCS 4478, pp. 1–8, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Pursuing the same idea, this work presents a combination of speech
enhancement (represented by the SS method) and a feature vector
normalization technique (PD-MEMLIN [7]) to improve the recognition accuracy
of the speech recognition system in highly degraded environments [8,9]. The
first technique was selected because of its simple implementation and good
performance. The second is an empirical vector normalization technique that
has been compared against other algorithms [8] and has obtained important
improvements.
The paper is organized as follows. Section 2 gives a brief overview of
SS and PD-MEMLIN. Section 3 details the new method, PD-MEEMLIN. In
Section 4, the experimental results are presented. Finally, the conclusions
are given in Section 5.
2 Spectral Subtraction and PD-MEMLIN
In order to evaluate the proposed integration, an ASR system is employed. In
general, a pre-processing stage of the speech waveform is always desirable.
The speech signal is divided into short overlapping windows, from which a set
of coefficients, usually Mel Frequency Cepstral Coefficients (MFCCs) [10], is
computed. The MFCCs are fed to the training algorithm that calculates the
acoustic models. The acoustic models used in this research are Hidden Markov
Models (HMMs), which are widely used to model statistically the behaviour of
the phonetic events in speech [10]. An HMM employs a sequence of hidden states
which characterises how a random process (speech in this case) evolves in
time. Although the states are not observable, a sequence of realizations from
these states can always be obtained. Associated with each state there is a
probability density function, normally a mixture of Gaussians. The criterion
used to train the HMMs is Maximum Likelihood; thus, the training process
becomes an optimization problem that can be solved iteratively with the
Baum-Welch algorithm.
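The windowing step mentioned above can be illustrated with a minimal sketch. The function name `frame_signal` and its parameters are hypothetical, the frame length and hop are given in samples, and dropping the incomplete trailing frame is an assumption rather than something the paper specifies.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping frames.

    frame_len and hop are measured in samples; hop < frame_len gives overlap.
    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames
```

Each returned frame would then be passed to whatever feature extractor computes the MFCCs; that stage is outside the scope of this sketch.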
2.1 Spectral Subtraction
The Spectral Subtraction (SS) algorithm is a simple and well-known speech
enhancement technique. This research is based on the SS algorithm described
in [9]. It has the property that it does not require an explicit voice
activity detector, as general SS algorithms do. The algorithm is based on the
existence of peaks and valleys in a short-time subband power estimate of the
noisy speech [9]. The peaks correspond to speech activity and the valleys are
used to obtain an estimate of the subband noise power. A reliable noise
estimate is therefore obtained by using a window large enough to permit the
detection of any peak of speech activity.
As shown in Figure 1, this algorithm modifies the short-time spectral
magnitude of the noisy speech signal during the enhancement process. Hence,
the output signal, when synthesized, can be considered close to the clean
speech signal. The appropriate spectral magnitude is computed from the noise
power estimate and the SS algorithm. Let $y(i) = x(i) + n(i)$, where $y(i)$
is the noisy speech signal, $x(i)$ is the clean speech signal, $n(i)$ is the
noise signal and $i$ denotes the time index; $x(i)$ and $n(i)$ are
statistically independent.
Fig. 1. Diagram of the Basic SS Method Used
Figure 1 depicts the spectral analysis, in which the time-domain frames are
windowed and converted to the frequency domain using a Discrete Fourier
Transform (DFT) filter bank with $W_{DFT}$ subbands and a
decimation/interpolation ratio $R$ [9]. After the noise power estimation and
the spectral weighting, the enhanced signal is transformed back to the time
domain using the Inverse Discrete Fourier Transform (IDFT).
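For the subtraction algorithm it is necessary to estimate the subband noise
power $P_n(\lambda,k)$ and the short-time signal power
$\overline{|Y(\lambda,k)|^2}$, where $\lambda$ is the decimated time index and
$k$ indexes the frequency bins of the DFT. A first-order recursive network is
used to obtain the short-time signal power, as shown in Equation 1:
$$\overline{|Y(\lambda,k)|^2} = \gamma\,\overline{|Y(\lambda-1,k)|^2} + (1-\gamma)\,|Y(\lambda,k)|^2. \tag{1}$$
Here the overline marks the recursively smoothed power and $\gamma$ is the
smoothing constant of the recursion.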
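The first-order recursion of Equation 1 can be sketched for a single subband as follows. This is only an illustration: the function name `smooth_power` is hypothetical, and initialising the recursion with the first instantaneous power is an assumption, since the text does not state the initial condition.

```python
def smooth_power(magnitudes, gamma):
    """First-order recursive smoothing of |Y|^2 for one subband k.

    magnitudes: |Y(lambda, k)| for lambda = 0, 1, 2, ...
    gamma: smoothing constant in [0, 1); larger values smooth more heavily.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    smoothed = []
    previous = None
    for magnitude in magnitudes:
        instantaneous = magnitude * magnitude
        if previous is None:
            # Assumption: start the recursion from the first observed power.
            previous = instantaneous
        else:
            previous = gamma * previous + (1.0 - gamma) * instantaneous
        smoothed.append(previous)
    return smoothed
```

A value of `gamma` close to 1 gives a slowly varying estimate, while `gamma = 0` returns the instantaneous power unchanged.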
Afterwards, the subtraction is carried out using an oversubtraction factor
$osub(\lambda,k)$ and a spectral flooring constant $subf$ [12]. The
$osub(\lambda,k)$ factor is needed to eliminate musical noise, and it is
calculated as a function of the subband signal-to-noise ratio
$SNR_y(\lambda,k)$, $\lambda$ and $k$ (for high SNR and high frequencies a
smaller osub is required; for low SNR and low frequencies a larger osub is
required). The $subf$ constant keeps the resulting spectral components from
going below a minimum level. It is expressed as a fraction of the original
noise power spectrum. The final spectral subtraction rule, which relates
subf and osub, is defined by Equation 2.
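$$|\hat{X}(\lambda,k)| = \begin{cases} \sqrt{subf \cdot P_n(\lambda,k)} & \text{if } |Y(\lambda,k)| \cdot Q(\lambda,k) \le \sqrt{subf \cdot P_n(\lambda,k)} \\ |Y(\lambda,k)| \cdot Q(\lambda,k) & \text{otherwise} \end{cases} \tag{2}$$
where $Q(\lambda,k) = 1 - \sqrt{osub(\lambda,k)\,\dfrac{P_n(\lambda,k)}{\overline{|Y(\lambda,k)|^2}}}$,
the denominator being the smoothed short-time signal power of Equation 1.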
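As a rough illustration of the subtraction rule with spectral flooring, the sketch below processes a single time-frequency bin. The function name `subtract_bin` and its argument names are hypothetical. It assumes that square roots are taken so that the magnitude $|Y|$ is compared with a quantity of the same dimension as the noise power $P_n$, and that the smoothed signal power is strictly positive.

```python
import math


def subtract_bin(y_mag, y_pow_smoothed, noise_pow, osub, subf):
    """Enhanced magnitude |X_hat| for one time-frequency bin.

    y_mag: |Y(lambda, k)|, magnitude of the noisy spectrum.
    y_pow_smoothed: smoothed short-time signal power (must be > 0).
    noise_pow: subband noise power estimate P_n(lambda, k).
    osub: oversubtraction factor; subf: spectral flooring constant.
    """
    if y_pow_smoothed <= 0.0:
        raise ValueError("smoothed signal power must be positive")
    # Assumed form of the weighting: Q = 1 - sqrt(osub * P_n / smoothed power).
    q = 1.0 - math.sqrt(osub * noise_pow / y_pow_smoothed)
    floor = math.sqrt(subf * noise_pow)
    candidate = y_mag * q
    # The floor keeps the output from dropping below a fraction of the noise.
    return floor if candidate <= floor else candidate
```

When the noise estimate is small relative to the signal power, `q` stays close to 1 and the bin is passed almost unchanged; when oversubtraction drives the product negative, the floor takes over.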
The missing element, $P_n(\lambda,k)$, is computed from the short-time
subband signal power $P_y(\lambda,k)$, represented by smoothed periodograms:
$$P_y(\lambda,k) = \xi\,P_y(\lambda-1,k) + (1-\xi)\,|Y(\lambda,k)|^2,$$
where $\xi$ is the smoothing constant used to obtain the periodograms. Then
$P_n(\lambda,k)$ is calculated as a weighted minimum of $P_y(\lambda,k)$
within a window of $D$ subband samples. Hence,
$$P_n(\lambda,k) = omin \cdot P_{min}(\lambda,k), \tag{3}$$
where $P_{\min}(\lambda,k)$ denotes the estimated minimum power and omin is a
bias compensation factor. The data window $D$ is divided into $W$ windows of
length $M$, which allows the minimum to be updated every $M$ samples at low
computational cost.
This noise estimator combined with the spectral subtraction has the ability
to preserve weak speech sounds. When the short-time subband power of the noisy
speech signal is observed, its valleys are used to estimate the subband noise
power.
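The noise estimator described above (smoothed periodogram, minimum over a window split into W sub-windows of length M, and Equation 3) can be sketched as follows. This is only a simplified illustration: the function name, the default parameter values and the bookkeeping of sub-window minima with a `deque` are assumptions, and the effective search window in this sketch spans between D and D + M samples.

```python
from collections import deque
import numpy as np

def estimate_noise_power(inst_power, xi=0.9, num_windows=4, window_len=8, omin=1.5):
    """Return (P_y, P_n) for instantaneous power of shape (num_frames, num_bins).

    xi, num_windows (W), window_len (M) and omin are assumed placeholder values.
    """
    inst_power = np.asarray(inst_power, dtype=float)
    num_frames, num_bins = inst_power.shape
    p_y = np.empty_like(inst_power)
    p_n = np.empty_like(inst_power)
    smoothed = inst_power[0].copy()          # assumed initialisation
    completed = deque(maxlen=num_windows)    # minima of finished sub-windows
    running_min = np.full(num_bins, np.inf)  # minimum of the current sub-window
    for lam in range(num_frames):
        # Smoothed periodogram P_y(lambda, k).
        smoothed = xi * smoothed + (1.0 - xi) * inst_power[lam]
        p_y[lam] = smoothed
        running_min = np.minimum(running_min, smoothed)
        # P_min: minimum over stored sub-window minima and the current one.
        p_min = np.min(list(completed) + [running_min], axis=0)
        p_n[lam] = omin * p_min              # Equation 3
        if (lam + 1) % window_len == 0:
            # A sub-window of M samples is complete: store its minimum.
            completed.append(running_min)
            running_min = np.full(num_bins, np.inf)
    return p_y, p_n
```

Storing one minimum per sub-window means that only W values need to be compared at each frame, instead of all D samples.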
The last element to be calculated is $\mathrm{SNR}_y(\lambda,k)$, given in
Equation 4, which controls the oversubtraction factor
$\mathrm{osub}(\lambda,k)$.
$$\mathrm{SNR}_y(\lambda,k) = 10 \log \frac{P_y(\lambda,k) - \min(P_n(\lambda,k), P_y(\lambda,k))}{P_n(\lambda,k)} \tag{4}$$
At this stage $\mathrm{osub}(\lambda,k)$ and subf can be selected and the
spectral subtraction algorithm can be computed.
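Equation 4 can be evaluated with a short Python sketch. The function name `subband_snr_db`, the use of a base-10 logarithm (so that the result is in dB) and the small constant `eps` that keeps the logarithm finite are assumptions of this example.

```python
import numpy as np

def subband_snr_db(p_y, p_n, eps=1e-12):
    """Subband SNR of Equation 4, assuming a base-10 logarithm (dB)."""
    p_y = np.asarray(p_y, dtype=float)
    p_n = np.asarray(p_n, dtype=float)
    # The min() term keeps the numerator from becoming negative.
    numerator = p_y - np.minimum(p_n, p_y)
    # eps is an added guard so that log10 never receives zero.
    return 10.0 * np.log10(np.maximum(numerator, eps) / np.maximum(p_n, eps))
```

When the noise estimate reaches or exceeds the signal power, the numerator is zero and the sketch returns a very low SNR, which corresponds to the case where the largest oversubtraction is needed.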
2.2 PD-MEMLIN
PD-MEMLIN is an empirical feature vector normalization technique which uses
stereo data in order to estimate the different compensation linear
transformations in a previous training process. The clean feature space is
modelled as a mixture of Gaussians for each phoneme. The noisy space is split
into several basic acoustic environments and each environment is modelled as a
mixture of Gaussians for each phoneme. The transformations are estimated for
all basic environments between a clean phoneme Gaussian and a noisy Gaussian
of the same phoneme.
PD-MEMLIN approximations. Clean feature vectors, $x$, are modelled using
a GMM for each phoneme, $ph$:
$$p_{ph}(x) = \sum_{s_x^{ph}} p(x|s_x^{ph})\,p(s_x^{ph}), \tag{5}$$
$$p(x|s_x^{ph}) = N(x; \mu_{s_x^{ph}}, \Sigma_{s_x^{ph}}), \tag{6}$$
where $\mu_{s_x^{ph}}$, $\Sigma_{s_x^{ph}}$, and $p(s_x^{ph})$ are the mean
vector, the diagonal covariance matrix, and the a priori probability
associated with the clean model Gaussian $s_x^{ph}$ of the $ph$ phoneme.
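Equations 5 and 6 describe a Gaussian mixture with diagonal covariances. A minimal Python sketch of its evaluation for one phoneme is given below; the function names and the representation of each diagonal covariance as a vector of variances are assumptions made for illustration.

```python
import numpy as np

def diag_gaussian_pdf(x, mean, var):
    """N(x; mean, diag(var)) for a single diagonal-covariance Gaussian."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    norm = np.prod(2.0 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))

def gmm_pdf(x, weights, means, variances):
    """Equation 5: sum over Gaussians of p(x | s) * p(s) for one phoneme."""
    return sum(
        w * diag_gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

The same two functions apply unchanged to the noisy-space models, with one set of weights, means and variances per basic environment and phoneme.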
The noisy space is split into several basic environments, $e$, and the noisy
feature vectors, $y$, are modelled as a GMM for each basic environment and
phoneme:
$$p_{e,ph}(y) = \sum_{s_y^{e,ph}} p(y|s_y^{e,ph})\,p(s_y^{e,ph}), \tag{7}$$
$$p(y|s_y^{e,ph}) = N(y; \mu_{s_y^{e,ph}}, \Sigma_{s_y^{e,ph}}), \tag{8}$$
where $s_y^{e,ph}$ denotes the corresponding Gaussian of the noisy model for
the $e$ basic environment and the $ph$ phoneme; $\mu_{s_y^{e,ph}}$,
$\Sigma_{s_y^{e,ph}}$, and $p(s_y^{e,ph})$ are the mean vector, the diagonal
covariance matrix, and the a priori probability associated with $s_y^{e,ph}$.
Finally, clean feature vectors can be approximated as a linear function, $f$,
of the noisy feature vector for each time frame $t$, which depends on the
basic environments, the phonemes and the clean and noisy model Gaussians:
$x \approx f(y_t, s_x^{ph}, s_y^{e,ph}) = y_t - r_{s_x^{ph},s_y^{e,ph}}$,
where $r_{s_x^{ph},s_y^{e,ph}}$ is the bias vector transformation between
noisy and clean feature vectors for each pair of Gaussians, $s_x^{ph}$ and
$s_y^{e,ph}$.
PD-MEMLIN enhancement. With those approximations, PD-MEMLIN
transforms the Minimum Mean Square Error (MMSE) estimation expression,
$\hat{x}_t = E[x|y_t]$, into
$$\hat{x}_t = y_t - \sum_{e}\sum_{ph}\sum_{s_y^{e,ph}}\sum_{s_x^{ph}} r_{s_x^{ph},s_y^{e,ph}}\, p(e|y_t)\, p(ph|y_t,e)\, p(s_y^{e,ph}|y_t,e,ph)\, p(s_x^{ph}|y_t,e,ph,s_y^{e,ph}), \tag{9}$$
where $p(e|y_t)$ is the a posteriori probability of the basic environment;
$p(ph|y_t,e)$ is the a posteriori probability of the phoneme, given the noisy
feature vector and the environment; $p(s_y^{e,ph}|y_t,e,ph)$ is the a
posteriori probability of the noisy model Gaussian, $s_y^{e,ph}$, given the
feature vector, $y_t$, the basic environment, $e$, and the phoneme, $ph$. To
estimate the terms $p(e|y_t)$, $p(ph|y_t,e)$ and $p(s_y^{e,ph}|y_t,e,ph)$,
(7) and (8) are applied as described in [8]. Finally, the cross-probability
model, $p(s_x^{ph}|y_t,e,ph,s_y^{e,ph})$, which is the probability of the
clean model Gaussian, $s_x^{ph}$, given the feature vector, $y_t$, the basic
environment, $e$, the phoneme, $ph$, and the noisy model Gaussian,
$s_y^{e,ph}$, and the bias vector transformation, $r_{s_x^{ph},s_y^{e,ph}}$,
are estimated in a training phase using stereo data for each basic environment
and phoneme [8].
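To make the structure of Equation 9 concrete, the following Python sketch accumulates the weighted bias vectors for one frame. It assumes that all posterior probabilities and bias vectors have already been estimated as described in [8]; the function name and the nested-list layout of the arguments are hypothetical choices for this example.

```python
import numpy as np

def pd_memlin_enhance(y_t, p_env, p_ph, p_sy, p_sx, bias):
    """Evaluate Equation 9 for one noisy feature vector y_t.

    Assumed (hypothetical) layout of the inputs:
      p_env[e]           = p(e | y_t)
      p_ph[e][ph]        = p(ph | y_t, e)
      p_sy[e][ph][j]     = p(s_y = j | y_t, e, ph)
      p_sx[e][ph][j][i]  = p(s_x = i | y_t, e, ph, s_y = j)
      bias[e][ph][j][i]  = bias vector r for the Gaussian pair (i, j)
    """
    y_t = np.asarray(y_t, dtype=float)
    correction = np.zeros_like(y_t)
    for e, pe in enumerate(p_env):
        for ph, pph in enumerate(p_ph[e]):
            for j, psy in enumerate(p_sy[e][ph]):
                for i, psx in enumerate(p_sx[e][ph][j]):
                    r = np.asarray(bias[e][ph][j][i], dtype=float)
                    correction += pe * pph * psy * psx * r
    # The clean estimate is the noisy vector minus the expected bias.
    return y_t - correction
```

Because each factor is a probability that sums to one over its index, the correction is a convex combination of the trained bias vectors.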
3 PD-MEEMLIN
By combining both techniques, PD-MEEMLIN arises as an empirical feature
vector normalization which estimates different linear transformations as
PD-MEMLIN does, with the special property that a new enhanced space is
obtained by applying SS to the noisy speech signal. Furthermore, this
first-stage enhancement brings the noisy space closer to the clean one,
narrowing the gap between them. Figure 2 shows the PD-MEEMLIN architecture.
Next, the architecture modules are explained:
– The SS-enhancement of the noisy speech signal is performed:
$|\hat{X}(\lambda,k)|$, $P_n(\lambda,k)$ and $\mathrm{SNR}_y(\lambda,k)$ are
calculated.
– Given the clean speech signal and the enhanced noisy speech signal, the clean
and noisy-enhanced GMMs are obtained.
Fig. 2. PD-MEEMLIN Architecture
– In the testing stage, the noisy speech signal is also SS-enhanced and then
normalized using PD-MEEMLIN.
– These normalized coefficients are forwarded to the decoder.
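The testing-stage cascade listed above can be summarised as a simple composition of stages. The sketch below is only schematic: `ss_enhance`, `extract_features` and `pd_memlin_normalize` are hypothetical callables standing in for the SS stage, the feature extraction and the trained PD-MEMLIN normalization, none of which are defined here.

```python
def pd_meemlin_frontend(noisy_signal, ss_enhance, extract_features, pd_memlin_normalize):
    """Cascade SS enhancement and PD-MEMLIN normalization (schematic only).

    All three stage arguments are hypothetical callables supplied by the caller.
    """
    # Stage 1: spectral subtraction on the noisy speech signal.
    enhanced_signal = ss_enhance(noisy_signal)
    # Stage 2: feature vectors computed from the enhanced signal.
    enhanced_features = extract_features(enhanced_signal)
    # Stage 3: frame-by-frame normalization; the result goes to the decoder.
    return [pd_memlin_normalize(y_t) for y_t in enhanced_features]
```

The point of the ordering is that the normalization models are trained on SS-enhanced data, so the same enhancement must be applied before normalization at test time.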
4 Experimental Results
All the experiments were performed on the AURORA2 database [13], which
contains clean and noisy data based on TIDigits. Three types of noise were
selected from AURORA2: Subway, Babble and Car, at SNRs ranging from -5dB to
20dB. For every SNR the SS parameters osub and subf need to be configured. The
parameter osub takes values from 0.4 to 4.6 (0.4 for 20dB, 0.7 for 15dB, 1.3
for 10dB, 2.21 for 5dB, 4.6 for 0dB and 4.6 for -5dB) and subf takes the value
0.03 or 0.04 (all SNR levels except 5dB are optimised with 0.04). The phonetic
acoustic models employed by PD-MEEMLIN are obtained from 22 phonemes and 1
silence. Each model in the set is represented by a mixture of 32 Gaussians. In
addition, two new sets of each noise were used: PD-MEEMLIN needs one to
estimate the enhanced-noisy model, and another to obtain the normalized
coefficients. The feature vectors for the recognition process are built from
12 normalized MFCCs followed by the energy coefficient, its time derivative Δ
and its time acceleration ΔΔ. For the training stage of the ASR system, the
acoustic models of the 22 phonemes and the silence consist of three-state HMMs
with a mixture of 8 Gaussians per state.
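For reference, the SS settings listed above can be collected in a small lookup. The names `SS_OSUB_BY_SNR_DB` and `ss_parameters` are hypothetical, and the value subf = 0.03 at 5dB is an interpretation of the statement that every level except 5dB uses 0.04.

```python
# osub per SNR level (dB), as listed in the text.
SS_OSUB_BY_SNR_DB = {20: 0.4, 15: 0.7, 10: 1.3, 5: 2.21, 0: 4.6, -5: 4.6}

def ss_parameters(snr_db):
    """Return (osub, subf) for one of the SNR levels used in the experiments."""
    osub = SS_OSUB_BY_SNR_DB[snr_db]
    # Assumption: 5 dB is the only level that uses subf = 0.03.
    subf = 0.03 if snr_db == 5 else 0.04
    return osub, subf
```

The lookup shows the trend discussed in Section 2.1: the lower the SNR, the larger the oversubtraction factor.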
The combined techniques show that for low-noise conditions, i.e. SNR = 10, 15
or 20 dB, the original noisy space and the SS-enhanced space (the one brought
closer to the clean space) are similar. However, when the SNR is lower (-5dB
or 0dB), SS improves the performance of PD-MEMLIN. Compared with the case
where no technique is applied, the combination of SS with PD-MEMLIN shows a
significant improvement. The results described above are presented in
Tables 1, 2 and 3. In the tables, "Sent" is the percentage of complete
utterances correctly recognised, and "Word" is the percentage of words
correctly recognised.

Table 1. Comparative Table for the ASR working with Subway Noise
Subway    ASR             ASR+SS          ASR+PD-MEMLIN   ASR+PD-MEEMLIN
SNR       Sent %  Word %  Sent %  Word %  Sent %  Word %  Sent %  Word %
-5dB        3.40   21.57   10.09   34.22   11.29   37.09   13.29   47.95
0dB         9.09   29.05   20.18   53.71   27.07   61.88   30.87   69.71
5dB        17.58   40.45   32.17   70.00   48.15   80.38   51.65   83.40
10dB       33.07   65.47   50.95   83.23   65.83   90.58   70.13   91.86
15dB       54.45   84.60   64.84   90.02   78.92   94.98   78.22   94.40
20dB       72.83   93.40   76.52   94.56   85.91   97.14   86.71   97.30

Table 2. Comparative Table for the ASR working with Babble Noise
Babble    ASR             ASR+SS          ASR+PD-MEMLIN   ASR+PD-MEEMLIN
SNR       Sent %  Word %  Sent %  Word %  Sent %  Word %  Sent %  Word %
-5dB        4.60   23.08    7.59   29.78    8.49   29.54    6.69   37.79
0dB        11.29   30.41   15.98   44.49   23.48   55.72   20.08   59.50
5dB        20.58   44.23   30.37   65.11   48.75   80.55   49.25   83.70
10dB       40.86   72.85   50.25   80.93   74.93   94.20   69.33   91.48
15dB       69.03   90.54   69.93   90.56   84.12   96.86   81.32   95.54
20dB       82.42   96.17   83.52   95.84   88.91   98.09   88.01   97.98

Table 3. Comparative Table for the ASR working with Car Noise
Car       ASR             ASR+SS          ASR+PD-MEMLIN   ASR+PD-MEEMLIN
SNR       Sent %  Word %  Sent %  Word %  Sent %  Word %  Sent %  Word %
-5dB        3.10   20.18   10.49   28.87    6.79   25.90   13.89   44.31
0dB         8.09   26.18   18.58   46.70   23.58   52.67   35.16   70.47
5dB        14.99   35.34   31.47   66.50   51.95   82.34   58.64   86.30
10dB       28.77   58.13   54.25   82.72   70.83   92.15   70.93   91.90
15dB       57.84   84.04   68.03   90.51   82.02   96.16   81.42   95.86
20dB       78.32   94.61   81.42   95.30   87.01   97.44   87.81   97.77

The gap between the clean and the noisy model for highly degraded speech has
been narrowed thanks to the advantages of both techniques. When PD-MEEMLIN is
employed, the performance is between 11.7% and 24.84% better than PD-MEMLIN,
and between 11.4% and 34.5% better than SS.
5 Conclusions
In this work a robust normalization technique, PD-MEEMLIN, has been pre-
sented by cascading a speech enhancement method (SS) followed by a feature
vector normalization algorithm (PD-MEMLIN). The results of PD-MEEMLIN
show a better performance than SS and PD-MEMLIN for a very high degraded