
Emerging Topics in
Computer Vision
Edited by
Gérard Medioni and Sing Bing Kang
Contents
PREFACE ix
CONTRIBUTORS x
1 INTRODUCTION 1
1.1 Organization 1
1.2 How to Use the Book 2
1.3 Contents of DVDs 2
SECTION I:
FUNDAMENTALS IN COMPUTER VISION 3
2 CAMERA CALIBRATION 5
Zhengyou Zhang
2.1 Introduction 5
2.2 Notation and Problem Statement 7
2.2.1 Pinhole Camera Model 8
2.2.2 Absolute Conic 9
2.3 Camera Calibration with 3D Objects 11
2.3.1 Feature Extraction 13
2.3.2 Linear Estimation of the Camera Projection Matrix 13
2.3.3 Recover Intrinsic and Extrinsic Parameters from P 14
2.3.4 Refine Calibration Parameters Through a Nonlinear
Optimization 15
2.3.5 Lens Distortion 15
2.3.6 An Example 17
2.4 Camera Calibration with 2D Objects: Plane Based Technique 18
2.4.1 Homography between the model plane and its image 18
2.4.2 Constraints on the intrinsic parameters 19


2.4.3 Geometric Interpretation 19
2.4.4 Closed-form solution 20
2.4.5 Maximum likelihood estimation 22
2.4.6 Dealing with radial distortion 22
2.4.7 Summary 23
2.4.8 Experimental Results 24
2.4.9 Related Work 26
2.5 Solving Camera Calibration With 1D Objects 27
2.5.1 Setups With Free-Moving 1D Calibration Objects 28
2.5.2 Setups With 1D Calibration Objects Moving Around
a fixed Point 29
2.5.3 Basic Equations 30
2.5.4 Closed-Form Solution 32
2.5.5 Nonlinear Optimization 33
2.5.6 Estimating the fixed point 34
2.5.7 Experimental Results 35
2.6 Self Calibration 39
2.7 Conclusion 39
2.8 Appendix: Estimating Homography Between the Model Plane
and its Image 40
Bibliography 41
3 MULTIPLE VIEW GEOMETRY 45
Anders Heyden and Marc Pollefeys
3.1 Introduction 45
3.2 Projective Geometry 46
3.2.1 The central perspective transformation 46

3.2.2 Projective spaces 47
3.2.3 Homogeneous coordinates 49
3.2.4 Duality 52
3.2.5 Projective transformations 54
3.3 Tensor Calculus 56
3.4 Modelling Cameras 58
3.4.1 The pinhole camera 58
3.4.2 The camera matrix 59
3.4.3 The intrinsic parameters 59
3.4.4 The extrinsic parameters 60
3.4.5 Properties of the pinhole camera 62
3.5 Multiple View Geometry 63
3.5.1 The structure and motion problem 63
3.5.2 The two-view case 64
3.5.3 Multi-view constraints and tensors 70
3.6 Structure and Motion I 75
3.6.1 Resection 75
3.6.2 Intersection 75
3.6.3 Linear estimation of tensors 76
3.6.4 Factorization 78
3.7 Structure and Motion II 80
3.7.1 Two-view geometry computation 80
3.7.2 Structure and motion recovery 83
3.8 Auto-calibration 87
3.9 Dense Depth Estimation 90
3.9.1 Rectification 90
3.9.2 Stereo matching 92
3.9.3 Multi-view linking 93
3.10 Visual Modeling 94

3.10.1 3D surface reconstruction 95
3.10.2 Image-based rendering 98
3.10.3 Match-moving 101
3.11 Conclusion 101
Bibliography 103
4 ROBUST TECHNIQUES FOR COMPUTER VISION 109
Peter Meer
4.1 Robustness in Visual Tasks 109
4.2 Models and Estimation Problems 113
4.2.1 Elements of a Model 113
4.2.2 Estimation of a Model 119
4.2.3 Robustness of an Estimator 122
4.2.4 Definition of Robustness 124
4.2.5 Taxonomy of Estimation Problems 127
4.2.6 Linear Errors-in-Variables Regression Model 130
4.2.7 Objective Function Optimization 134
4.3 Location Estimation 139
4.3.1 Why Nonparametric Methods 140
4.3.2 Kernel Density Estimation 141
4.3.3 Adaptive Mean Shift 146
4.3.4 Applications 150
4.4 Robust Regression 157
4.4.1 Least Squares Family 158
4.4.2 M-estimators 163
4.4.3 Median Absolute Deviation Scale Estimate 166
4.4.4 LMedS, RANSAC and Hough Transform 169
4.4.5 The pbM-estimator 173
4.4.6 Applications 178
4.4.7 Structured Outliers 180

4.5 Conclusion 183
Bibliography 183
5 THE TENSOR VOTING FRAMEWORK 191
Gérard Medioni and Philippos Mordohai
5.1 Introduction 191
5.1.1 Motivation 192
5.1.2 Desirable descriptions 194
5.1.3 Our approach 195
5.1.4 Chapter Overview 197
5.2 Related Work 198
5.3 Tensor Voting in 2D 203
5.3.1 Second Order Representation and Voting in 2-D 203
5.3.2 First Order Representation and Voting in 2-D 210
5.3.3 Voting Fields 213
5.3.4 Vote analysis 215
5.3.5 Results in 2-D 218
5.3.6 Illusory Contours 219
5.4 Tensor Voting in 3D 221
5.4.1 Representation in 3-D 222
5.4.2 Voting in 3-D 224
5.4.3 Vote analysis 226
5.4.4 Results in 3-D 228
5.5 Tensor Voting in ND 229
5.5.1 Computational Complexity 232
5.6 Application to Computer Vision Problems 233
5.6.1 Initial Matching 234
5.6.2 Uniqueness 235
5.6.3 Discrete Densification 236
5.6.4 Discontinuity Localization 237
5.6.5 Stereo 239

5.6.6 Multiple View Stereo 241
5.6.7 Visual Motion from Motion Cues 243
5.6.8 Visual Motion on Real Images 245
5.7 Conclusion and Future Work 246
5.8 Acknowledgment 250
Bibliography 250
SECTION II:
APPLICATIONS IN COMPUTER VISION 254
6 IMAGE BASED LIGHTING 255
Paul E. Debevec
6.1 Basic Image Based Lighting 257
6.1.1 Capturing Light 257
6.1.2 Illuminating Synthetic Objects with Real Light 260
6.1.3 Lighting Entire Environments with IBL 269
6.2 Advanced Image Based Lighting 269
6.2.1 Capturing a Light Probe in Direct Sunlight 272
6.2.2 Compositing objects into the scene including shadows 282
6.2.3 Image-Based Lighting in Fiat Lux 288
6.2.4 Capturing and Rendering Spatially-Varying Illumination 291
6.3 Image Based Relighting 294
6.4 Conclusion 299
Bibliography 301
7 COMPUTER VISION IN VISUAL EFFECTS 305
Doug Roble
7.1 Introduction 305
7.2 Computer Vision Problems Unique to Film 306
7.2.1 Welcome to the Set 306
7.3 Feature Tracking 319
7.4 Optical Flow 321

7.5 Camera Tracking and Structure from Motion 325
7.6 The Future 330
Bibliography 330
8 CONTENT BASED IMAGE RETRIEVAL: AN OVERVIEW 333
Theo Gevers and Arnold W.M. Smeulders
8.1 Overview of the chapter 334
8.2 Image Domains 339
8.2.1 Search modes 339
8.2.2 The sensory gap 341
8.2.3 The semantic gap 342
8.2.4 Discussion 343
8.3 Image Features 344
8.3.1 Color 345
8.3.2 Shape 348
8.3.3 Texture 349
8.3.4 Discussion 352
8.4 Representation and Indexing 352
8.4.1 Grouping data 353
8.4.2 Features accumulation 354
8.4.3 Feature accumulation and image partitioning 357
8.4.4 Salient features 358
8.4.5 Shape and object features 359
8.4.6 Structure and lay-out 361
8.4.7 Discussion 361
8.5 Similarity and Search 362
8.5.1 Semantic interpretation 362
8.5.2 Similarity between features 363
8.5.3 Similarity of object outlines 366
8.5.4 Similarity of object arrangements 367

8.5.5 Similarity of salient features 368
8.5.6 Discussion 369
8.6 Interaction and Learning 369
8.6.1 Interaction on a semantic level 369
8.6.2 Classification on a semantic level 370
8.6.3 Learning 371
8.6.4 Discussion 371
8.7 Conclusion 372
Bibliography 372
9 FACE DETECTION, ALIGNMENT AND RECOGNITION 385
Stan Z. Li and Juwei Lu
9.1 Introduction 385
9.2 Face Detection 388
9.2.1 Appearance and Learning Based Approach 389
9.2.2 Preprocessing 391
9.2.3 Neural and Kernel Methods 393
9.2.4 Boosting Based Methods 394
9.2.5 Post-Processing 400
9.2.6 Evaluation 401
9.3 Face Alignment 404
9.3.1 Active Shape Model 405
9.3.2 Active Appearance Model 407
9.3.3 Modeling Shape from Texture 408
9.3.4 Dealing with Head Pose 414
9.3.5 Evaluation 416
9.4 Face Recognition 419
9.4.1 Preprocessing 419
9.4.2 Feature Extraction 420
9.4.3 Pattern Classification 431

9.4.4 Evaluation 439
Bibliography 445
10 PERCEPTUAL INTERFACES 455
Matthew Turk and Mathias Kölsch
10.1 Introduction 455
10.2 Perceptual Interfaces and HCI 457
10.3 Multimodal Interfaces 464
10.4 Vision Based Interfaces 472
10.4.1 Terminology 476
10.4.2 Elements of VBI 479
10.4.3 Computer Vision Methods for VBI 491
10.4.4 VBI Summary 504
10.5 Brain-Computer Interfaces 504
10.6 Summary 507
Bibliography 509
SECTION III:
PROGRAMMING FOR COMPUTER VISION 520
11 OPEN SOURCE COMPUTER VISION LIBRARY 521
Gary Bradski
11.1 Overview 521
11.1.1 Installation 522
11.1.2 Organization 527
11.1.3 Optimizations 529
11.2 Functional Groups: What’s Good for What 532
11.2.1 By Area 534
11.2.2 By Task 538
11.2.3 Demos and Samples 542
11.3 Pictorial Tour 545
11.3.1 Functional Groups 545
11.3.2 Demo Tour 561

11.4 Programming Examples Using C/C++ 561
11.4.1 Read Images from Disk 566
11.4.2 Read AVIs from Disk, or Video from a Camera 568
11.5 Other Interfaces 570
11.5.1 Ch 570
11.5.2 Matlab 575
11.5.3 Lush 577
11.6 Appendix A 578
11.7 Appendix B 579
Bibliography 580
12 SOFTWARE ARCHITECTURE FOR COMPUTER VISION 585
Alexandre R.J. François
12.1 Introduction 585
12.1.1 Motivation 585
12.1.2 Contribution 588
12.1.3 Outline 589
12.2 SAI: A Software Architecture Model 590
12.2.1 Beyond Pipes and Filters 590
12.2.2 The SAI Architectural Style 597
12.2.3 Example Designs 601
12.2.4 Architectural Properties 620
12.3 MFSM: An Architectural Middleware 622
12.3.1 MFSM overview 623
12.3.2 A First Image Manipulation Example 626
12.3.3 Custom Elements 634
12.3.4 A Shared Memory Access Example 643
12.4 Conclusion 648
12.4.1 Summary 648

12.4.2 Perspectives 649
12.5 Acknowledgments 651
Bibliography 651
PREFACE
One of the major changes instituted at the 2001 Conference on Computer
Vision and Pattern Recognition (CVPR) in Kauai, HI was the replacement
of the traditional tutorial sessions with a set of short courses. The topics
of these short courses were carefully chosen to reflect the diversity in com-
puter vision and represent very promising areas. The response to these short
courses was a very pleasant surprise, with more than 200 people attending
a single short course. This overwhelming response was the inspiration
for this book.
There are three parts in this book. The first part covers some of the more
fundamental aspects of computer vision, the second describes a few interest-
ing applications, and the third details specific approaches to facilitate program-
ming for computer vision. This book is not intended to be a comprehensive
coverage of computer vision; it can, however, be used as a complement to
most computer vision textbooks.
A unique aspect of this book is the accompanying DVD which features
videos of lectures by the contributors. We feel that these lectures would be
very useful for readers as quick previews of the topics covered in the book.
In addition, these lectures are much more effective in depicting results in the
form of video or animations, compared to printed material.
We would like to thank all the contributors for all their hard work, and
Bernard Goodwin for his support and enthusiasm for our book project. The
USC Distance Education Network helped to tape and produce the lectures
and Bertran Harden tirelessly assembled all the multimedia content onto a
DVD. We are also grateful to P. Anandan and Microsoft Corporation for the
financial support used to defray some of the lecture production costs.
Gérard Medioni, University of Southern California

Sing Bing Kang, Microsoft Research
November, 2003
CONTRIBUTORS
Gary Bradski
Mgr: Machine Learning Group
Intel Labs
SC12-303
2200 Mission College Blvd.
Santa Clara, CA 95052-8119
USA
www.intel.com/research/mrl/research/opencv
www.intel.com/research/mrl/research/media-visual.htm
Paul Debevec
USC Institute for Creative Technologies
13274 Fiji Way, 5th Floor
Marina del Rey, CA 90292
USA
Alexandre R.J. François
PHE-222 MC-0273
Institute for Robotics and Intelligent Systems
University of Southern California
Los Angeles, CA 90089-0273
USA

iris.usc.edu/~afrancoi
Theo Gevers
University of Amsterdam

Kruislaan 403
1098 SJ Amsterdam
The Netherlands

gevers/
Anders Heyden
Centre for Mathematical Sciences
Lund University
Box 118
SE-221 00 Lund
Sweden

www.maths.lth.se/matematiklth/personal/andersp/
Mathias Kölsch
Computer Science Department
University of California
Santa Barbara, CA 93106
USA

Stan Z. Li
Microsoft Research Asia
5/F, Beijing Sigma Center
No. 49, Zhichun Road, Hai Dian District
Beijing, China 100080

www.research.microsoft.com/∼szli
Juwei Lu
Bell Canada Multimedia Lab
University of Toronto
Bahen Centre for Information Technology

Room 4154, 40 St George Str.
Toronto, ON, M5S 3G4
Canada

www.dsp.utoronto.ca/∼juwei/
Gérard Medioni
SAL 300, MC-0781
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0781
USA

iris.usc.edu/home/iris/medioni/User.html
Peter Meer
Electrical and Computer Engineering Department
Rutgers University
94 Brett Road
Piscataway, NJ 08854-8058
USA

www.caip.rutgers.edu/∼meer
Philippos Mordohai
PHE 204, MC-0273
3737 Watt Way
Los Angeles, CA 90089-0273
USA

iris.usc.edu/home/iris/mordohai/User.html
Marc Pollefeys

Department of Computer Science
University of North Carolina
Sitterson Hall, CB#3175
Chapel Hill, NC 27599-3175
USA

www.cs.unc.edu/∼marc/
Doug Roble
Digital Domain
300 Rose Av
Venice, CA 90291
USA
www.d2.com
Arnold Smeulders
ISIS group, University of Amsterdam
Kruislaan 403
1098 SJ Amsterdam
The Netherlands

www.science.uva.nl/isis/
Matthew Turk
Computer Science Department
University of California
Santa Barbara, CA 93106
USA

www.cs.ucsb.edu/∼mturk
Zhengyou Zhang
Microsoft Corporation

One Microsoft Way
Redmond, WA 98052
USA

www.research.microsoft.com/∼zhang/
Chapter 1
INTRODUCTION
The topics in this book were handpicked to showcase what we consider to
be exciting and promising in computer vision. They are a mix of more
well-known and traditional topics (such as camera calibration, multi-view
geometry, and face detection), and newer ones (such as vision for special
effects and the tensor voting framework). All have the common denominator
of either demonstrated longevity or potential for endurance in computer vision,
even as the popularity of a number of other areas has come and gone in the past.
1.1 Organization
The book is organized into three sections, covering various fundamentals,
applications, and programming aspects of computer vision.
The fundamentals section consists of four chapters. Two of the chapters
deal with the more conventional but still popular areas: camera calibration
and multi-view geometry. They deal with the most fundamental operations
associated with vision. The chapter on robust estimation techniques will be
very useful for researchers and practitioners of computer vision alike. There
is also a chapter on a more recently developed tool (namely the tensor voting
framework) that can be customized for a variety of problems.
The applications section covers two more recent applications (image-
based lighting and vision for visual effects) and three in more conventional
areas (image search engines, face detection and recognition, and perceptual
interfaces).
One of the more overlooked areas in computer vision is the programming
aspect. While there are generic commercial packages that can be used, there
exist popular libraries and packages that are specifically geared toward
computer vision. The final section of the book describes two different
approaches to facilitate programming for computer vision.
1.2 How to Use the Book
The book is designed to be accompanying material to computer vision text-
books.
Each chapter is designed to be self-contained, and is written by well-
known authorities in the area. We suggest that the reader watch the lecture
first before reading the chapter, as the lecture (given by the contributor)
provides an excellent overview of the topic.
1.3 Contents of DVDs
The two DVDs are organized by chapter as follows:
– Chap. 2: Camera Calibration (Z. Zhang) – VS
– Chap. 3: Multiple View Geometry (A. Heyden, M. Pollefeys) – VS
– Chap. 4: Robust Techniques for Computer Vision (P. Meer) – VS
– Chap. 5: The Tensor Voting Framework (G. Medioni, P. Mordohai) –
VS
– Chap. 6: Image Based Lighting (P.E. Debevec) – VSC
– Chap. 7: Computer Vision in Visual Effects (D. Roble) – SC
– Chap. 8: Content Based Image Retrieval: An Overview (T. Gevers,
A.W.M. Smeulders) – V
– Chap. 9: Face Detection, Alignment and Recognition (S.Z. Li, J. Lu)
– V
– Chap. 10: Perceptual Interfaces (M. Turk, M. Kölsch) – VS
– Chap. 11: Open Source Computer Vision Library (G. Bradski) – SP
– Chap. 12: Software Architecture for Computer Vision (A.R.J. François)
– VS
(Note: V=video presentation, S=slides in PDF format, C=color images
in both BMP and PDF formats, P=project and source code.)
SECTION I:
FUNDAMENTALS IN
COMPUTER VISION
It is only fitting that we start with some of the more fundamental concepts
in computer vision. The range of topics covered in this section is wide:
camera calibration, structure from motion, dense stereo, 3D modeling, robust
techniques for model fitting, and a more recently developed concept called
tensor voting.
In Chapter 2, Zhang reviews the different techniques for calibrating a
camera. More specifically, he describes calibration techniques that use 3D
reference objects, 2D planes, and 1D lines, as well as self-calibration tech-
niques.
One of the more popular (and difficult) areas in computer vision is stereo.
Heyden and Pollefeys describe how camera motion and scene structure can
be reliably extracted from image sequences in Chapter 3. Once this is ac-
complished, dense depth distributions can be extracted for 3D surface recon-
struction and image-based rendering applications.
A basic task in computer vision is hypothesizing models (e.g., 2D shapes)
and using input data (typically image data) to corroborate and fit the models.
In practice, however, robust techniques for model fitting must be used to
handle input noise. In Chapter 4, Meer describes various robust regression
techniques such as M-estimators, RANSAC, and Hough transform. He also
covers the mean shift algorithm for the location estimation problem.
The claim by Medioni and his colleagues that computer vision problems
can be addressed within a Gestalt framework is the basis of their work on
tensor voting. In Chapter 5, Medioni and Mordohai provide an introduction
to the concept of tensor voting, which is a form of binning according to
proximity to ideal primitives such as edges and points. They show how this
scheme can be applied to a variety of applications, such as curve and surface
extraction from noisy 2D and 3D points (respectively), stereo matching, and
motion-based grouping.
Chapter 2
CAMERA CALIBRATION
Zhengyou Zhang
Camera calibration is a necessary step in 3D computer vision in order to
extract metric information from 2D images. It has been studied extensively
in computer vision and photogrammetry, and even recently new techniques
have been proposed. In this chapter, we review the techniques proposed
in the literature include those using 3D apparatus (two or three planes or-
thogonal to each other, or a plane undergoing a pure translation, etc.), 2D
objects (planar patterns undergoing unknown motions), 1D objects (wand
with dots) and unknown scene points in the environment (self-calibration).
The focus is on presenting these techniques within a consistent framework.
2.1 Introduction
Camera calibration is a necessary step in 3D computer vision in order to
extract metric information from 2D images. Much work has been done,
starting in the photogrammetry community (see [3, 6] to cite a few), and
more recently in computer vision ([12, 11, 33, 10, 37, 35, 22, 9] to cite a few).
According to the dimension of the calibration objects, we can classify those
techniques roughly into three categories.
3D reference object based calibration. Camera calibration is performed
by observing a calibration object whose geometry in 3-D space is known
with very good precision. Calibration can be done very efficiently [8].
The calibration object usually consists of two or three planes orthog-
onal to each other. Sometimes, a plane undergoing a precisely known

translation is also used [33], which equivalently provides 3D reference
points. This approach requires an expensive calibration apparatus and
an elaborate setup.
2D plane based calibration. Techniques in this category require observing
a planar pattern shown at a few different orientations [42, 31].
Different from Tsai’s technique [33], the knowledge of the plane motion
is not necessary. Because almost anyone can make such a calibration
pattern by him/her-self, the setup is easier for camera calibration.
1D line based calibration. Calibration objects used in this category are
composed of a set of collinear points [44]. As will be shown, a camera
can be calibrated by observing a moving line around a fixed point, such
as a string of balls hanging from the ceiling.
Self-calibration. Techniques in this category do not use any calibration
object, and can be considered a 0D approach because only image
point correspondences are required. Just by moving a camera in a
static scene, the rigidity of the scene provides in general two con-
straints [22, 21] on the cameras’ internal parameters from one camera
displacement by using image information alone. Therefore, if images
are taken by the same camera with fixed internal parameters, cor-
respondences between three images are sufficient to recover both the
internal and external parameters which allow us to reconstruct 3-D
structure up to a similarity [20, 17]. Although no calibration objects
are necessary, a large number of parameters need to be estimated, re-
sulting in a much harder mathematical problem.
Other techniques exist: vanishing points for orthogonal directions [4, 19],
and calibration from pure rotation [16, 30].
Before going further, I’d like to point out that no single calibration tech-
nique is the best for all. It really depends on the situation a user needs to

deal with. Following are my few recommendations:
– Calibration with apparatus vs. self-calibration. Whenever possible, if
we can pre-calibrate a camera, we should do it with a calibration appa-
ratus. Self-calibration cannot usually achieve an accuracy comparable
with that of pre-calibration because self-calibration needs to estimate a
large number of parameters, resulting in a much harder mathematical
problem. When pre-calibration is impossible (e.g., scene reconstruction
from an old movie), self-calibration is the only choice.
– Partial vs. full self-calibration. Partial self-calibration refers to the
case where only a subset of camera intrinsic parameters are to be cal-
ibrated. Along the same line as the previous recommendation, when-
ever possible, partial self-calibration is preferred because the number
of parameters to be estimated is smaller. Take an example of 3D re-
construction with a camera with variable focal length. It is preferable
to pre-calibrate the pixel aspect ratio and the pixel skewness.
– Calibration with 3D vs. 2D apparatus. Highest accuracy can usually be
obtained by using a 3D apparatus, so it should be used when accuracy is
indispensable and when it is affordable to make and use a 3D apparatus.
From the feedback I received from computer vision researchers and
practitioners around the world in the last couple of years, calibration
with a 2D apparatus seems to be the best choice in most situations
because of its ease of use and good accuracy.
– Calibration with 1D apparatus. This technique is relatively new, and it
is hard for the moment to predict how popular it will be. It, however,
should be useful especially for calibration of a camera network. To
calibrate the relative geometry between multiple cameras as well as
their intrinsic parameters, it is necessary for all involved cameras to
simultaneously observe a number of points. It is hardly possible to
achieve this with a 3D or 2D calibration apparatus¹ if one camera is
mounted in the front of a room while another is in the back. This is not
a problem for 1D objects. We can, for example, use a string of balls
hanging from the ceiling.
This chapter is organized as follows. Section 2.2 describes the camera
model and introduces the concept of the absolute conic which is important
for camera calibration. Section 2.3 presents the calibration techniques using
a 3D apparatus. Section 2.4 describes a calibration technique by observing a
freely moving planar pattern (2D object). Its extension for stereo calibration
is also addressed. Section 2.5 describes a relatively new technique which uses
a set of collinear points (1D object). Section 2.6 briefly introduces the self-
calibration approach and provides references for further reading. Section 2.7
concludes the chapter with a discussion on recent work in this area.
2.2 Notation and Problem Statement
We start with the notation used in this chapter.
¹An exception is when those apparatus are made transparent; then the cost would be much higher.
2.2.1 Pinhole Camera Model
Figure 2.1. Pinhole camera model
A 2D point is denoted by $m = [u, v]^T$. A 3D point is denoted by $M = [X, Y, Z]^T$. We use $\tilde{x}$ to denote the augmented vector obtained by adding 1 as the last element: $\tilde{m} = [u, v, 1]^T$ and $\tilde{M} = [X, Y, Z, 1]^T$. A camera is modeled by the usual pinhole (see Figure 2.1): the image of a 3D point $M$, denoted by $m$, is formed by an optical ray from $M$ passing through the optical center $C$ and intersecting the image plane. The three points $M$, $m$, and $C$ are collinear. In Figure 2.1, for illustration purposes, the image plane is positioned between the scene point and the optical center, which is mathematically equivalent to the physical setup under which the image plane is on the other side of the optical center. The relationship between the 3D point $M$ and its image projection $m$ is given by
$$s\,\tilde{m} = A\,[R \;\; t]\,\tilde{M} \equiv P\tilde{M}, \qquad (2.1)$$
with
$$A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2.2)$$
and
$$P = A\,[R \;\; t], \qquad (2.3)$$
where $s$ is an arbitrary scale factor, $(R, t)$, called the extrinsic parameters, is the rotation and translation which relates the world coordinate system to the camera coordinate system, and $A$ is called the camera intrinsic matrix, with $(u_0, v_0)$ the coordinates of the principal point, $\alpha$ and $\beta$ the scale factors in the image $u$ and $v$ axes, and $\gamma$ the parameter describing the skew of the two image axes. The $3 \times 4$ matrix $P$ is called the camera projection matrix, which mixes both intrinsic and extrinsic parameters. In Figure 2.1, the angle between the two image axes is denoted by $\theta$, and we have $\gamma = \alpha \cot \theta$. If the pixels are rectangular, then $\theta = 90^\circ$ and $\gamma = 0$.
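To make Eqs. (2.1)-(2.3) concrete, the following minimal sketch (in Python with NumPy; all parameter values are made-up illustrative assumptions, not from this chapter) builds $A$ from $(\alpha, \beta, \gamma, u_0, v_0)$, forms $P = A[R \;\; t]$, and projects a single 3D point:

import numpy as np

# Illustrative intrinsic parameters (alpha, beta, gamma, u0, v0); values assumed.
alpha, beta, gamma, u0, v0 = 800.0, 780.0, 0.0, 320.0, 240.0
A = np.array([[alpha, gamma, u0],
              [0.0,   beta,  v0],
              [0.0,   0.0,   1.0]])          # Eq. (2.2)

# Illustrative extrinsic parameters (R, t): a 10-degree rotation about the y axis.
phi = np.deg2rad(10.0)
R = np.array([[ np.cos(phi), 0.0, np.sin(phi)],
              [ 0.0,         1.0, 0.0        ],
              [-np.sin(phi), 0.0, np.cos(phi)]])
t = np.array([[0.1], [-0.05], [2.0]])

P = A @ np.hstack([R, t])                    # Eq. (2.3): P = A [R t], a 3x4 matrix

M_tilde = np.array([0.5, 0.2, 3.0, 1.0])     # augmented 3D point [X, Y, Z, 1]^T
m_scaled = P @ M_tilde                       # Eq. (2.1): s * m_tilde = P * M_tilde
u, v = m_scaled[:2] / m_scaled[2]            # divide out s to get pixel coordinates
print(u, v)

Dividing by the last homogeneous component recovers the arbitrary scale factor $s$; this is the homogeneous-coordinate counterpart of perspective division.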
The task of camera calibration is to determine the parameters of the
transformation between an object in 3D space and the 2D image observed by
the camera from visual information (images). The transformation includes
– Extrinsic parameters (sometimes called external parameters): orienta-
tion (rotation) and location (translation) of the camera, i.e., (R, t);
– Intrinsic parameters (sometimes called internal parameters): characteristics of the camera, i.e., $(\alpha, \beta, \gamma, u_0, v_0)$.
The rotation matrix, although consisting of 9 elements, only has 3 degrees
of freedom. The translation vector t obviously has 3 parameters. Therefore,
there are 6 extrinsic parameters and 5 intrinsic parameters, leading to 11
parameters in total.
We use the abbreviation $A^{-T}$ for $(A^{-1})^T$ or $(A^T)^{-1}$.
2.2.2 Absolute Conic
Now let us introduce the concept of the absolute conic. For more details,
the reader is referred to [7, 15].

A point $x$ in 3D space has projective coordinates $\tilde{x} = [x_1, x_2, x_3, x_4]^T$. The equation of the plane at infinity, $\Pi_\infty$, is $x_4 = 0$. The absolute conic $\Omega$ is defined by the set of points satisfying the equation
$$x_1^2 + x_2^2 + x_3^2 = 0, \qquad x_4 = 0. \qquad (2.4)$$
Let $x_\infty = [x_1, x_2, x_3]^T$ be a point on the absolute conic (see Figure 2.2). By definition, we have $x_\infty^T x_\infty = 0$. We also have $\tilde{x}_\infty = [x_1, x_2, x_3, 0]^T$ and $\tilde{x}_\infty^T \tilde{x}_\infty = 0$. This can be interpreted as a conic of purely imaginary points on $\Pi_\infty$. Indeed, let $x = x_1/x_3$ and $y = x_2/x_3$ be a point on the conic; then $x^2 + y^2 = -1$, which is an imaginary circle of radius $\sqrt{-1}$.
An important property of the absolute conic is its invariance to any rigid transformation. Let the rigid transformation be $H = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix}$. Let $x_\infty$ be a point on $\Omega$. By definition, its projective coordinates are $\tilde{x}_\infty = \begin{bmatrix} x_\infty \\ 0 \end{bmatrix}$ with $x_\infty^T x_\infty = 0$.

Figure 2.2. Absolute conic ($x_\infty^T x_\infty = 0$, on the plane at infinity) and its image ($\tilde{m}^T A^{-T} A^{-1} \tilde{m} = 0$)

The point after the rigid transformation is denoted by $x'_\infty$, and
$$\tilde{x}'_\infty = H\tilde{x}_\infty = \begin{bmatrix} Rx_\infty \\ 0 \end{bmatrix}.$$
Thus, $x'_\infty$ is also on the plane at infinity. Furthermore, $x'_\infty$ is on the same $\Omega$ because
$$x'^T_\infty x'_\infty = (Rx_\infty)^T(Rx_\infty) = x_\infty^T(R^T R)x_\infty = 0.$$
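This invariance is easy to check numerically. A minimal sketch (again Python with NumPy; the particular point and rotation are arbitrary choices for illustration) takes $x_\infty = [1, i, 0]^T$, which satisfies $x_1^2 + x_2^2 + x_3^2 = 0$, rotates it, and confirms that it stays on $\Omega$:

import numpy as np

# A (complex) point on the absolute conic: 1^2 + i^2 + 0^2 = 0.
# Note the plain transpose x^T x is used, not the conjugate (Hermitian) product.
x_inf = np.array([1.0, 1.0j, 0.0])
print(x_inf @ x_inf)                  # 0j: the point lies on Omega

# An arbitrary rotation about the z axis; R^T R = I.
phi = np.deg2rad(37.0)
R = np.array([[np.cos(phi), -np.sin(phi), 0.0],
              [np.sin(phi),  np.cos(phi), 0.0],
              [0.0,          0.0,         1.0]])

x_rot = R @ x_inf                     # the transformed point R x_inf
print(np.round(x_rot @ x_rot, 12))    # still 0: Omega maps to itself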
The image of the absolute conic, denoted by $\omega$, is also an imaginary conic, and is determined only by the intrinsic parameters of the camera. This can be seen as follows. Consider the projection of a point $x_\infty$ on $\Omega$, denoted by