Video Coding
Associate Prof. Nguyen Chan Hung
Head of Research and Development of Multimedia Technology Laboratory (RDLAB)
Hanoi University of Science and Technology
Agenda
Coding process
Video coding standards
Quality evaluation
Open issues
Introduction (1/2)
Why is video compression important?
One movie video without compression
◦ 720 x 480 pixels per frame
◦ 30 frames per second
◦ Total 90 minutes
◦ Full color
◦ The full data quantity = 167.96 Gbytes!
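The figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "full color" means 3 bytes (24 bits) per pixel and that a Gbyte is 10^9 bytes; both are assumptions, chosen because they reproduce the number on the slide.

```python
# Raw (uncompressed) size of the movie described above.
width, height = 720, 480        # pixels per frame
bytes_per_pixel = 3             # assumption: "full color" = 24-bit color
frames_per_second = 30
duration_seconds = 90 * 60      # 90 minutes

bytes_per_frame = width * height * bytes_per_pixel
total_bytes = bytes_per_frame * frames_per_second * duration_seconds
total_gbytes = total_bytes / 1e9  # assumption: decimal gigabytes

print(f"{total_gbytes:.2f} Gbytes")  # 167.96 Gbytes
```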
Introduction (2/2)
What is the difference between video
compression and image compression?
◦ Temporal Redundancy
Coding method to remove redundancy
◦ Intraframe Coding
Remove spatial redundancy
◦ Interframe Coding
Remove temporal redundancy
Desired Features
Better compression
Improved quality
Interactivity and Manipulation of Content
Error Resilience
Processing of content in the compressed
domain
Identification and selective
coding/decoding of the object of interest
Facilitate Search / Indexing (MPEG-7)
Time table
(Figure: timeline of the standards JPEG, MPEG1, MPEG2/H.262, MPEG4, H.261, H.263, H.26L and H.264 against a Year axis running from 1986 to 2004.)
Where is MPEG used?
Most probably.
◦ MPEG-1
Video-CD
Usually .mpg or .mpeg files are MPEG-1
DAB Digital Radio is MP2 (MPEG-1 Layer 2)
MP3 files (MPEG-1 Layer 3)
◦ MPEG-2:
.vob, .m2v, rarely .mpg files
Anything to do with DVD
Camcorders, DVD players, DVD recorders, TiVo
Digital TV
◦ MPEG-4:
High Quality AVI files
Video Phones
DivX
Some advanced audio players support MPEG-4 Advanced Audio Coding (AAC)
◦ NetMeeting and similar video-chat
H.263/+/++
◦ H.264
Some content has appeared recently, mainly trailers
R-D Performance of MPEG Codecs
(Figure: rate-distortion curves for MPEG-1, MPEG-2, MPEG-4 and H.264. Vertical axis: PSNR (Y), 32 to 50. Horizontal axis: bit rate, 350 to 1050 kbps.)
CODEC Design
The most intuitive method to remove
temporal redundancy
3-Dimensional DCT
◦ Remove spatiotemporal correlation
◦ Good for low motion video
◦ Bad for high motion video
$$F(u,v,w) = \frac{2}{N}\,C(u)\,C(v)\,C(w)\sum_{t=0}^{N-1}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1}\Psi(x,y,t)\,\cos\frac{\pi(2x+1)u}{2N}\,\cos\frac{\pi(2y+1)v}{2N}\,\cos\frac{\pi(2t+1)w}{2N}$$
for $u = 0,\dots,N-1$, $v = 0,\dots,N-1$ and $w = 0,\dots,N-1$,
where $N = 8$ and $C(k) = 1/\sqrt{2}$ for $k = 0$, $C(k) = 1$ otherwise.
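To see why the 3-D DCT suits low-motion video, the sketch below evaluates the triple sum directly on a small block in which every frame is identical. It assumes the 2/N scale factor and C(0) = 1/sqrt(2) as read from the formula above (an orthonormal transform would use a different constant), and the function name `dct3d` is hypothetical. A 4 x 4 x 4 block is used instead of N = 8 only to keep the demo small.

```python
import math


def dct3d(block):
    """Direct 3-D DCT of an N x N x N block indexed as block[t][x][y].

    Returns coefficients indexed as out[u][v][w], where w is the
    temporal frequency.
    """
    n = len(block)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    # cos_tab[k][i] = cos(pi * (2i + 1) * k / (2N))
    cos_tab = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)] for k in range(n)]
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            for w in range(n):
                total = 0.0
                for t in range(n):
                    for x in range(n):
                        for y in range(n):
                            total += (block[t][x][y] * cos_tab[u][x]
                                      * cos_tab[v][y] * cos_tab[w][t])
                out[u][v][w] = (2 / n) * c(u) * c(v) * c(w) * total
    return out


# A static scene: the same 4 x 4 frame repeated at every time instant.
n = 4
frame = [[x + 2 * y for y in range(n)] for x in range(n)]
static_block = [frame for _ in range(n)]
coeffs = dct3d(static_block)

# With no motion, every coefficient with temporal frequency w > 0 vanishes.
largest_temporal = max(abs(coeffs[u][v][w])
                       for u in range(n) for v in range(n)
                       for w in range(1, n))
print(largest_temporal < 1e-9)
```

With motion, energy spreads into the w > 0 coefficients, which is the reason the slide rates the method as bad for high-motion video.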
(Figure: "Horse ride" sequence, from Princeton EE330 S'01 by B. Liu. Panels: pixel-wise difference w/o motion compensation; motion estimation; residue after motion compensation.)
Motion Estimation
Helps in understanding the content of an image sequence
◦ For surveillance
Helps reduce the temporal redundancy of video
◦ For compression
Stabilizes video by detecting and removing small, noisy global motions
◦ For building a stabilizer in a camcorder
Motion Compensation
It aims to reduce the data transmitted by detecting the motion of objects
◦ Use the previous frame as reference
◦ In steps:
Split the current frame into blocks. For each one:
Find the best-matching block in the reference frame
Code and transmit the motion vector to that block and the prediction residue
◦ The next frame can be used as a reference too
Hybrid MC-DCT Video Encoder
• Intra-frame: encoded without prediction
• Inter-frame: predictively encoded => use quantized frames as ref for residue
The Exhaustive Block-Matching Algorithm
Computationally intensive
◦ Not suitable for practical implementation
◦ A fast algorithm is necessary
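The cost of the exhaustive method is easy to see in code. The following is a minimal sketch, assuming the sum of absolute differences (SAD) as the matching criterion and frames stored as lists of pixel rows; the helper names and the white-square test frames are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, bs):
    """Sum of absolute differences between the bs x bs block of `cur`
    at (bx, by) and the bs x bs block of `ref` at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(bs) for i in range(bs))


def full_search(cur, ref, bx, by, bs, search_range):
    """Try every displacement in the window.

    Returns (dx, dy, best_sad, candidates_tested)."""
    h, w = len(ref), len(ref[0])
    best = None
    tested = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if not (0 <= rx <= w - bs and 0 <= ry <= h - bs):
                continue  # candidate block would leave the frame
            cost = sad(cur, ref, bx, by, rx, ry, bs)
            tested += 1
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best + (tested,)


def frame_with_square(size, x0, y0, side):
    """Black frame with one white square (hypothetical test content)."""
    return [[255 if x0 <= x < x0 + side and y0 <= y < y0 + side else 0
             for x in range(size)] for y in range(size)]


cur = frame_with_square(32, 12, 12, 8)  # the square fills the current block
ref = frame_with_square(32, 18, 8, 8)   # same square, displaced by (6, -4)
result = full_search(cur, ref, bx=12, by=12, bs=8, search_range=7)
print(result)  # (6, -4, 0, 225)
```

A window of ±7 pixels already needs (2·7 + 1)² = 225 block comparisons per block, which is the motivation for fast algorithms.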
Fast Algorithms for Block Matching
Basic ideas
◦ Matching errors near the best match are generally smaller than those far away from it
◦ Skip candidates that are unlikely to give a good match
Fast Block-Matching Algorithms
Characteristics of fast algorithms
◦ Less accurate than the exhaustive method
◦ Save a large amount of computation
Two well-known fast algorithms
◦ Coarse-to-fine Three-Step Search method
◦ 2-D logarithmic search method
Three-Step Search Method (TSS)
Introduced by Koga et al. in 1981.
◦ Very popular because of its simplicity, robustness and near-optimal performance.
◦ Searches for the best motion vector in a coarse-to-fine search pattern.
The algorithm:
Step 1: An initial step size is picked. The eight blocks at a distance of one step size around the centre block are picked for comparison.
Step 2: The centre is moved to the point with the minimum distortion. The step size is halved.
Steps 1 and 2 are repeated until the step size becomes smaller than 1.
(Figure: a particular convergence path of this algorithm.)
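The two steps can be sketched as follows. This is an illustrative implementation, not a reference one: it assumes SAD as the distortion measure, an initial step size of 4 (covering displacements up to ±7), and hypothetical helper names and test frames.

```python
def sad(cur, ref, bx, by, rx, ry, bs):
    """Sum of absolute differences between two bs x bs blocks."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(bs) for i in range(bs))


def three_step_search(cur, ref, bx, by, bs, step=4):
    """Coarse-to-fine search. Returns (dx, dy, sad, candidates_tested)."""
    h, w = len(ref), len(ref[0])
    cx, cy = 0, 0                          # centre, as a displacement
    best_cost = sad(cur, ref, bx, by, bx, by, bs)
    tested = 1
    while step >= 1:
        best = (cx, cy)
        # Step 1: the eight neighbours at distance `step` from the centre.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue               # centre cost is already known
                mx, my = cx + ox, cy + oy
                rx, ry = bx + mx, by + my
                if not (0 <= rx <= w - bs and 0 <= ry <= h - bs):
                    continue
                cost = sad(cur, ref, bx, by, rx, ry, bs)
                tested += 1
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        # Step 2: move the centre to the minimum and halve the step size.
        cx, cy = best
        step //= 2
    return cx, cy, best_cost, tested


def frame_with_square(size, x0, y0, side):
    """Black frame with one white square (hypothetical test content)."""
    return [[255 if x0 <= x < x0 + side and y0 <= y < y0 + side else 0
             for x in range(size)] for y in range(size)]


cur = frame_with_square(32, 12, 12, 8)
ref = frame_with_square(32, 18, 8, 8)    # true displacement is (6, -4)
result = three_step_search(cur, ref, bx=12, by=12, bs=8)
print(result)  # (6, -4, 0, 25)
```

On this simple content the search visits (0, 0), then (4, -4), then (6, -4), testing 25 candidates in total. On content whose matching error has several local minima, the same procedure can settle on a worse match than the exhaustive method would find.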
2-D Logarithmic Search Method (TDL)
Introduced by Jain & Jain
◦ Requires more computation than TSS, but is more accurate, especially when the search window is large
Step 1: Pick an initial step size s. Look at the block at the centre of the search area and the four blocks at a distance of s from it on the X and Y axes (the five positions form a + sign).
Step 2: If the position of the best match is at the centre, halve the step size. If, however, one of the other four points is the best match, then it becomes the centre and step 1 is repeated.
Step 3: When the step size becomes 1, all nine blocks around the centre are searched and the best among them is picked as the required block.
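A minimal sketch of the three steps follows, assuming SAD as the matching criterion, an initial step size of 4 and a search window of ±7 pixels. The function and helper names are hypothetical, and the test frames are synthetic.

```python
def sad(cur, ref, bx, by, rx, ry, bs):
    """Sum of absolute differences between two bs x bs blocks."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(bs) for i in range(bs))


def log_search_2d(cur, ref, bx, by, bs, step=4, search_range=7):
    """2-D logarithmic search. Returns (dx, dy, sad)."""
    h, w = len(ref), len(ref[0])

    def cost_at(mx, my):
        """SAD for displacement (mx, my), or None if it is not allowed."""
        if max(abs(mx), abs(my)) > search_range:
            return None
        rx, ry = bx + mx, by + my
        if not (0 <= rx <= w - bs and 0 <= ry <= h - bs):
            return None
        return sad(cur, ref, bx, by, rx, ry, bs)

    def best_of(cx, cy, centre_cost, offsets):
        """Best of the centre and the given neighbours (centre wins ties)."""
        best = (cx, cy, centre_cost)
        for ox, oy in offsets:
            cost = cost_at(cx + ox, cy + oy)
            if cost is not None and cost < best[2]:
                best = (cx + ox, cy + oy, cost)
        return best

    cx, cy, centre_cost = 0, 0, cost_at(0, 0)
    while step > 1:
        # Step 1: the four points of the + sign around the centre.
        plus = [(0, -step), (0, step), (-step, 0), (step, 0)]
        nx, ny, n_cost = best_of(cx, cy, centre_cost, plus)
        # Step 2: halve the step if the centre wins, otherwise move.
        if (nx, ny) == (cx, cy):
            step //= 2
        else:
            cx, cy, centre_cost = nx, ny, n_cost
    # Step 3: step size 1, search the eight neighbours plus the centre.
    ring = [(ox, oy) for oy in (-1, 0, 1) for ox in (-1, 0, 1)
            if (ox, oy) != (0, 0)]
    return best_of(cx, cy, centre_cost, ring)


def frame_with_square(size, x0, y0, side):
    """Black frame with one white square (hypothetical test content)."""
    return [[255 if x0 <= x < x0 + side and y0 <= y < y0 + side else 0
             for x in range(size)] for y in range(size)]


cur = frame_with_square(32, 12, 12, 8)
ref = frame_with_square(32, 18, 8, 8)    # true displacement is (6, -4)
result = log_search_2d(cur, ref, bx=12, by=12, bs=8)
print(result)  # (6, -4, 0)
```

Because the centre only moves when a strictly better point is found, the loop always terminates.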
The MPEG-1 Standard
Group of Pictures
Motion Estimation
Motion Compensation
Differential Coding
DCT
Quantization
Entropy Coding
Group of Pictures (1/2)
I-frame (Intracoded Frame)
◦ Coded within one frame, e.g. with the DCT.
◦ This type of frame does not need a previous frame
P-frame (Predictive Frame)
◦ One directional motion prediction from a previous frame
The reference can be either I-frame or P-frame
◦ Generally referred to as inter-frame
B-frame (Bi-directional predictive frame)
◦ Bi-directional motion prediction from a previous and/or future frame
The reference can be either I-frame or P-frame
◦ Generally referred to as inter-frame
Group of Pictures (2/2)
The distance between two nearest P-frames, or between a P-frame and the nearest I-frame
◦ denoted by M
The distance between two nearest I-frames
◦ denoted by N
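As an illustration of M and N, the sketch below prints the frame types of one group of pictures in display order. It assumes that the group starts with its I-frame and that every M-th frame after it is a P-frame; the function name `gop_pattern` is hypothetical.

```python
def gop_pattern(n, m):
    """Frame types of one GOP in display order.

    n: distance between two nearest I-frames (GOP length)
    m: distance between two nearest anchor (I or P) frames
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


pattern = gop_pattern(n=12, m=3)
print(pattern)                 # IBBPBBPBBPBB
print(gop_pattern(n=4, m=1))   # IPPP (no B-frames when M = 1)
```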
MPEG-1 = JPEG + Motion Prediction + Rate Control
Early motivation: to encode motion video at 1.5Mbits/s for transport over T1
data circuits and for replay from CD-ROM
Defines the decoder but not the encoder
Frames (pictures)
◦ Intra-coded using JPEG
◦ Inter-coded using (interpolated)
motion estimation & compensation
and JPEG for the residuals
Predicted and Bi-directional
MacroBlocks (MBs)
◦ 16×16 pixels block
Rate control
◦ buffer at each end
◦ Test Model 5 (TM5)
MPEG-1 – Motion Prediction
Motion prediction = motion estimation + error compensation
MPEG-2 = MPEG-1 + …
Improvements
◦ Color space: could support 4:2:2 and 4:4:4 coding
◦ Quantization: could have 9- or 10-bit precision for DC coefficients
◦ Concealment motion vectors: used when an intra-MB is lost
◦ Pan and Scan: supports display of different aspect ratios, e.g., 16:9
Profiles and levels
◦ Profiles: define the tools or syntactical elements
◦ Levels: define the permissible ranges of parameters
Interlace tools
Scalable coding profiles
System layer: defines two bitstream constructs
◦ Program stream (PS): modeled on MPEG-1 (backward compatibility)
◦ Transport stream (TS): more robust, does not need a common time base, designed for use in error-prone environments.
The MPEG-2 Standard
The main encoder structure is similar to
that of the MPEG-1 standard
Field/frame DCT coding
Field/frame prediction mode selection
Alternative scan order
Various picture sampling formats
User defined quantization matrix
MPEG – Scalable Coding (SC)
Non-scalable coding
◦ To optimize video quality at a given
bit rate.
Base and enhancement layer SC
◦ To optimize video quality at two given
bit rates.
◦ SNR SC (different quantization accuracy)
◦ Temporal SC (different frame rates)
◦ Spatial SC (different spatial resolution)
Fine granularity scalability (FGS)
◦ To optimize the video quality over a given bit rate
range
◦ Also has base layer and enhancement layer
◦ Enhancement layer uses bit-plane coding
Bit-plane coding considers each quantized DCT
coefficient as a binary integer of several bits
instead of a decimal integer of a certain value
Frequency weighting and selective enhancement
(Figure: 2-layer SNR scalable coder)
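The bit-plane view of the enhancement layer can be illustrated with a short sketch. It assumes non-negative coefficient magnitudes (signs would be coded separately) and uses hypothetical function names; it shows only the decomposition and the effect of truncating the stream, not the entropy coding of each plane.

```python
def to_bit_planes(magnitudes):
    """Split non-negative integers into bit-planes, most significant first."""
    n_planes = max(max(magnitudes).bit_length(), 1)
    return [[(m >> p) & 1 for m in magnitudes]
            for p in range(n_planes - 1, -1, -1)]


def from_bit_planes(planes, keep):
    """Rebuild the values from only the first `keep` bit-planes."""
    n_planes = len(planes)
    values = [0] * len(planes[0])
    for index, plane in enumerate(planes[:keep]):
        shift = n_planes - 1 - index
        for i, bit in enumerate(plane):
            values[i] += bit << shift
    return values


# Magnitudes of quantized DCT coefficients in scan order (made-up values).
coeffs = [10, 0, 6, 0, 0, 3, 0, 2, 0, 1]
planes = to_bit_planes(coeffs)
for plane in planes:
    print(plane)

print(from_bit_planes(planes, keep=2))  # [8, 0, 4, 0, 0, 0, 0, 0, 0, 0]
print(from_bit_planes(planes, keep=4))  # [10, 0, 6, 0, 0, 3, 0, 2, 0, 1]
```

Cutting the enhancement layer after any bit-plane still yields a usable, coarser approximation, which is what lets FGS adapt to a range of bit rates.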
Field/Frame DCT Coding
The field type DCT
◦ Fast motion video
The frame type DCT
◦ Slow motion video
Alternative Scan Order
Zigzag scan order
◦ Frame DCT
Alternative scan order
◦ Field DCT
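For reference, the conventional zigzag order can be generated rather than tabulated. The sketch below covers the zigzag scan only; the alternative scan is a fixed table in the standard and is not reproduced here. The function name `zigzag_order` is hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd anti-diagonals run top-right to bottom-left (row increasing),
        # even ones run bottom-left to top-right (row decreasing).
        return (diagonal, row if diagonal % 2 else -row)

    positions = [(row, col) for row in range(n) for col in range(n)]
    return sorted(positions, key=key)


order = zigzag_order()
print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(order[-1])   # (7, 7)
```

The scan visits low-frequency coefficients first, so the long runs of zeros left by quantization end up grouped at the end of the sequence.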
The MPEG-2 Encoder (1/2)
Base Layer
◦ Basic quality requirement
◦ For SDTV
Enhanced Layer
◦ High quality service
◦ For HDTV
The MPEG-2 Encoder (2/2)
Quantization
◦ User can change the quantization if necessary
◦ Quantization matrix
Various picture sampling formats
◦ 4:4:4
◦ 4:2:2
◦ 4:2:0
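$$Q_{intra} = \begin{bmatrix}
8 & 16 & 19 & 22 & 26 & 27 & 29 & 34 \\
16 & 16 & 22 & 24 & 27 & 29 & 34 & 37 \\
19 & 22 & 26 & 27 & 29 & 34 & 34 & 38 \\
22 & 22 & 26 & 27 & 29 & 34 & 37 & 40 \\
22 & 26 & 27 & 29 & 32 & 35 & 40 & 48 \\
26 & 27 & 29 & 32 & 35 & 40 & 48 & 58 \\
26 & 27 & 29 & 34 & 38 & 46 & 56 & 69 \\
27 & 29 & 35 & 38 & 46 & 56 & 69 & 83
\end{bmatrix}$$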
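The effect of the quantization matrix can be sketched as below. This is a simplified model, not the normative MPEG-2 arithmetic: it assumes each coefficient is divided by its matrix entry scaled by `scale / 16` and rounded, and it treats the DC term like any other coefficient. The function names and the sample coefficients are hypothetical.

```python
Q_INTRA = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def quantize(coeffs, scale=16):
    """Simplified quantizer: level = round(16 * coeff / (Q * scale)).

    Python's round() sends exact halves to the nearest even integer;
    a real encoder defines its own rounding rule.
    """
    return [[round(coeffs[v][u] * 16 / (Q_INTRA[v][u] * scale))
             for u in range(8)] for v in range(8)]


def dequantize(levels, scale=16):
    """Inverse of the simplified quantizer above."""
    return [[levels[v][u] * Q_INTRA[v][u] * scale // 16
             for u in range(8)] for v in range(8)]


# A made-up DCT block: strong low frequencies, one weak high frequency.
coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800, 160, 40

levels = quantize(coeffs)
restored = dequantize(levels)
print(levels[0][0], levels[0][1], levels[7][7])        # 100 10 0
print(restored[0][0], restored[0][1], restored[7][7])  # 800 160 0
```

The large entries toward the bottom right of the matrix quantize high-frequency coefficients more coarsely, so the weak high-frequency term is discarded while the low-frequency terms survive.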
MPEG-4 = MPEG-2 + Objects + Other Enhancements
Objects (optional)
◦ Video (texture+shape), image, audio, speech, text, etc.
◦ Encoded using different techniques
◦ Transmitted independently
◦ Composited at the decoder using BInary Format for Scenes (BIFS)
Improvements in MPEG-4 version2
◦ Global motion compensation (GMC)
◦ Quarter pixel motion compensation
◦ Shape-adaptive DCT
Why is MPEG-4 not as successful as MPEG-2?
◦ Not substantially better than MPEG-2
◦ Issue of licensing
MPEG-4 – Error Resilience Tools
Video packet resynchronization
◦ Previous coding standards: Resynchronization markers are fixed at the beginning of each
row of MBs
◦ MPEG-4: Resynchronization markers are inserted at every K bits
Data partitioning
◦ Partitions the data in a video packet into a motion part and a texture part separated by a
motion boundary marker (MBM)
Reversible variable length codes (RVLC)
◦ The decoder finds the next resynchronization marker and decodes backwards
Header extension code (HEC)
◦ The header information is repeated after the 1-bit HEC
Unequal error protection technique (UEP)
(Figure: structure of a video packet (VP). VP header fields: Resync. marker, MB No., QP, HEC, repeated header info. I-VOP packet: DC DCT data, AC DCT data. P-VOP packet: motion data, MBM, texture (DCT) data. The parts of a damaged packet are annotated "use" or "discard".)
Advanced Video Coding / ITU-T Recommendation H.264 / ISO/IEC MPEG-4 (Part 10)
H.264 structure
◦ Video coding layer (VCL)
◦ Network abstraction layer (NAL)
Possible applications of H.264
◦ Conversational services operated
below 1Mbps with low latency.
◦ Entertainment services operated between 1-8+ Mbps with moderate latency
such as 0.5-2s in modified MPEG-2/H.222.0 systems.
Broadcast via satellite, cable, terrestrial or DSL
DVD for standard and high-definition video
Video-on-demand via various channels
◦ Streaming services operated at 50-1500kbps with 2s or more of latency.
New Features of H.264
Multi-mode, multi-reference MC
Motion vector can point out of image border
1/4-, 1/8-pixel motion vector precision
B-frame prediction weighting
4×4 integer transform
Multi-mode intra-prediction
In-loop de-blocking filter
UVLC (Universal Variable Length Coding)
NAL (Network Abstraction Layer)
SP-slices
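As one concrete example from the list above, the universal VLC can be illustrated with Exp-Golomb codes, the form in which H.264 as finally standardized codes most of its syntax elements. Treating that as the UVLC meant on this slide is an assumption, and the function names are hypothetical.

```python
def ue_encode(k):
    """Unsigned Exp-Golomb codeword for k >= 0, as a string of '0'/'1'.

    The codeword is the binary form of k + 1, preceded by as many zeros
    as there are bits after its leading 1.
    """
    bits = bin(k + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def ue_decode(stream):
    """Decode a string of concatenated, well-formed codewords."""
    values, pos = [], 0
    while pos < len(stream):
        zeros = 0
        while stream[pos] == "0":       # count the zero prefix
            zeros += 1
            pos += 1
        values.append(int(stream[pos:pos + zeros + 1], 2) - 1)
        pos += zeros + 1
    return values


codewords = [ue_encode(k) for k in range(5)]
print(codewords)                       # ['1', '010', '011', '00100', '00101']
print(ue_decode("1" + "010" + "00100"))  # [0, 1, 3]
```

A single regular code structure replaces the many hand-built VLC tables of earlier standards, at the price of not being tuned to any one symbol distribution.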
Profiles and Levels
Profiles: Baseline, Main, and X
◦ Baseline: Progressive, Videoconferencing & Wireless
◦ Main: esp. Broadcast
◦ X: Mobile network
Baseline profile is the minimum implementation
◦ Without CABAC, 1/8 MC, B-frame, SP-slices
11 levels
◦ Resolution, capability, bit rate, buffer, reference #
◦ Built to match popular international production and
emission formats
◦ From QCIF to D-Cinema
Basic Macroblock Coding Structure
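(Figure: block diagram of the macroblock-based encoder. The input video signal is split into macroblocks of 16x16 pixels. Blocks shown: coder control, transform/scaling/quantization, entropy coding, and an embedded decoder made of scaling & inverse transform, de-blocking filter, intra-frame prediction, motion compensation and the intra/inter switch, producing the output video signal. Motion estimation supplies the motion data. Signals passed to entropy coding: control data, quantized transform coefficients, motion data.)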
Variable block size
A fixed block size may not be suitable for all moving objects
◦ Improve the flexibility of comparison
◦ Reduce the error of comparison
7 types of blocks for selection
◦ 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4
Motion Compensation
(Figure: the macroblock coding block diagram again, with the motion compensation stage highlighted. Inset "Various block sizes and shapes": MB types 16x16, 16x8, 8x16 and 8x8; 8x8 types 8x8, 8x4, 4x8 and 4x4, with the partitions of each type numbered from 0.)
Multiple Reference Frames
The neighboring frames are not the most similar in
some cases
A B-frame can be used as a reference frame
◦ B-frame is close to the target frame in many situations
Multiple Reference Frames
(Figure: the macroblock coding block diagram again, with multiple reference frames feeding motion compensation.)
B-frame Prediction Weighting
(Figure: frames I0 B1 B2 B3 P4 B5 B6 along the time axis.)
Playback order: I0 B1 B2 B3 P4 B5 B6 ……...
Bitstream order: I0 P4 B1 B3 B2 P8 B5 ……...
Intra Prediction
Predict each block from neighboring pixels in the same frame, exploiting the similarity between them, and then use transform coding to remove the redundancy that remains.
Intra-Coded Macroblocks
H.264: prediction in space domain
◦ Spatial prediction
◦ Encode the prediction modes (use predictive coding if 4x4 modes are used)
◦ Integer transform of residue
◦ Quantization including scaling
◦ No coefficient prediction
MPEG-1/2/4, H.261/3: prediction in frequency domain
◦ No spatial prediction
◦ 8x8 Discrete Cosine Transform (DCT) for pixel values
◦ Quantization
◦ Coefficient prediction (for DC values in MPEG-2 and AC values in the first row and column in MPEG-4)
Spatial Prediction for Intra-Coded MBs
luma
- 4x4: 9 modes (figure: a 4x4 block with neighboring pixels M, A to H above and I to L on the left; modes shown include V, H and Mean (A-D, I-M))
- 16x16: 4 modes (figure: modes shown include V, H and Mean (H, V))
chroma
- 8x8: 4 modes (figure: modes shown include V, H and Mean (H, V))
- The same prediction mode is always applied to both chroma blocks
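Three of the 4x4 luma modes are simple enough to sketch: vertical (V), horizontal (H) and mean (DC). The sketch assumes the mean mode averages the four pixels above and the four pixels to the left with rounding, and that the encoder picks the mode with the smallest sum of absolute residues; the function names and pixel values are hypothetical, and the six directional modes are left out.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 block from the 4 pixels above and the 4 to the left."""
    if mode == "V":      # vertical: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "H":      # horizontal: copy the left column rightwards
        return [[left[row]] * 4 for row in range(4)]
    if mode == "DC":     # mean of the eight neighbors, rounded
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def best_mode(block, top, left):
    """Return (cost, mode, residue) for the cheapest of the three modes."""
    candidates = []
    for mode in ("V", "H", "DC"):
        prediction = predict_4x4(mode, top, left)
        residue = [[block[r][c] - prediction[r][c] for c in range(4)]
                   for r in range(4)]
        cost = sum(abs(e) for row in residue for e in row)
        candidates.append((cost, mode, residue))
    return min(candidates, key=lambda item: item[0])


# A block of vertical stripes that continues the row above it.
top = [10, 20, 30, 40]
left = [12, 12, 12, 12]
block = [[10, 20, 30, 40] for _ in range(4)]

cost, mode, residue = best_mode(block, top, left)
print(mode, cost)   # V 0
```

Only the chosen mode and the residue need to be coded; here the vertical mode predicts the block exactly, so the residue is all zeros.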