Lecture: Video Coding


Video Coding
Associate Prof. Nguyen Chan Hung
Head of Research and Development of Multimedia Technology Laboratory (RDLAB)
Hanoi University of Science and Technology

Agenda

Coding process
Video coding standards
Quality evaluation
Open issues


Introduction (1/2)
Why is video compression important?
One movie without compression:
◦ 720 x 480 pixels per frame
◦ 30 frames per second
◦ 90 minutes in total
◦ Full color (24 bits per pixel)
◦ Total data size = 167.96 GB
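The figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "full color" means 3 bytes (24 bits) per pixel and that 1 GB is 10^9 bytes; the constant names are illustrative only.

```python
# Raw (uncompressed) size of the example movie.
WIDTH, HEIGHT = 720, 480      # pixels per frame
BYTES_PER_PIXEL = 3           # assumption: 24-bit full color
FPS = 30                      # frames per second
DURATION_S = 90 * 60          # 90 minutes in seconds

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
total_bytes = bytes_per_frame * FPS * DURATION_S

print(f"{total_bytes / 1e9:.2f} GB")  # 167.96 GB
```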


Introduction (2/2)
What is the difference between video
compression and image compression?
◦ Temporal Redundancy

Coding method to remove redundancy
◦ Intraframe Coding
Remove spatial redundancy

◦ Interframe Coding
Remove temporal redundancy


Desired Features
Better compression
Improved quality
Interactivity and Manipulation of Content
Error Resilience
Processing of content in the compressed
domain
Identification and selective
coding/decoding of the object of interest
Facilitate Search / Indexing (MPEG-7)

Time table
[Figure: timeline of the standards JPEG, MPEG1, MPEG2/H.262, MPEG4, H.261, H.263, H.26L and H.264; horizontal axis: year, 1986 to 2004]

Where is MPEG used?
Most probably.
◦ MPEG-1
Video-CD
Usually .mpg or .mpeg files are MPEG-1
DAB Digital Radio is MP2 (MPEG-1 Layer 2)
MP3 files (MPEG-1 Layer 3)

◦ MPEG-2:
.vob, .m2v, rarely .mpg files
Anything to do with DVD
Camcorders, DVD players, DVD recorders, TiVo

Digital TV

◦ MPEG-4:
High Quality AVI files
Video Phones
DivX
Some advanced audio players support MPEG-4 Advanced Audio Coding (AAC)

◦ NetMeeting and similar video-chat
H.263/+/++

◦ H.264
Some content has appeared recently, mainly trailers

R-D Performance of MPEG Codecs
[Figure: rate-distortion curves for MPEG-1, MPEG-2, MPEG-4 and H.264; horizontal axis: bit rate (kbps), 350 to 1050; vertical axis: PSNR (Y), 32 to 50]

CODEC Design

The most intuitive method to remove
temporal redundancy

3-Dimensional DCT
◦ Remove spatiotemporal correlation
◦ Good for low motion video
◦ Bad for high motion video

$$F(u,v,w) = \frac{2}{N}\, C(u)\,C(v)\,C(w) \sum_{t=0}^{N-1}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} \Psi(x,y,t)\, \cos\left[\frac{\pi(2x+1)u}{2N}\right] \cos\left[\frac{\pi(2y+1)v}{2N}\right] \cos\left[\frac{\pi(2t+1)w}{2N}\right]$$

for $u = 0,\dots,N-1$, $v = 0,\dots,N-1$ and $w = 0,\dots,N-1$,

where $N = 8$ and $C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{otherwise} \end{cases}$
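To illustrate why the 3-D DCT suits low-motion video, the following sketch computes the transform by direct summation and measures how much energy falls outside the zero temporal frequency. It is a minimal illustration, not an optimized implementation: the cube size N = 4 (instead of 8), the orthonormal scale factor (2/N)^(3/2), and the helper names `dct3` and `temporal_energy` are all assumptions made for this example. A different prefactor only rescales the coefficients.

```python
import math

def dct3(cube):
    """Direct 3-D DCT of an N x N x N cube indexed as cube[t][y][x]."""
    n = len(cube)
    # cos_tab[k][i] = cos(pi * (2i + 1) * k / (2N)), shared by all three axes
    cos_tab = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
               for k in range(n)]
    c = [1 / math.sqrt(2)] + [1.0] * (n - 1)
    scale = (2 / n) ** 1.5  # assumed orthonormal scaling
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for w in range(n):
        for v in range(n):
            for u in range(n):
                acc = 0.0
                for t in range(n):
                    for y in range(n):
                        for x in range(n):
                            acc += (cube[t][y][x] * cos_tab[u][x]
                                    * cos_tab[v][y] * cos_tab[w][t])
                out[w][v][u] = scale * c[u] * c[v] * c[w] * acc
    return out

def temporal_energy(coeffs):
    """Energy of all coefficients with temporal frequency w > 0."""
    return sum(val * val for plane in coeffs[1:] for row in plane for val in row)

N = 4  # small cube keeps the direct sum fast
frame = [[10 * x + y for x in range(N)] for y in range(N)]
static = [frame for _ in range(N)]  # the same frame repeated: no motion
# Content shifts cyclically by one pixel per frame: strong motion
moving = [[row[t:] + row[:t] for row in frame] for t in range(N)]

print(temporal_energy(dct3(static)))  # close to 0 (rounding error only)
print(temporal_energy(dct3(moving)))  # clearly non-zero
```

For the static cube all energy sits in the w = 0 plane, so only N x N coefficients need coding. Motion spreads energy across the temporal frequencies, which is why the approach performs poorly on high-motion video.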

(From Princeton EE330 S’01 by B.Liu)

“Horse ride”

Pixel-wise difference w/o motion compensation

Motion estimation

Residue after motion compensation

Motion Estimation
Helps understand the content of an image sequence
◦ For surveillance

Helps reduce the temporal redundancy of video
◦ For compression

Stabilizes video by detecting and removing small, noisy global
motions
◦ For building the stabilizer in a camcorder


Motion Compensation
It aims to reduce the data transmitted by detecting the
motion of objects
◦ Use the previous frame as reference
◦ In steps:
Split the current frame into blocks. For each one:
Find the best-matching block in the reference frame
Code and transmit the motion vector to the best-matching block, together with the prediction residue

◦ The next frame can be used as a reference too

Hybrid MC-DCT Video Encoder

• Intra-frame: encoded without prediction
• Inter-frame: predictively encoded => uses reconstructed (quantized) frames as the reference for the residue


The Exhaustive Block-Matching
Algorithm
Computationally intensive
◦ Not suitable for practical implementation
◦ A fast algorithm is necessary
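The cost of the exhaustive method is easy to see in a small sketch. The code below assumes the sum of absolute differences (SAD) as the matching criterion and a search range of ±p pixels; the function names, the synthetic `texture`, and the frame and block sizes are illustrative choices, not taken from the slides.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p].
    Returns (best motion vector, its SAD, number of candidates checked)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost, checks = (0, 0), None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            checks += 1
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost, checks

def texture(x, y):
    # Synthetic sample values; quadratic, so no two displaced blocks are identical
    return x * x + 3 * y * y + x * y

SIZE, N, P = 32, 8, 7
TRUE_MV = (-3, 2)
ref = [[texture(x, y) for x in range(SIZE)] for y in range(SIZE)]
# The current frame shows the reference content displaced by TRUE_MV
cur = [[texture(x + TRUE_MV[0], y + TRUE_MV[1]) for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost, checks = full_search(cur, ref, 12, 12, N, P)
print(mv, cost, checks)  # (-3, 2) 0 225
```

With p = 7 every block needs (2p + 1)^2 = 225 SAD evaluations of 64 pixels each, which is the computation the fast algorithms try to avoid.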


Fast Algorithms for Block Matching
Basic ideas

◦ Matching errors near the best match are generally smaller than far away
◦ Skip candidates that are unlikely to give good match


Fast Block-Matching Algorithms
Characteristics of fast algorithms
◦ Less accurate than the exhaustive
method
◦ Save a large amount of computation

Two well-known fast algorithms
◦ Coarse-to-fine Three-Step Search method
◦ 2-D Logarithmic Search method


Three-Step Search Method (TSS)
Introduced by Koga et al. in 1981.
◦ Very popular because of its simplicity,
robustness and near-optimal performance.
◦ Searches for the best motion vector in a
coarse-to-fine search pattern.

The algorithm:
Step 1: An initial step size is picked.
The eight blocks at a distance of one step size
from the centre (around the centre
block) are compared, together with the centre block.

Step 2: The centre is moved to the point with the
minimum distortion, and the step size
is halved.
Steps 1 and 2 are repeated until the
step size becomes smaller than 1.
[Figure: one possible convergence path of the
algorithm]
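The two steps of the three-step search can be sketched as follows. This is a minimal illustration under assumptions: SAD as the distortion measure, an initial step size of 4 (giving a ±7 range), and a synthetic `texture`; all names are hypothetical. The example displacement lies on the first-stage grid, so the search is guaranteed to find it. For other displacements TSS may stop at a local minimum.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the candidate displacement (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def three_step_search(cur, ref, bx, by, n, step=4):
    """Coarse-to-fine search. Returns (motion vector, SAD, candidates checked)."""
    h, w = len(ref), len(ref[0])

    def cost(dx, dy):
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or x + n > w or y + n > h:
            return None  # outside the reference frame
        return sad(cur, ref, bx, by, dx, dy, n)

    centre, best_cost, checks = (0, 0), cost(0, 0), 1
    while step >= 1:
        best = centre
        # Step 1: compare the eight points at distance `step` around the centre
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue
                cand = (centre[0] + ox, centre[1] + oy)
                c = cost(*cand)
                if c is None:
                    continue
                checks += 1
                if c < best_cost:
                    best, best_cost = cand, c
        # Step 2: move the centre to the minimum and halve the step size
        centre = best
        step //= 2
    return centre, best_cost, checks

def texture(x, y):
    # Synthetic sample values; quadratic, so the exact match is unique
    return x * x + 3 * y * y + x * y

SIZE, N = 32, 8
TRUE_MV = (4, -4)
ref = [[texture(x, y) for x in range(SIZE)] for y in range(SIZE)]
cur = [[texture(x + TRUE_MV[0], y + TRUE_MV[1]) for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost_found, checks = three_step_search(cur, ref, 12, 12, N)
print(mv, cost_found, checks)  # (4, -4) 0 25
```

At most 1 + 3 x 8 = 25 candidates are evaluated, compared with 225 for an exhaustive search over the same ±7 range.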


2-D Logarithmic Search Method (TDL)
Introduced by Jain & Jain.
Requires
more computation but is more accurate,
especially when the search window is
large.
Step 1: Pick an initial step size s. Look at
the block at the centre of the search area
and the four blocks at a distance of s
from it on the X and Y axes (the five
positions form a + sign).
Step 2: If the position of the best match
is at the centre, halve the step size. If,
however, one of the other four points
is the best match, it becomes the
centre and step 1 is repeated.
Step 3: When the step size becomes
1, all nine blocks around the centre
are searched and the best
among them is picked as the required
block.
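The three steps above can be sketched as follows. This is a minimal illustration under assumptions: SAD as the matching criterion, a search window of ±p pixels, an initial step size of 4, and a synthetic `texture`; the function and variable names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences for the candidate displacement (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def log_search_2d(cur, ref, bx, by, n, p=7, step=4):
    """2-D logarithmic search. Returns (motion vector, SAD, candidates checked)."""
    h, w = len(ref), len(ref[0])

    def cost(dx, dy):
        x, y = bx + dx, by + dy
        if max(abs(dx), abs(dy)) > p or x < 0 or y < 0 or x + n > w or y + n > h:
            return None  # outside the search window or the frame
        return sad(cur, ref, bx, by, dx, dy, n)

    centre, best_cost, checks = (0, 0), cost(0, 0), 1
    while step > 1:
        best = centre
        # Step 1: the centre plus four points at distance `step` form a + sign
        for ox, oy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = (centre[0] + ox, centre[1] + oy)
            c = cost(*cand)
            if c is None:
                continue
            checks += 1
            if c < best_cost:
                best, best_cost = cand, c
        # Step 2: halve the step if the centre wins, otherwise move the centre
        if best == centre:
            step //= 2
        else:
            centre = best
    # Step 3: step size 1, examine the eight neighbours of the centre as well
    best = centre
    for oy in (-1, 0, 1):
        for ox in (-1, 0, 1):
            if ox == 0 and oy == 0:
                continue
            cand = (centre[0] + ox, centre[1] + oy)
            c = cost(*cand)
            if c is None:
                continue
            checks += 1
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost, checks

def texture(x, y):
    # Synthetic sample values; quadratic, so the exact match is unique
    return x * x + 3 * y * y + x * y

SIZE, N = 32, 8
TRUE_MV = (4, 0)
ref = [[texture(x, y) for x in range(SIZE)] for y in range(SIZE)]
cur = [[texture(x + TRUE_MV[0], y + TRUE_MV[1]) for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost_found, checks = log_search_2d(cur, ref, 12, 12, N)
print(mv, cost_found)  # (4, 0) 0
```

The loop terminates because each pass either halves the step size or moves to a strictly lower-cost centre inside a finite window. Like TSS, it can settle on a local minimum when the error surface is not smooth.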

The MPEG-1 Standard
Group of Pictures
Motion Estimation
Motion Compensation
Differential Coding
DCT
Quantization
Entropy Coding


Group of Pictures (1/2)
I-frame (Intra-coded frame)
◦ Coded within a single frame, e.g., using the DCT.
◦ This type of frame does not need a previous frame

P-frame (Predictive frame)
◦ One-directional motion prediction from a previous frame
The reference can be either an I-frame or a P-frame
◦ Generally referred to as an inter-frame

B-frame (Bi-directional predictive frame)
◦ Bi-directional motion prediction from a previous and/or a future frame
The reference can be either an I-frame or a P-frame
◦ Generally referred to as an inter-frame


Group of Pictures (2/2)
The distance between two nearest P-frames, or between a P-frame
and the nearest I-frame
◦ denoted by M
The distance between two nearest I-frames
◦ denoted by N
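As an illustration of M and N, the sketch below builds the frame-type pattern of one GOP and reorders frames for transmission so that both references of a B-frame arrive before it. The function names are hypothetical, and the reordering follows the classic I/P/B scheme in which B-frames are never used as references.

```python
def gop_display_order(n, m):
    """Frame types of one GOP in display order for I-frame distance n and
    anchor-frame (I or P) distance m."""
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return types

def coded_order(display):
    """Reorder labelled frames so that every B-frame follows the later anchor
    it depends on."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)   # wait for the next anchor frame
        else:
            out.append(frame)         # anchor first ...
            out.extend(pending_b)     # ... then the B-frames displayed before it
            pending_b = []
    # Trailing B-frames would wait for the I-frame of the next GOP
    return out + pending_b

print("".join(gop_display_order(12, 3)))
# IBBPBBPBBPBB
print(coded_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```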


MPEG-1 = JPEG + Motion Prediction + Rate Control
Early motivation: to encode motion video at 1.5 Mbit/s for transport over T1
data circuits and for replay from CD-ROM

Defines the decoder but not the encoder
Frames (pictures)
◦ Intra-coded using JPEG
◦ Inter-coded using (interpolated)
motion estimation & compensation
and JPEG for the residuals

Predicted and Bi-directional

MacroBlocks (MBs)
◦ 16×16 pixels block

Rate control
◦ buffer at each end
◦ Test Model 5 (TM5)

MPEG-1 – Motion Prediction
Motion prediction = motion estimation + error compensation


MPEG-2 = MPEG-1 + …
Improvements
◦ Color space: can support 4:2:2 and 4:4:4 coding
◦ Quantization: can have 9- or 10-bit precision for DC coefficients
◦ Concealment motion vectors: used when an intra-MB is lost
◦ Pan and Scan: supports display of different aspect ratios, e.g., 16:9

Profiles and levels
◦ Profiles: define the tools or syntactical elements
◦ Levels: define the permissible ranges of parameters

Interlace tools
Scalable coding profiles
System layer: defines two bitstream constructs
◦ Program stream (PS): modeled on MPEG-1 (backward compatibility)
◦ Transport stream (TS): more robust, does not need a common time base,
designed for use in error-prone environment.


The MPEG-2 Standard
The main encoder structure is similar to
that of the MPEG-1 standard
Field/frame DCT coding
Field/frame prediction mode selection
Alternative scan order
Various picture sampling formats
User defined quantization matrix


MPEG – Scalable Coding (SC)

Non-scalable coding
◦ To optimize video quality at a given
bit rate.

Base and enhancement layer SC
◦ To optimize video quality at two given
bit rates.
◦ SNR SC (different quantization accuracy)
◦ Temporal SC (different frame rates)
◦ Spatial SC (different spatial resolution)

Fine granularity scalability (FGS)
◦ To optimize the video quality over a given bit rate
range
◦ Also has base layer and enhancement layer
◦ Enhancement layer uses bit-plane coding
Bit-plane coding considers each quantized DCT
coefficient as a binary integer of several bits
instead of a decimal integer of a certain value
Frequency weighting and selective enhancement
2-layer SNR scalable coder
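The bit-plane idea used by the FGS enhancement layer can be illustrated with a short sketch. It only shows the decomposition of quantized coefficients into bit-planes and the effect of truncating the enhancement layer. The entropy coding of each plane is omitted, and the function names and sample coefficients are hypothetical.

```python
def bit_planes(coeffs):
    """Split the magnitudes of quantized coefficients into bit-planes,
    most significant plane first."""
    mags = [abs(c) for c in coeffs]
    nbits = max(max(mags).bit_length(), 1)
    return [[(m >> b) & 1 for m in mags] for b in range(nbits - 1, -1, -1)]

def reconstruct(planes, signs, keep):
    """Rebuild the coefficients from only the first `keep` bit-planes."""
    nbits = len(planes)
    mags = [0] * len(signs)
    for idx, plane in enumerate(planes[:keep]):
        weight = 1 << (nbits - 1 - idx)   # value of one bit in this plane
        mags = [m + bit * weight for m, bit in zip(mags, plane)]
    return [s * m for s, m in zip(signs, mags)]

coeffs = [10, -6, 3, 0, 1]
signs = [-1 if c < 0 else 1 for c in coeffs]
planes = bit_planes(coeffs)

print(planes)
# [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 1, 0, 0], [0, 0, 1, 0, 1]]
print(reconstruct(planes, signs, 2))  # [8, -4, 0, 0, 0]
print(reconstruct(planes, signs, 4))  # [10, -6, 3, 0, 1]
```

Because the most significant planes are sent first, the stream can be cut after any plane and still yield a coarser but usable approximation. This is what lets the quality scale over a range of bit rates.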

Field/Frame DCT Coding
The field type DCT
◦ Fast motion video

The frame type DCT
◦ Slow motion video


Alternative Scan Order
Zigzag scan order
◦ Frame DCT

Alternative scan order
◦ Field DCT


The MPEG-2 Encoder (1/2)
Base Layer
◦ Basic quality requirement
◦ For SDTV

Enhanced Layer
◦ High quality service
◦ For HDTV


The MPEG-2 Encoder (2/2)
Quantization

◦ User can change the quantization if necessary
◦ Quantization matrix
Various picture sampling formats
◦ 4:4:4
◦ 4:2:2
◦ 4:2:0

8

16
19

22
Qintra =
22

26
26

27

16 19 22 26 27 29 34

16 22 24 27 29 34 37
22 26 27 29 34 34 38

22 26 27 29 34 37 40
26 27 29 32 35 40 48

27 29 32 35 40 48 58

27 29 34 38 46 56 69

29 35 38 46 56 69 83
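To show how such a matrix shapes the quantization, here is a simplified sketch. It assumes a uniform quantizer whose step size at position (v, u) is `Q_INTRA[v][u] * qscale / 16`, with plain rounding. The special handling of the DC coefficient and the exact rounding rules of a real MPEG-2 encoder are left out, and the names `quantize`, `dequantize` and `qscale` are illustrative.

```python
Q_INTRA = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 46, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]

def quantize(coeffs, qscale):
    """Map an 8x8 block of DCT coefficients to integer levels (simplified)."""
    return [[int(round(16 * coeffs[v][u] / (Q_INTRA[v][u] * qscale)))
             for u in range(8)] for v in range(8)]

def dequantize(levels, qscale):
    """Reconstruct coefficient values from the levels."""
    return [[levels[v][u] * Q_INTRA[v][u] * qscale / 16
             for u in range(8)] for v in range(8)]

# Hypothetical block in which every coefficient equals 100
flat = [[100] * 8 for _ in range(8)]
levels = quantize(flat, qscale=2)
recon = dequantize(levels, qscale=2)

print(levels[0][1], recon[0][1])  # 50 100.0   (low frequency, step size 2)
print(levels[7][7], recon[7][7])  # 10 103.75  (high frequency, step size 10.375)
```

The larger entries toward the bottom-right give high-frequency coefficients a coarser step, so they cost fewer bits at the price of a larger reconstruction error.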

MPEG-4 = MPEG-2 + Objects + Other
Enhancements
Objects (optional)
◦ Video (texture+shape), image, audio, speech, text, etc.
◦ Encoded using different techniques
◦ Transmitted independently
◦ Composited at the decoder using BInary Format for Scenes (BIFS)

Improvements in MPEG-4 version 2
◦ Global motion compensation (GMC)
◦ Quarter pixel motion compensation
◦ Shape-adaptive DCT

Why is MPEG-4 not as successful as MPEG-2?
◦ Not substantially better than MPEG-2
◦ Issue of licensing


MPEG-4 – Error Resilience Tools
Video packet resynchronization

◦ Previous coding standards: Resynchronization markers are fixed at the beginning of each
row of MBs
◦ MPEG-4: Resynchronization markers are inserted at every K bits

Data partitioning
◦ Partitions the data in a video packet into a motion part and a texture part separated by a
motion boundary marker (MBM)

Reversible variable length codes (RVLC)
◦ The decoder finds the next resynchronization marker and decodes backwards

Header extension code (HEC)
◦ The header information is repeated after the 1-bit HEC

Unequal error protection technique (UEP)
[Figure: layout of a video packet (VP). I-VOP packet: resync. marker, VP header (MB No., QP, HEC, repeated header info.), DC DCT data, AC DCT data. P-VOP packet: resync. marker, VP header, motion data, MBM, texture (DCT) data. Parts of the packet are marked "use" or "discard".]

Advanced Video Coding / ITU-T Recommendation
H.264 / ISO/IEC MPEG-4 (Part 10)
H.264 structure
◦ Video coding layer (VCL)
◦ Network abstraction layer (NAL)

Possible applications of H.264
◦ Conversational services operated
below 1Mbps with low latency.
◦ Entertainment services operated between 1-8+ Mbps with moderate latency
such as 0.5-2s in modified MPEG-2/H.222.0 systems.
Broadcast via satellite, cable, terrestrial or DSL
DVD for standard and high-definition video
Video-on-demand via various channels

◦ Streaming services operated at 50-1500kbps with 2s or more of latency.


New Features of H.264
Multi-mode, multi-reference MC
Motion vector can point out of image border

1/4-, 1/8-pixel motion vector precision
B-frame prediction weighting
4×4 integer transform
Multi-mode intra-prediction
In-loop de-blocking filter
UVLC (Universal Variable Length Coding)
NAL (Network Abstraction Layer)
SP-slices
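Of the features listed, the 4×4 integer transform is easy to demonstrate. The sketch below applies the H.264 forward core transform matrix to a 4×4 residual block using integer arithmetic only. The scaling that makes the transform orthonormal is folded into quantization in H.264 and is omitted here, and the helper names are illustrative.

```python
# Forward core transform matrix of the H.264 4x4 integer transform
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_4x4(block):
    """Y = CF * X * CF^T, computed without any floating-point operation."""
    return matmul(matmul(CF, block), transpose(CF))

flat = [[5] * 4 for _ in range(4)]     # a constant residual block
print(forward_4x4(flat))
# [[80, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(matmul(CF, transpose(CF)))       # rows are orthogonal but not unit length
# [[4, 0, 0, 0], [0, 10, 0, 0], [0, 0, 4, 0], [0, 0, 0, 10]]
```

Because every operation is an exact integer operation, encoder and decoder cannot drift apart through rounding, which was a known issue with floating-point DCT implementations.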

Profiles and Levels
Profiles: Baseline, Main, and X
◦ Baseline: Progressive, Videoconferencing & Wireless
◦ Main: esp. Broadcast
◦ X: Mobile network

Baseline profile is the minimum implementation
◦ Without CABAC, 1/8 MC, B-frame, SP-slices

11 levels
◦ Resolution, capability, bit rate, buffer, reference #
◦ Built to match popular international production and
emission formats
◦ From QCIF to D-Cinema


Basic Macroblock Coding Structure
[Figure: block diagram of the encoder. The input video signal is split into macroblocks of 16x16 pixels. Blocks: coder control, transform / scaling / quantization, embedded decoder (scaling & inverse transform, de-blocking filter), intra-frame prediction, motion compensation, intra/inter selection, motion estimation, entropy coding. Signals: control data, quantized transform coefficients, motion data, output video signal.]

Variable block size
A fixed block size may not suit all
moving objects
◦ Improves the flexibility of block matching
◦ Reduces the matching error

7 types of blocks for selection
◦ 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, 4×4


Motion Compensation
[Figure: encoder block diagram (as in "Basic Macroblock Coding Structure") together with the partition shapes used for motion compensation]
Various block sizes and shapes
◦ MB types: 16x16 (1 partition), 16x8 (2 partitions), 8x16 (2 partitions), 8x8 (4 partitions)
◦ 8x8 types: 8x8 (1 partition), 8x4 (2 partitions), 4x8 (2 partitions), 4x4 (4 partitions)

Multiple Reference Frames
The neighboring frames are not always the most similar
ones
A B-frame can be used as a reference frame
◦ A B-frame is close to the target frame in many situations


Multiple Reference Frames
[Figure: encoder block diagram (as in "Basic Macroblock Coding Structure") with multiple reference frames for motion compensation]

B-frame Prediction Weighting

[Figure: frames I0 B1 B2 B3 P4 B5 B6 along the time axis]

Playback order: I0 B1 B2 B3 P4 B5 B6 …
Bitstream order: I0 P4 B1 B3 B2 P8 B5 …


Intra Prediction
Predict each block from the
neighboring pixels in the same frame, then
apply transform coding to the residue to
remove the remaining redundancy.


Intra-Coded Macroblocks
H.264: prediction in space domain
◦ Spatial prediction
◦ Encode the prediction modes (use predictive coding if 4x4 modes are used)
◦ Integer transform of residue
◦ Quantization including scaling
◦ No coefficient prediction

MPEG-1/2/4, H.261/3: prediction in frequency domain
◦ No spatial prediction
◦ 8x8 Discrete Cosine Transform (DCT) for pixel values
◦ Quantization
◦ Coefficient prediction (for DC values in MPEG-2 and AC values in the first row and column in MPEG-4)

Spatial Prediction for Intra-Coded MBs
luma
- 4x4: 9 modes, predicted from the neighboring pixels A to H (row above), I to L (column to the left) and M (above-left corner); e.g., vertical (V), horizontal (H), Mean (A-D, I-M), …
- 16x16: 4 modes; e.g., vertical (V), horizontal (H), Mean (H, V), …

chroma
- 8x8: 4 modes; e.g., vertical (V), horizontal (H), Mean (H, V), …
- The same prediction mode is always applied to both chroma
blocks
[Figure: pixel layouts and prediction directions for each mode]
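Three of the 4×4 luma modes are simple enough to sketch directly. The code below predicts a block from its neighbors with the vertical, horizontal and DC (mean) modes and picks the mode with the smallest SAD. The DC rule used here averages A to D and I to L with rounding, as H.264 does when all neighbors are available. The remaining six directional modes are omitted, and the function names and sample values are hypothetical.

```python
def predict_4x4(mode, top, left):
    """top = [A, B, C, D] (row above), left = [I, J, K, L] (column to the left)."""
    if mode == "V":    # vertical: copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "H":    # horizontal: copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "DC":   # mean of the eight neighbors, rounded
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")

def block_sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))

def best_mode(block, top, left):
    """Choose the mode whose prediction leaves the smallest residue (by SAD)."""
    return min(("V", "H", "DC"),
               key=lambda m: block_sad(block, predict_4x4(m, top, left)))

top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]   # vertical stripes continuing the row above

for mode in ("V", "H", "DC"):
    print(mode, block_sad(block, predict_4x4(mode, top, left)))
# V 0
# H 240
# DC 176
print(best_mode(block, top, left))  # V
```

Only the chosen mode and the residue need to be coded. For this striped block the vertical mode leaves a zero residue.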

