Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 6 2009-10-1
6 Model-Based Design for Embedded Systems
In addition, in many cases, the same simulation environment can be used
for both function and performance verifications. However, most simulation-
based performance estimation methods suffer from insufficient corner-case
coverage. This means that they are typically not able to provide worst-case
performance guarantees. Moreover, accurate simulations are often computa-
tionally expensive.
In other works [5,6], hybrid performance estimation methods have been
presented that combine simulation and analytic techniques. While these
approaches considerably shorten the simulation run-times, they still cannot
guarantee full coverage of corner cases.
To determine guaranteed performance limits, analytic methods must be
adopted. These methods provide hard performance bounds; however, they
are typically not able to model complex interactions and state-dependent
behaviors, which can result in pessimistic performance bounds.
Several models and methods for analytic performance verification of distributed platforms have been presented so far. These approaches are based
on essentially different abstraction concepts. The first idea was to extend
well-known results of the classical scheduling theory to distributed sys-
tems. This implies the consideration of communication delays, which cannot
be neglected in a distributed system. Such a combined analysis of proces-
sor and bus scheduling is often referred to as holistic scheduling analysis.
Rather than a specific performance analysis method, holistic scheduling is
a collection of techniques for the analysis of distributed platforms, each of
which is tailored toward a particular combination of an event stream model,
a resource-sharing policy, and communication arbitration (see [10,11,15] as
examples). Several holistic analysis techniques are aggregated and implemented in the modeling and analysis suite for real-time applications (MAST) [3].∗
In [12], a more general approach to extend the concepts of the classical
scheduling theory to distributed systems was presented. In contrast to holis-
tic approaches that extend the monoprocessor scheduling analysis to special
classes of distributed systems, this compositional method applies existing
analysis techniques in a modular manner: the single components of a dis-
tributed system are analyzed with classical algorithms, and the local results
are propagated through the system by appropriate interfaces relying on a
limited set of event stream models.
In this chapter, we will describe a different analytic and modular
approach for performance prediction that does not rely on the classical
scheduling theory. The method uses real-time calculus [13] (RTC), which
extends the basic concepts of network calculus [7]. The corresponding mod-
ular performance analysis (MPA) framework [1] analyzes the flow of event
streams through a network of computation and communication resources.
∗ Available as Open Source software at
Performance Prediction of Distributed Platforms 7
1.2 Application Scenario
In this section, we introduce the reader to the system-level performance
analysis by means of a concrete application scenario from the area of video
processing. Intentionally, this example is extremely simple in terms of the
underlying hardware platform and the application model. On the other
hand, it allows us to introduce the concepts that are necessary for a com-
positional performance analysis (see Section 1.4).
The example system that we consider is a digital set-top box for the
decoding of video streams. The architecture of the system is depicted in
Figure 1.2. The set-top box implements a picture-in-picture (PiP) application
that decodes two concurrent MPEG-2 video streams and displays them on
the same output device. The upper stream, V_HR, has a higher frame resolution and is displayed in full screen, whereas the lower stream, V_LR, has a lower frame resolution and is displayed in a smaller window at the bottom left edge of the screen.
The MPEG-2 video decoding consists of the following tasks: variable
length decoding (VLD), inverse quantization (IQ), inverse discrete cosine
transformation (IDCT), and motion compensation (MC). In the considered
set-top box, the decoding application is partitioned onto three processors: CPU_1, CPU_2, and CPU_3. The tasks VLD and IQ are mapped onto CPU_1 for the first video stream (process P_1) and onto CPU_2 for the second video stream (process P_3). The tasks IDCT and MC are mapped onto CPU_3 for both video streams (processes P_2 and P_4). A pre-emptive fixed-priority scheduler is adopted for the sharing of CPU_3 between the two streams, with the upper stream having higher priority than the lower stream. This reflects the fact that the decoder gives a higher quality of service (QoS) to the stream with the higher frame resolution, V_HR.
As shown in the figure, the video streams arrive over a network and enter
the system after some initial packet processing at the network interface. The
inputs to P_1 and P_3 are compressed bitstreams and their outputs are partially decoded macroblocks, which serve as inputs to P_2 and P_4. The fully decoded video streams are then fed into two traffic-shaping components, S_1 and S_2, respectively. This is necessary because the outputs of P_2 and P_4 are potentially bursty and need to be smoothed out in order to make sure that no packets are lost by the video interface, which cannot handle more than a certain packet rate per stream.
We assume that the arrival patterns of the two streams, V_HR and V_LR,
from the network as well as the execution demands of the various tasks in
the system are known. The performance characteristics that we want to ana-
lyze are the worst-case end-to-end delays for the two video streams from
the input to the output of the set-top box. Moreover, we want to analyze the
memory demand of the system in terms of worst-case packet buffer occupa-
tion for the various tasks.
FIGURE 1.2
A PiP application decoding two MPEG-2 video streams on a multiprocessor architecture.
In Section 1.3, we will first formally describe the above system in the
concrete time domain. In principle, this formalization could directly be used
in order to perform a simulation; in our case, it will be the basis for the MPA
described in Section 1.4.
1.3 Representation in the Time Domain
As can be seen from the example described in Section 1.2, the basic model
of computation consists of component networks that can be described as a
set of components that are communicating via infinite FIFO (first-in first-out)
buffers denoted as channels. Components receive streams of tokens via their
input channels, operate on the arriving tokens, and produce output tokens
that are sent to the output channels. We also assume that the components need
resources in order to actually perform operations. Figure 1.3 represents the
simple component network corresponding to the video decoding example.
Examples of components are tasks that are executed on computing
resources or data communication via buses or interconnection networks.
Therefore, the token streams that are present at the inputs or outputs of a
component could be of different types; for example, they could represent
simple events that trigger tasks in the corresponding computation compo-
nent or they could represent data packets that need to be communicated.
1.3.1 Arrival and Service Functions
In order to describe this model in greater detail, at first we will describe
streams in the concrete time domain. To this end, we define the concept of
arrival functions: R(s, t) ∈ R≥0 denotes the amount of tokens that arrive in the time interval [s, t) for all time instances s, t ∈ R, s < t, and R(t, t) = 0. Depending on the interpretation of a token stream, an arrival function may be integer valued, i.e., R(s, t) ∈ Z≥0. In other words, R(s, t) "counts" the number of tokens in a time interval.

FIGURE 1.3
Component networks corresponding to the video decoding example in Section 1.2: (a) without resource interaction, and (b) with resource interaction.

Note that we are taking a very liberal
definition of a token here: It just denotes the amount of data or events that
arrive in a channel. Therefore, a token may represent bytes, events, or even
demanded processing cycles.
In the component network semantics, tokens are stored in channels that
connect inputs and outputs of components. Let us suppose that we had
determined the arrival function R′(s, t) corresponding to a component output (that writes tokens into a channel) and the arrival function R(s, t) corresponding to a component input (that removes tokens from the channel); then we can easily determine the buffer fill level, B(t), of this channel at some time t: B(t) = B(s) + R′(s, t) − R(s, t).
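As a sketch, the buffer-level relation can be evaluated directly on concrete traces; the arrival functions below are illustrative assumptions, not values taken from the example system:

```python
def buffer_level(B_s, R_out, R_in, s, t):
    """Buffer fill level at time t, given the level B_s at time s, the
    arrival function R_out of the writing component's output, and the
    arrival function R_in of the reading component's input:
    B(t) = B(s) + R'(s, t) - R(s, t)."""
    return B_s + R_out(s, t) - R_in(s, t)

# Illustrative traces: the writer produces 3 tokens per time unit,
# the reader removes 2 tokens per time unit.
R_out = lambda s, t: 3 * (t - s)
R_in = lambda s, t: 2 * (t - s)

print(buffer_level(0, R_out, R_in, 0, 10))  # 10 tokens buffered at t = 10
```

Since arrival functions are cumulative over an interval, the relation holds for any start time s at which the level B(s) is known.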
As has been described above, one of the major elements of the model is
that components can only advance in their operation if there are resources
available. As resources are the first-class citizens of the performance anal-
ysis, we define the concept of service functions: C(s, t) ∈ R≥0 denotes the amount of available resources in the time interval [s, t) for all time instances, s, t ∈ R, s < t, and C(t, t) = 0. Depending on the type of the underlying resource, C(s, t) may denote the accumulated time in which the resource is fully available for communication or computation, the amount of processing cycles, or the amount of information that can be communicated in [s, t).
1.3.2 Simple and Greedy Components
Using the above concept of arrival functions, we can describe a set of very
simple components that only perform data conversions and synchronization.
• Tokenizer: A tokenizer receives fractional tokens at the input that may correspond to a partially transmitted packet or a partially executed task. A discrete output token is only generated if the whole processing or communication of the predecessor component is finished. With the input and output arrival functions R(s, t) and R′(s, t), respectively, we obtain as a transfer function R′(s, t) = ⌊R(s, t)⌋.
• Scaler: Sometimes, the units of arrival and service curves do not match.
For example, the arrival function, R, describes a number of events and
the service function, C, describes resource units. Therefore, we need to
introduce the concept of scaling: R′(s, t) = w · R(s, t), with the positive scaling factor, w. For example, w may convert events into processor cycles (in the case of computing) or into the number of bytes (in the case of communication). A much more detailed view of workloads and their modeling can be found in [8], for example, modeling time-varying resource usage or upper and lower bounds (worst-case and best-case resource demands).
• AND and OR: As a last simple example, let us suppose a component that only produces output tokens if there are tokens on all inputs (AND). Then the relation between the arrival functions at the inputs, R_1(s, t) and R_2(s, t), and at the output, R′(s, t), is R′(s, t) = min{B_1(s) + R_1(s, t), B_2(s) + R_2(s, t)}, where B_1(s) and B_2(s) denote the buffer levels in the input channels at time s. If the component produces an output token for every token at any input (OR), we find R′(s, t) = R_1(s, t) + R_2(s, t).
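The AND and OR relations can be sketched on concrete traces as follows; the input traces chosen here are illustrative assumptions:

```python
def and_output(B1_s, R1, B2_s, R2, s, t):
    """Output arrival function of an AND component:
    R'(s, t) = min{B1(s) + R1(s, t), B2(s) + R2(s, t)}."""
    return min(B1_s + R1(s, t), B2_s + R2(s, t))

def or_output(R1, R2, s, t):
    """Output arrival function of an OR component:
    R'(s, t) = R1(s, t) + R2(s, t)."""
    return R1(s, t) + R2(s, t)

# Illustrative traces: one token per time unit on input 1,
# one token every two time units on input 2; empty buffers at s = 0.
R1 = lambda s, t: t - s
R2 = lambda s, t: (t - s) // 2

print(and_output(0, R1, 0, R2, 0, 10))  # 5: the slower input is the bottleneck
print(or_output(R1, R2, 0, 10))         # 15: outputs of both inputs add up
```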
The elementary components described above do not interact with the
available resources at all. On the other hand, it would be highly desirable
to express the fact that a component may need resources in order to oper-
ate on the available input tokens. A greedy processing component (GPC) takes
an input arrival function, R(s, t), and produces an output arrival function, R′(s, t), by means of a service function, C(s, t). It is defined by the input/output relation

R′(s, t) = inf_{s ≤ λ ≤ t} {R(s, λ) + C(λ, t) + B(s), C(s, t)}

where B(s) denotes the initial buffer level in the input channel. The service function of the remaining resource is given by

C′(s, t) = C(s, t) − R′(s, t)
The above definition can be related to the intuitive notion of a greedy component as follows: The output between some time λ and t cannot be larger than C(λ, t), and, therefore, R′(s, t) ≤ R′(s, λ) + C(λ, t), and also R′(s, t) ≤ C(s, t). As the component cannot output more than what was available at the input, we also have R′(s, λ) ≤ R(s, λ) + B(s), and, therefore, R′(s, t) ≤ min{R(s, λ) + C(λ, t) + B(s), C(s, t)}. Let us suppose that there is some last time λ∗ before t when the buffer was empty. At λ∗, we clearly have R′(s, λ∗) = R(s, λ∗) + B(s). In the interval from λ∗ to t, the buffer is never empty and all available resources are used to produce output tokens: R′(s, t) = R(s, λ∗) + B(s) + C(λ∗, t). If the buffer is never empty, we clearly have R′(s, t) = C(s, t), as all available resources are used to produce output tokens. As a result, we obtain the mentioned input–output relation of a GPC.
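The GPC relation can be evaluated directly on an integer time grid. The following sketch assumes discrete time and illustrative input traces (a single burst of 8 tokens served at 1 token per time unit):

```python
def gpc_output(R, C, B_s, s, t):
    """Output arrival function of a greedy processing component:
    R'(s, t) = inf_{s <= lam <= t} { R(s, lam) + C(lam, t) + B(s), C(s, t) },
    evaluated over all integer lam in [s, t]."""
    candidates = [R(s, lam) + C(lam, t) + B_s for lam in range(s, t + 1)]
    candidates.append(C(s, t))
    return min(candidates)

def remaining_service(R_prime, C, s, t):
    """Remaining service: C'(s, t) = C(s, t) - R'(s, t)."""
    return C(s, t) - R_prime

# Illustrative example: a burst of 8 tokens arrives at time 0 (B(0) = 0),
# and the resource processes 1 token per time unit.
R = lambda s, t: 8 if t > s else 0
C = lambda s, t: t - s

out = gpc_output(R, C, 0, 0, 5)
print(out)                              # 5: the output is limited by the resource
print(remaining_service(out, C, 0, 5))  # 0: all service is consumed so far
```

For a longer horizon, e.g. gpc_output(R, C, 0, 0, 10), the λ = t term caps the output at the 8 tokens actually available, illustrating how the infimum switches between the resource-limited and the input-limited case.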
Note that the above resource and timing semantics model almost all prac-
tically relevant processing and communication components (e.g., processors
that operate on tasks and use queues to keep ready tasks, communication
networks, and buses). As a result, we are not restricted to model the process-
ing time with a fixed delay. The service function can be chosen to represent
a resource that is available only in certain time intervals (e.g., time division
multiple access [TDMA] scheduling), or which is the remaining service after
a resource has performed other tasks (e.g., fixed priority scheduling). Note
that a scaler can be used to perform the appropriate conversions between
token and resource units. Figure 1.4 depicts the examples of concrete com-
ponents we considered so far. Note that further models of computation can
be described as well, for example, (greedy) Shapers that limit the amount of
output tokens to a given shaping function, σ, according to R′(s, t) ≤ σ(t − s) (see Section 1.4 and also [19]).
FIGURE 1.4
Examples of component types as described in Section 1.3.2.
1.3.3 Composition
The components shown in Figure 1.4 can now be combined to form a
component network that not only describes the flow of tokens but also the
interaction with the available resources. Figure 1.3b shows the component
network that corresponds to the video decoding example. Here, the compo-
nents, as introduced in Section 1.3.2, are used. Note that necessary scaler and
tokenizer components are not shown for simplicity, but they are needed to
relate the different units of tokens and resources, and to form tokens out of
partially computed data.
For example, the input events described by the arrival function, R_LR, trigger the tasks in the process P_3, which runs on CPU_2 whose availability is described by the service function, C_2. The output drives the task in the process P_4, which runs on CPU_3 with second priority. This is modeled by feeding the GPC component with the remaining resources from the process P_2.
We can conclude that the flow of event streams is modeled by connecting
the “arrival” ports of the components and the scheduling policy is modeled
by connecting their "service" ports. Other scheduling policies, such as non-preemptive fixed priority, earliest deadline first, TDMA, generalized processor sharing, and various servers, as well as any hierarchical composition of these policies, can be modeled as well (see Section 1.4).
1.4 Modular Performance Analysis with Real-Time
Calculus
In the previous section, we have presented the characterization of event and
resource streams, and their transformation by elementary concrete processes.
We denote these characterizations as concrete, as they represent components,
event streams, and resource availabilities in the time domain and work on
concrete stream instances only. However, event and resource streams can
exhibit a large variability in their timing behavior because of nondetermin-
ism and interference. The designer of a real-time system has to provide per-
formance guarantees that cover all possible behaviors of a distributed system
and its environment. In this section, we introduce the abstraction of MPA with RTC [1] (MPA-RTC), which provides the means to capture all possible interactions of event and resource streams in a system and permits the derivation of safe bounds on best-case and worst-case behaviors.
This approach was first presented in [13] and has its roots in network calculus [7]. It permits the analysis of the flow of event streams through a network of heterogeneous computation and communication resources in an embedded platform, and the derivation of hard bounds on its performance.
1.4.1 Variability Characterization
In the MPA, the timing characterization of event streams and of the resource
availability is based on the abstractions of arrival curves and service curves,
respectively. Both models belong to the general class of variability characterization curves (VCCs), which allow the best-case and worst-case variability of wide-sense-increasing functions to be precisely quantified [8]. For simplicity, in the rest of the chapter we will use the term VCC when referring to either arrival or service curves.
In the MPA framework, an event stream is described by a tuple of arrival curves, α(Δ) = [α^l(Δ), α^u(Δ)], where α^l: R≥0 → R≥0 denotes the lower arrival curve and α^u: R≥0 → R≥0 the upper arrival curve of the event stream. We say that a tuple of arrival curves, α(Δ), conforms to an event stream described by the arrival function, R(s, t), denoted as α |= R, iff for all t > s we have α^l(t − s) ≤ R(s, t) ≤ α^u(t − s). In other words, there will be at least α^l(Δ) events and at most α^u(Δ) events in any time interval [s, t) with t − s = Δ.
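Conformance of a tuple of arrival curves to a concrete trace can be checked exhaustively on a finite, discretized horizon. The following sketch assumes a strictly periodic stream with period 2, an illustrative choice:

```python
def conforms(alpha_l, alpha_u, R, horizon):
    """Check alpha |= R on an integer time grid up to `horizon`:
    alpha_l(t - s) <= R(s, t) <= alpha_u(t - s) for all s < t."""
    return all(
        alpha_l(t - s) <= R(s, t) <= alpha_u(t - s)
        for s in range(horizon)
        for t in range(s + 1, horizon + 1)
    )

# Illustrative periodic stream: one event at times 0, 2, 4, ...
R = lambda s, t: len([k for k in range(0, t, 2) if k >= s])

# Arrival curves of a periodic stream with period p = 2:
# alpha_u(D) = ceil(D / p), alpha_l(D) = floor(D / p).
alpha_u = lambda d: -(-d // 2)   # integer ceil(d / 2)
alpha_l = lambda d: d // 2

print(conforms(alpha_l, alpha_u, R, 20))  # True
```

Note that such a check can only falsify conformance for one trace over one horizon; the abstraction in the text guarantees the bounds for all conforming traces.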
In contrast to arrival functions, which describe one concrete trace of an event stream, a tuple of arrival curves represents all possible traces of a stream. Figure 1.5a shows an example tuple of arrival curves. Note that any event stream can be modeled by an appropriate pair of arrival curves, which means that this abstraction substantially expands the modeling power of standard event arrival patterns such as sporadic, periodic, or periodic with jitter.
Similarly, the availability of a resource is described by a tuple of service curves, β(Δ) = [β^l(Δ), β^u(Δ)], where β^l: R≥0 → R≥0 denotes the lower service curve and β^u: R≥0 → R≥0 the upper service curve. Again, we say that a tuple of service curves, β(Δ), conforms to a resource availability described by the service function, C(s, t), denoted as β |= C, iff for all t > s we have β^l(t − s) ≤ C(s, t) ≤ β^u(t − s). Figure 1.5b shows an example tuple of service curves.
Note that, as defined above, the arrival curves are expressed in terms
of events while the service curves are expressed in terms of workload/
service units. However, the component model described in Section 1.4.2
requires the arrival and service curves to be expressed in the same unit.
The transformation of event-based curves into resource-based curves and
vice versa is done by means of so-called workload curves which are VCCs
FIGURE 1.5
Examples of arrival and service curves.
FIGURE 1.6
(a) Abstract and (b) concrete GPCs.
themselves. Basically, these curves define the minimum and maximum
workloads imposed on a resource by a given number of consecutive events,
i.e., they capture the variability in execution demands. More details about
workload transformations can be found in [8]. In the simplest case of a con-
stant workload w for all events, an event-based curve is transformed into a
resource-based curve by simply scaling it by the factor w. This can be done
by an appropriate scaler component, as described in Section 1.3.
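For the constant-workload case, this scaling can be sketched as follows; the period and the cycle demand per event are assumed, illustrative values:

```python
def scale_curve(alpha, w):
    """Transform an event-based curve into a resource-based curve under
    a constant workload of w resource units per event."""
    return lambda delta: w * alpha(delta)

# Illustrative upper arrival curve of a periodic stream (period 10 ms),
# with an assumed constant demand of 1000 cycles per event.
alpha_u = lambda d: -(-d // 10)          # max events in any window of d ms
alpha_u_cycles = scale_curve(alpha_u, 1000)

print(alpha_u_cycles(25))  # 3000: at most 3000 cycles demanded in any 25 ms window
```

General workload curves replace the constant factor w with interval-dependent bounds; see [8] for the full treatment.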
1.4.2 Component Model
Distributed embedded systems typically consist of computation and com-
munication elements that process incoming event streams and are mapped
on several different hardware resources. We denote such event-processing
units as components. For instance, in the system depicted in Figure 1.2, we
can identify six components: the four tasks, P_1, P_2, P_3, and P_4, as well as the two shaper components, S_1 and S_2.
In the MPA framework, an abstract component is a model of the process-
ing semantics of a concrete component, for instance, an application task or
a concrete dedicated HW/SW unit. An abstract component models the exe-
cution of events by a computation or communication resource and can be
seen as a transformer of abstract event and resource streams. As an example,
Figure 1.6 shows an abstract and a concrete GPC.
Abstract components transform input VCCs into output VCCs, that is,
they are characterized by a transfer function that relates input VCCs to out-
put VCCs. We say that an abstract component conforms to a concrete com-
ponent if the following holds: Given any set of input VCCs, let us choose an
arbitrary trace of concrete component inputs (event and resource streams)
that conforms to the input VCCs. Then, the resulting output streams must
conform to the output VCCs as computed using the abstract transfer func-
tion. In other words, for any input that conforms to the corresponding input
VCCs, the output must also conform to the corresponding output VCCs.
In the case of the GPC depicted in Figure 1.6, the transfer function Φ of the abstract component is specified by a set of functions that relate the incoming arrival and service curves to the outgoing arrival and service curves. In this case, we have Φ = [f_α, f_β] with α′ = f_α(α, β) and β′ = f_β(α, β).
1.4.3 Component Examples
In the following, we describe the abstract components of the MPA frame-
work that correspond to the concrete components introduced in Section 1.3:
scaler, tokenizer, OR, AND, GPC, and shaper.
Using the above relation between concrete and abstract components, we
can easily determine the transfer functions of the simple components, tok-
enizer, scaler, and OR, which are depicted in Figure 1.4.
• Tokenizer: The tokenizer outputs only integer tokens and is characterized by R′(s, t) = ⌊R(s, t)⌋. Using the definition of arrival curves, we simply obtain as the abstract transfer function α^u′(Δ) = ⌊α^u(Δ)⌋ and α^l′(Δ) = ⌊α^l(Δ)⌋.
• Scaler: As R′(s, t) = w · R(s, t), we get α^u′(Δ) = w · α^u(Δ) and α^l′(Δ) = w · α^l(Δ).
• OR: The OR component produces an output for every token at any input: R′(s, t) = R_1(s, t) + R_2(s, t). Therefore, we find α^u′(Δ) = α^u_1(Δ) + α^u_2(Δ) and α^l′(Δ) = α^l_1(Δ) + α^l_2(Δ).
The derivation of the AND component is more complex and its corre-
sponding transfer functions can be found in [4,17].
As described in Section 1.3, a GPC models a task that is triggered by
the events of the incoming event stream, which queue up in a FIFO buffer.
The task processes the events in a greedy fashion while being restricted
by the availability of resources. Such a behavior can be modeled with the
following internal relations that are proven in [17]:∗

∗ The deconvolutions in min-plus and max-plus algebra are defined as (f ⊘ g)(Δ) = sup_{λ≥0} {f(Δ + λ) − g(λ)} and (f ⊘̄ g)(Δ) = inf_{λ≥0} {f(Δ + λ) − g(λ)}, respectively. The convolution in min-plus algebra is defined as (f ⊗ g)(Δ) = inf_{0≤λ≤Δ} {f(Δ − λ) + g(λ)}.
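On a finite, discretized time grid, these three operators can be approximated directly from their definitions. In the sketch below, curves are plain lists of samples f(0), f(1), ..., and the unbounded sup/inf over λ is truncated to the available horizon, so the deconvolutions are approximations rather than the exact operators:

```python
def minplus_conv(f, g):
    """Min-plus convolution: (f (x) g)(D) = inf_{0 <= lam <= D} {f(D - lam) + g(lam)}."""
    n = min(len(f), len(g))
    return [min(f[d - lam] + g[lam] for lam in range(d + 1)) for d in range(n)]

def minplus_deconv(f, g):
    """Min-plus deconvolution: sup_{lam >= 0} {f(D + lam) - g(lam)},
    with lam truncated to the finite horizon of f."""
    return [max(f[d + lam] - g[lam] for lam in range(len(f) - d))
            for d in range(len(f))]

def maxplus_deconv(f, g):
    """Max-plus deconvolution: inf_{lam >= 0} {f(D + lam) - g(lam)}, truncated."""
    return [min(f[d + lam] - g[lam] for lam in range(len(f) - d))
            for d in range(len(f))]

# Illustrative curves: f(D) = 2D, g(D) = D, sampled on D = 0..5.
f = [2 * d for d in range(6)]
g = [d for d in range(6)]

print(minplus_conv(f, g))    # [0, 1, 2, 3, 4, 5]: the slower curve g dominates
print(minplus_deconv(f, g))  # [5, 6, 7, 8, 9, 10] under the truncated horizon
```

Production analyses represent curves as piecewise-linear segments with a periodic part so that the sup/inf can be computed exactly instead of over a truncated horizon.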