3
Experimental Design and Analysis
Chapter Overview
How do fuel composition, excess oxygen, and furnace tempera-
ture fluctuations affect some important response such as process
conversion rate or emissions? What factors affect NOx and CO?
Are burners in similar process units behaving differently or alike?
How confident can we be that a proposed model is correct? To score
models for these kinds of investigations, we need some understanding
of statistics.
This chapter begins with some elementary statistics and distri-
butions, and progresses to its seminal tool for separating informa-
tion from noise — the analysis of variance. Next, the chapter covers
factorial designs — the foundation of statistically cognizant exper-
iments. We find that simple rules produce fractional factorials and
reduce the number of required experiments. We show that by
modifying the alias structure, we can clear factors of certain biases.
We then discuss the importance of replication in obtaining an
independent estimate of statistical error, and we show how block-
ing can reduce it further. Further discussion shows how orthogo-
nality eliminates mutual factor biases. The chapter moves on to
consider how to mute certain adulterating effects, including hys-
teresis and lurking factors, and how to validate analytical integrity
with residuals plots. For looking at many factors with few exper-
iments, we introduce screening designs such as simplex and highly
fractionated designs. The reader then learns how random and fixed
effects differ, and how they affect the analysis. To show how one
may assess curvature in factor space, a discussion of second-order
designs follows. The chapter concludes by considering the sequen-
tial assembly of the various experimental designs.
© 2006 by Taylor & Francis Group, LLC
192 Modeling of Combustion Systems: A Practical Approach
3.1 Some Statistics
A statistic is a descriptive measure that summarizes an important property
of a collection of data. For example, consider the group of numbers in braces:
{1, 10, 100}. Though there are only three data values, we could define an
unlimited number of statistics related to them. Here are a few:
• The maximum, 100, is a statistic because it summarizes a property of
the data, namely, that all data are equal to or below a certain value, 100.
• The minimum, 1, is also a statistic, defining the magnitude that all
data meet or exceed.
• The half range, 49.5, that is, (100 – 1)/2, is a statistic. It is a measure
of the dispersion of the data.
• The count, 3, tells us the number of data points. If the data were
repeated measures of the same quantity differing only by measure-
ment error, the count would relate to a measure of certainty. Intu-
itively, we would expect that the more replicates we measure, the
more certain we become of the true value.
• The median, 10, is the middle value of an ordered data set. It measures
central tendency. Presuming the data comprise replicate observations,
one intuitively expects the true value to be closer to the middle than
the extremes of the observations.
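These descriptive statistics are simple enough to compute directly. The following is a minimal sketch in Python (the language is our choice here; the book itself works in spreadsheets):

```python
import statistics

data = [1, 10, 100]

maximum = max(data)                   # 100: all data lie at or below this value
minimum = min(data)                   # 1: all data meet or exceed this value
half_range = (maximum - minimum) / 2  # 49.5: a measure of dispersion
count = len(data)                     # 3: relates to certainty under replication
median = statistics.median(data)      # 10: a measure of central tendency

print(maximum, minimum, half_range, count, median)
```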
There are ever more statistics, but let us pause here to answer some inter-
esting questions:
• Can we really describe three observations with five or more statis-
tics? Yes.
• How can we have five statistics for only three observations? Not all
of the statistics are independent. In fact, no more than three can be
independent if we are deriving our statistics from these particular
data — the number of independent statistics cannot exceed the num-
ber of data points. The reason for so many statistics is that we have
so many questions that we want to ask of our data; for example:
– What limit represents a safe upper bound for a NOx prediction?
– How likely are we to exceed this upper limit?
– What confidence do we have that our predicted value represents
the true value?
– How precisely does our model fit the data?
– Is a particular point within or beyond the pale of the data?
– How large a margin should we state if we want 99.9% of all
future data to be below it?
For every important question, it seems, someone has invented one or
more statistics. In this chapter, we shall describe important statistics
that relate to modeling in general and combustion modeling in particular.
3.1.1 Statistics and Distributions
Suppose we wish to measure a response (y) contaminated by a randomly
distributed error term (e). We would like to separate the information (μ)
from the noise (e). One option would be to repeatedly measure the response
at the same condition and average the results. In summation notation we
have y = μ + e, or \sum y = n\mu + \sum e, where n is the number of replicate
measurements. We may divide by the total number of measurements to give

    \frac{\sum y}{n} = \mu + \frac{\sum e}{n}

But in Chapter 1 we defined \sum y / n as the arithmetic mean (Equation 1.55).
Here we will designate it with an overbar. That is,

    \bar{y} = \frac{\sum y}{n} \quad (3.1)

Now if e (the error vector) were truly random, we would expect the long-
run average to be zero. This occurs when n → ∞. We refer to long-run results
as expected values and designate them with the expectation operator, E( ). We
will refer to the true value for y as μ. Therefore, E(y) = μ and \bar{y} is an unbiased
estimator for μ. Intuitively, we would expect all the y values to distribute
about the true value, differing only by e. Since our best estimate of μ from
the data is \bar{y}, then \bar{y} is a measure of central tendency — the inclination of the
average of repeated measures to converge to the true value.

The mean is an important statistic — one might dare say the most important
statistic — but it is insufficient to characterize certain aspects of some
populations. For example, suppose the average height of 100 adult male
humans is 1.80 m (5.9 ft). How many will be 0.01 m (<1 in.) tall? How many
will be 3.59 m (11.8 ft) tall? We know from experience that there are no adult
male humans at either of these extremes. Yet the average of these two numbers
is 1.80 m. Therefore, as important as central tendency statistics are, we
are also interested in other measures. That is, we would also like some
measure of dispersion. Dispersion indicates how values differ from the mean.
One statistic that quantifies dispersion is the variance.

Let us define the variance (V) of a sample (y) as follows:

    V(y) = \sum \left(y - \bar{y}\right)^2 \quad (3.2)
Then the following equation gives the mean variance:

    \bar{V}(y) = \frac{\sum \left(y - \bar{y}\right)^2}{n} \quad (3.3)

The long-run average of the mean variance is

    \sigma^2 = \lim_{n \to \infty} \bar{V}(y) = \lim_{n \to \infty} \frac{\sum \left(y - \bar{y}\right)^2}{n} \quad (3.4)

However, if we are using the sample mean derived from a finite data set
to estimate the variance, \bar{V}(y) tends to underestimate \sigma^2 unless n is large. The
reason for the underestimation is that we have already used the data to
determine \bar{y}. Therefore, \bar{y} plus n − 1 data points exactly determine the nth
data value; i.e., \bar{V}(y) is not a completely independent measure of dispersion.
So the proper denominator to estimate \sigma^2 in Equation 3.3 is n − 1, not n. In
other words, we use up (lose) one degree of freedom when we use a finite data
set to estimate \bar{y}. Thus, n − 1 are the degrees of freedom for the estimated
variance. We shall use the symbol s^2 to denote this quantity:

    s^2 = \frac{\sum \left(y - \bar{y}\right)^2}{n - 1} \quad (3.5a)

This is also called the sample-adjusted variance. Obviously, Equation 3.5a and
Equation 3.3 become identical as n → ∞.

One problem with Equation 3.5a is that the units differ from y because the
variance uses the squared value of the response. For this reason, we define
the sample standard deviation as

    s = \sqrt{\frac{\sum \left(y - \bar{y}\right)^2}{n - 1}} \quad (3.6)

It has the same units as the response, and it estimates the
true standard deviation, σ. Now s will tell us something about the dispersion
of possible values about the true mean. To find out what, we need to know
something about how values distribute themselves in the long run.
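Equations 3.1 to 3.6 are easy to check numerically. A sketch in Python with hypothetical replicate data; note that the standard statistics module applies the same n − 1 convention:

```python
import statistics

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical replicate measurements
n = len(y)

ybar = sum(y) / n                       # Equation 3.1: the arithmetic mean
ss = sum((yi - ybar) ** 2 for yi in y)  # summed squared deviations (Equation 3.2)
mean_variance = ss / n                  # Equation 3.3: biased low for small n
s2 = ss / (n - 1)                       # Equation 3.5a: n - 1 degrees of freedom
s = s2 ** 0.5                           # Equation 3.6: sample standard deviation

# statistics.variance and statistics.stdev also divide by n - 1
assert abs(s2 - statistics.variance(y)) < 1e-12
assert abs(s - statistics.stdev(y)) < 1e-12
```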
3.1.2 The Normal, Chi-Squared (χ²), F, and t Distributions

To develop the idea of distributions further, let us consider Figure 3.1.¹
Galton’s board looks a bit like a pinball machine. It comprises a vertical
slate with pegs arranged in a triangle pattern that widens from top to bottom.
A ball dropped onto the topmost peg may fall either to the left or to the
right, whereupon it strikes the next peg and again may fall to either the left
or the right. The ball continues in this fashion until it ultimately drops into
one of the bins below. What is the probability that a dropped ball will fill
any particular bin? To answer the question, we begin by calculating the
distribution of the possibilities.
3.1.2.1 The Normal Distribution

Most of us are familiar with the normal distribution — the so-called bell-
shaped curve — perhaps it is more nearly cymbal shaped. At any rate,
Equation 3.7 gives the mathematical representation:

    N(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} \quad (3.7)

Here, N(y) is the frequency of y and e is Euler's constant, e = 2.71828.... The
probability of finding y between two limits, −∞ < a and b < +∞, is given by

    P(y) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\, d\!\left(\frac{y-\mu}{\sigma}\right); \qquad 0 < P(y) < 1 \quad (3.8)
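Equation 3.8's integral is available through the standard normal cumulative distribution function. A brief Python check (the ±1.96 limits are our choice for illustration):

```python
from statistics import NormalDist

# P(a < y < b) for a normal population, per Equation 3.8, evaluated with the
# cumulative distribution function rather than explicit integration
dist = NormalDist(mu=0.0, sigma=1.0)
p = dist.cdf(1.96) - dist.cdf(-1.96)  # probability of falling within ±1.96 sigma
print(round(p, 3))                    # 0.95
```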
FIGURE 3.1
The Galton board. The Galton board comprises a vertical arrangement of pegs such that a ball
may take one of two possible paths at each peg, finally arriving at a bin below. The numbers
between the pegs show the number of paths leading through each space. The numbers follow
Pascal’s triangle (superimposed numbers). The total number of paths for this Galton board
sums to 256. Thus, for the ball shown arriving at the bin, the probability is 56/256 = 21.9%.
One of 56 possible paths leading to that bin is shown (dotted line). The distribution approaches
the normal probability distribution as the number of rows in the Galton board increases.
The statistics μ and σ² completely characterize the normal distribution.
One may standardize the normal distribution using the coding z = (y − μ)/σ:

    N(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \quad (3.9)

Figure 3.2 depicts the above equation.
The normal distribution has the following properties:
• It is symmetrical about its center at z = 0.
• It has an inflection point where the curve changes from concave
down to concave up (at z = ±1).
• The area under the curve sums to unity.
3.1.2.2 Probability Distribution for Galton’s Board
Galton’s board is a very good simulator of random error even though New-
tonian physics dictate the ball’s motion. Yet, we have no way of predicting
what bin the ball will fall into on any given trial because very small variations
affect the path of the ball. Such variations include:
• The elasticity and roundness of the ball’s surface
• The elasticity, angle, and roundness of each peg
• The mutual interactions among balls
At each peg, the distribution is a binary one: the ball will fall either to the
left or to the right. In no way can we consider this a normal distribution. It
is an equiprobable binary distribution. Notwithstanding, statistical consid-
erations allow us to do the following:
• Calculate the ultimate distribution of the balls into the slots
• Calculate the probability of any given ball falling into a particular slot
• Show that the ultimate distribution is a normal probability distribution
FIGURE 3.2
The normal distribution. The so-called bell-shaped curve has a maximum at z = 0, y = 1/\sqrt{2\pi} and
points of inflection at z = ±1, y = 1/\sqrt{2\pi e}. The area under the curve sums to unity.
To derive the probability distribution for Galton’s board, we proceed as
follows. First, we count the total number of paths through each space. At
the first peg, we have one path to the left and one path to the right. So the
possible paths from left to right are distributed as {1, 1}. At the second row
of pegs, we may take one path to the left and fall outside the far left peg.
But if the ball jumps left and then right, it will fall between the two pegs on
the second row. Likewise, if the ball falls to the right of the first peg and to
the left of the second peg, it will also fall between the two pegs of the second
row; therefore, there are two paths leading between the two pegs of the
second row. Finally, if the ball takes a right jump at the first peg and then a
right jump at the second peg, it will fall to the right of the right peg. There-
fore, the number of paths from left to right at this level is {1, 2, 1}. Now the
total number of paths between any two pegs will be the sum of the paths
communicating with it. For Galton’s board there are two such paths over-
head and to the left and right. Thus, the distribution of paths for the next
row of pegs is {1, 3, 3, 1}. We may continue in this fashion all the way down
the board.
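The path-counting recurrence just described is easy to mechanize. A sketch in Python (the function name is ours; eight rows match the board of Figure 3.1):

```python
def galton_paths(rows):
    """Count the paths through each interstice after the given number of peg rows.

    Each count is the sum of the two counts diagonally above it -- the
    paths "communicating with it" in the text.
    """
    counts = [1]
    for _ in range(rows):
        counts = [a + b for a, b in zip([0] + counts, counts + [0])]
    return counts

row8 = galton_paths(8)
print(row8, sum(row8))  # [1, 8, 28, 56, 70, 56, 28, 8, 1] 256
```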
3.1.2.3 Pascal’s Triangle
We know the pattern {1}, {1 1}, {1 2 1}, {1 3 3 1} … as Pascal’s triangle (Figure
3.3). Pascal’s triangle is a numerical triangle having outside edges of 1. The
FIGURE 3.3
Pascal's triangle. Each term is calculated by adding the two terms above it. In some versions,
the second row of ones (1 1) is omitted, but we include it here for consistency. Horizontal rows
(f) are numbered starting with zero at top and incrementing by 1. Entries (k) in each row are
numbered starting from 0 at left and incrementing by 1. Thus, the coordinates (f, k) = (4, 2)
correspond to the value 6. f also indicates the number of factors in a factorial design discussed
presently. The sum of any horizontal row equals the number of terms in the saturated factorial
(2^f). k indicates the overall order of the term. One may calculate the value of entry k in row f
directly using the formula f!/[k!(f − k)!], e.g., 4!/[2!(4 − 2)!] = 6.
sum of the two numbers immediately above forms each lower entry. The
sum of the numbers in a horizontal row is always

    n = 2^f \quad (3.10)

where f is the number of the row starting from the top down; all counting for
rows and entries begins with zero, i.e., 0, 1, 2, .... Equation 3.11 gives the
kth entry in row f directly:

    m = \frac{f!}{k!\,(f-k)!} \quad (3.11)

where m is the number contained in the kth entry of the fth row.

For reference, we have superimposed Pascal's triangle onto Galton's board
in Figure 3.1. Each number represents the number of possible paths traversing
the interstice. As shown, the board has eight rows of pegs. At
the bottom, we have nine slots, and the distribution of paths is {1, 8, 28, 56,
70, 56, 28, 8, 1}. The total number of paths is 1 + 8 + 28 + 56 + 70 + 56 + 28
+ 8 + 1 = 256 = 2^8. So, the probabilities for a ball falling into any given slot
from left to right are 1/256, 8/256, 28/256, 56/256, 70/256, 56/256, 28/256, 8/256,
and 1/256, whose fractions sum to 1. This is a binomial frequency distribution.
We may find it directly as the ratio of Equation 3.11 to Equation 3.10:

    B(f,k) = \frac{f!}{k!\,(f-k)!}\,\frac{1}{2^f} \quad (3.12)

where B(f, k) is the probability of the ball finding its way to the kth interstice
(counting from zero) under the fth row. For reference, we have superimposed
a bar graph in Figure 3.1 for each bin. The bar is proportional to the probability
of a ball finding its way to that particular bin. The distribution
approaches the normal probability distribution as f → ∞. But even after
several rows, the resemblance to Equation 3.7 is unmistakable. In fact,

    \lim_{f\to\infty}\left[\frac{f!}{k!\,(f-k)!}\left(\frac{1}{2}\right)^f\right] = N(\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2} \quad (3.13)

where \mu = f/2, \sigma = \sqrt{f/4}, and y = k. This is part of an even broader
concept known as the central limit theorem: as the number of independent
and identically distributed random variables increases, the aggregate distribution
approaches the normal probability distribution.
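Equations 3.12 and 3.13 can be compared numerically. A sketch in Python for the eight-row board (f = 8; function names are ours):

```python
import math

def B(f, k):
    # Equation 3.12: probability of reaching the kth interstice under row f
    return math.comb(f, k) / 2 ** f

def normal_approx(f, k):
    # Equation 3.13's limiting form, with mu = f/2 and sigma = sqrt(f/4)
    mu, sigma = f / 2, math.sqrt(f / 4)
    return math.exp(-0.5 * ((k - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

exact = B(8, 4)                         # 70/256, the center bin of Figure 3.1
approx = normal_approx(8, 4)            # already close for f = 8
total = sum(B(8, k) for k in range(9))  # the nine bin probabilities sum to 1
```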
With the following substitution,

    z = \frac{y-\mu}{\sigma} \quad (3.14)
Letting \mu = f/2 and \sigma = 1, Equation 3.13 reduces to Equation 3.9:

    N(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \quad (3.9)

We call Equation 3.9 the probability density function. Then the cumulative
probability function for −a < z < a is

    P\left[N(z)\right] = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-z^2/2}\, dz \quad (3.15)

We call the variable z the standard unit variate; when the limits of the
integration are from −∞ to +∞, the integral attains unity.

Equation 3.15 implies a two-tailed test because we are asking for the probability
that z lies between −a and a. If we were interested only in the probability
that z is less than a (or, equivalently, greater than −a), we would be
interested in only one tail of the distribution. Since the distribution is
symmetrical, the probability of the one-tailed test is exactly half that of the
two-tailed test.

Most computer spreadsheets have functions to calculate this. Excel™ has
several related functions. The function normdist(x,m,s,TRUE) evaluates
the cumulative form of Equation 3.8 (integrating from −∞ to x), where x is
a particular value, m the mean, and s the standard deviation. The function
normdist(x,m,s,FALSE) evaluates the probability density itself (Equation
3.7) at x, not a cumulative probability. The function normsdist(z) — note
the s in the middle of this function name — evaluates the cumulative standard
normal probability. For example,

    \mathrm{normsdist}(1.96) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1.96} e^{-z^2/2}\, dz \approx 0.975

Many statistical tests are strictly valid only for normally distributed errors.
However, according to the central limit theorem, even if the parent distribution
is not normally distributed, the accumulation of several levels of randomly
distributed deviations will tend toward a normal distribution. This
was the case with Galton's board. The parent distribution was binary (highly
nonnormal), yet the final distribution approached the normal distribution
quite closely. So we expect estimates of the mean to distribute around the
arithmetic mean, approaching the normal distribution. In fact, according to
Equation 3.13, μ and σ completely determine the shape of the normal probability
curve. Thus, from these two statistics alone, we can derive any other
property or statistic for normally distributed data. If data do not distribute
normally, often one can transform them to such. For example, emissions data
such as CO and NOx are always nonnegative, and therefore they do not
distribute normally. However, the logarithms of these quantities are normally
distributed (e.g., ln(NOx)). As we shall see, this will permit us to estimate:
• How likely we are to exceed some upper limit
• What limit represents a safe upper bound for a prediction
• What confidence we have that our predicted value represents the
true value
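The spreadsheet functions named above have standard-library analogues in Python; a sketch (the Excel names in the comments are the text's, the Python calls are `statistics.NormalDist` methods):

```python
from statistics import NormalDist

nd = NormalDist(mu=0.0, sigma=1.0)

# normsdist(z)          ->  nd.cdf(z)    cumulative standard normal probability
# normdist(x,m,s,TRUE)  ->  NormalDist(m, s).cdf(x)
# normdist(x,m,s,FALSE) ->  NormalDist(m, s).pdf(x)  the density itself
p = nd.cdf(1.96)       # ~0.975, matching the normsdist(1.96) example
density = nd.pdf(0.0)  # peak of the bell curve, 1/sqrt(2*pi)
```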
3.1.2.4 The Chi-Squared Distribution
Another important distribution is the distribution of variance. A variance
will never be negative because it is a squared quantity. Thus, variance
cannot distribute normally. In fact, it distributes as a chi-squared distribution.
Knowing something about this distribution will allow us to develop a list
of additional statistics related to the model, such as:
• The goodness of fit
• The confidence that a particular factor belongs in the model
• The probability that we can accurately predict future values
The chi-squared distribution has the following form:

    \chi^2(n,z) = \left(\frac{n}{2}\right)^{\frac{n}{2}} \frac{z^{\frac{n}{2}-1}\, e^{-\frac{nz}{2}}}{\Gamma\!\left(\frac{n}{2}\right)} \quad (3.16)

where n is the degrees of freedom (an integer); z is the standard variate,
defined in Equation 3.14; and \Gamma(n) is the gamma function, defined as

    \Gamma(n) = \int_0^{\infty} t^{\,n-1} e^{-t}\, dt

Excel has several related functions. The function gammaln(z) will return
the natural log of the gamma function for positive arguments of z. To obtain
the gamma function itself, one uses exp(gammaln(z)). Equation 3.17 gives
the cumulative probability function for the chi-squared distribution:

    P\left[\chi^2(n, z \le a)\right] = \int_0^{a} \left(\frac{n}{2}\right)^{\frac{n}{2}} \frac{z^{\frac{n}{2}-1}\, e^{-\frac{nz}{2}}}{\Gamma\!\left(\frac{n}{2}\right)}\, dz \quad (3.17)
One may use the Excel function CHIDIST(z,n) to calculate this; note that
CHIDIST returns the upper-tail probability, the complement of Equation 3.17.

The gamma function has some interesting properties. One way to think of
it is as a generalization of the discrete factorial function. That is,

    \Gamma(n) = (n-1)! \quad (3.18)

where n is an integer. For example, Γ(4) = (4 − 1)! = 3! = (3)(2)(1) = 6. Other
properties include

    \Gamma(n+1) = n\,\Gamma(n) \quad (3.19)

    \Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} \quad (3.20)

    \Gamma\!\left(\frac{2n+1}{2}\right) = \frac{\sqrt{\pi}}{2^n}\prod_{k=1}^{n}\left(2k-1\right) = \frac{1\cdot 3\cdot 5\cdots(2n-1)}{2^n}\sqrt{\pi} \quad (3.21)

For example,

    \Gamma\!\left(\frac{7}{2}\right) = \frac{1\cdot 3\cdot 5}{2^3}\sqrt{\pi} = \frac{15}{8}\sqrt{\pi}

    \Gamma\!\left(\frac{1-2n}{2}\right) = \frac{(-2)^n}{\prod_{k=1}^{n}(2k-1)}\sqrt{\pi} = \frac{(-2)^n}{1\cdot 3\cdot 5\cdots(2n-1)}\sqrt{\pi} \quad (3.22)

For example,

    \Gamma\!\left(-\frac{7}{2}\right) = \frac{(-2)^4}{1\cdot 3\cdot 5\cdot 7}\sqrt{\pi} = \frac{16}{105}\sqrt{\pi}
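The gamma-function identities above can be spot-checked with the standard library; a brief Python sketch:

```python
import math

sqrt_pi = math.sqrt(math.pi)

g4 = math.gamma(4)         # Equation 3.18: (4 - 1)! = 6
g_half = math.gamma(0.5)   # Equation 3.20: sqrt(pi)
g_7_2 = math.gamma(3.5)    # Gamma(7/2) = (15/8) sqrt(pi)
g_m7_2 = math.gamma(-3.5)  # Gamma(-7/2) = (16/105) sqrt(pi)
```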
3.1.2.5 The F Distribution

Often, in deciding the significance of certain terms in our mathematical
models, we will examine the ratio of two variances. The ratio of variances
cannot distribute normally because it comprises a ratio of two χ² distributions
and the result must always be nonnegative. Equation 3.23 gives the F
distribution:
    F(m,n,z) = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}} z^{\frac{m}{2}-1}\left(1+\frac{mz}{n}\right)^{-\frac{m+n}{2}} \quad (3.23)

The related cumulative probability function is

    P\left[F(m,n,z \le a)\right] = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}} \int_0^a z^{\frac{m}{2}-1}\left(1+\frac{mz}{n}\right)^{-\frac{m+n}{2}} dz \quad (3.24)

and relates to the Excel function FDIST(z,m,n), which takes the F ratio as
its first argument and returns the upper-tail probability, the complement of
Equation 3.24.

3.1.2.6 The t Distribution

The distribution of \sqrt{F(1,n)} is a special distribution called the t distribution.*
It accounts for deviation from the normal distribution owing to less than an
infinite number of independent trials:

    t(n,z) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\left(1+\frac{z^2}{n}\right)^{-\frac{n+1}{2}} \quad (3.25)

The t distribution approaches the normal distribution as n → ∞. In general,
the t distribution has a flatter peak and broader tails than the normal distribution.
The t distribution adjusts the normal probability function for the
uncertainty of having less than an infinite number of samples. For n > 20,
the t and normal distributions are practically identical. The associated cumulative
probability function is

* W.S. Gosset, writing under the pen name "Student" while working for the Guinness Brewing
Company, derived the t distribution that bears his pen name — Student's t distribution. Details
are available in any statistics text. See, for example, Mendenhall, W., Scheaffer, R.L., and Wackerly,
D.D., Mathematical Statistics with Applications, 3rd ed., PWS Publishers, Boston, 1986, p. 273.
    P\left[\pm a,\, t(n,z)\right] = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \int_{-a}^{a}\left(1+\frac{z^2}{n}\right)^{-\frac{n+1}{2}} dz \quad (3.26)

The Excel function TDIST(z,n,1) gives the single-tailed function.
TDIST(z,n,2) gives the two-tailed test of Equation 3.26.
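Equation 3.25 needs nothing beyond the gamma function to evaluate. A Python sketch (function names are ours) comparing it with the normal density, consistent with the remark that the two are practically identical for n > 20:

```python
import math

def t_density(n, z):
    # Equation 3.25, written out with math.gamma
    num = math.gamma((n + 1) / 2)
    den = math.sqrt(n * math.pi) * math.gamma(n / 2)
    return (num / den) * (1 + z * z / n) ** (-(n + 1) / 2)

def normal_density(z):
    # Equation 3.9
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

gap = abs(t_density(30, 0.0) - normal_density(0.0))  # small for n = 30
```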
3.2 The Analysis of Variance (ANOVA)

The F distribution allows us to estimate probabilities for ratios of variances.
We use it in an important technique known as the analysis of variance
(ANOVA). ANOVA is one of the most important concepts in statistical experimental
design (SED). It is based on an amazing identity:

    \sum\left(y-\bar{y}\right)^2 \equiv \sum\left(\hat{y}-\bar{y}\right)^2 + \sum\left(y-\hat{y}\right)^2, \qquad \text{i.e., } \mathrm{SST} \equiv \mathrm{SSM} + \mathrm{SSR} \quad (3.27)

where SST stands for sum of squares, total; SSM is the sum of squares, model;
and SSR is the sum of squares, residual term. (The reader should note that
although the statistical literature uses these abbreviations often, some texts
use slightly different acronyms.)* The residual error includes whatever variation
the model does not explain. SSM and SSR are the estimators for the
model and the residual variance, respectively.

The above identity seems to break the rules of algebra by ignoring the
cross product. But in fact, the cross product vanishes for least squares solutions.
This is easy to show:

    \sum\left(y-\bar{y}\right)^2 = \sum\left[\left(y-\hat{y}\right)+\left(\hat{y}-\bar{y}\right)\right]^2 = \sum\left(y-\hat{y}\right)^2 + \sum\left(\hat{y}-\bar{y}\right)^2 + 2\sum\left(y-\hat{y}\right)\left(\hat{y}-\bar{y}\right)

* For example, Montgomery uses SST (sum of squares, treatments) in lieu of our SSM and
he uses SSTO (sum of squares, total) in lieu of our SST: Montgomery, D.C., Design and Analysis
of Experiments, 5th ed., John Wiley & Sons, New York, 2001, pp. 531–535. However, other texts are
consistent with our nomenclature here. See Box and Draper² for example.
Now, noting that \hat{\mathbf{y}} = \mathbf{X}\mathbf{a} and \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}, we have

    \sum\left(y-\hat{y}\right)\left(\hat{y}-\bar{y}\right) = \mathbf{e}^\mathsf{T}\mathbf{X}\mathbf{a} - \bar{y}\sum e

But both terms are identically 0 for least squares (see Equations 1.86 and
1.87). Therefore, the cross product is also identically zero.

We may construct a table making use of the identity of Equation 3.27. It
will have the slots shown in Table 3.1.

TABLE 3.1
The Naked ANOVA

    Term   SS    DF    MS    F          P
    M      SSM   DFM   MSM   MSM/MSR    P(F, DFM, DFR)
    R      SSR   DFR   MSR
    T      SST   DFT

The row headers are M = model, R = residual, and T = total. The column
headers are SS (sum of squares), DF (degrees of freedom), MS (mean square),
and F (F ratio). The entries, defined by appending the column header to the
row header, are SSM, SSR, SST, DFM (degrees of freedom, model), DFR
(degrees of freedom, residual), and DFT (degrees of freedom, total). Those
are the slots, now for the filler.

Consider the following model:

    \hat{y} = a_0 + a_1 x \quad (3.28)

We desire to know if a_1 is significant. In other words, does x really influence
the response or not? If not, then the experiments amount to repeated measures
of y, which differ only by experimental error; this would be equivalent
to \hat{y} = \bar{y} (with a_0 = \bar{y} and a_1 = 0) for Equation 3.28. We shall call \hat{y} = \bar{y} the null
hypothesis. For the null hypothesis, a_0 is the only model parameter. Then the
total degrees of freedom becomes n − 1, where n is the total number of
observations. Therefore, Equation 3.28 reduces to Equation 3.5b, which is
equivalent to the total sum of squares divided by the total degrees of freedom:

    s^2 = \frac{\mathrm{SST}}{\mathrm{DFT}} = \frac{\sum\left(y-\bar{y}\right)^2}{n-1} \quad (3.5b)
The alternative hypothesis is that |a_1| > 0. If |a_1| > 0, then Equation 3.5b
is not the correct measure of random variance because the total variance
includes a nonrandom contribution from the model. If the alternative
hypothesis is true, the appropriate estimate of experimental error will be
the variance left over after subtracting the model from the data. In other
words,

    s^2 = \mathrm{MSR} = \frac{\mathrm{SSR}}{\mathrm{DFR}} = \frac{\sum\left(y-\hat{y}\right)^2}{n-p} \quad (3.5c)

Here, p is the total number of model parameters. For the current case, p = 2,
i.e., a_0 and a_1. Note that SSR = SST − SSM and DFR = DFT − DFM. SSR is the
sum of squared residual error. The mean squared residual (MSR) gives an
estimate of the error variance. The residual is what remains after subtracting
the portion of the total variance that belongs to the model, SSM. Equation
3.29 gives the mean squared model variance:

    \mathrm{MSM} = \frac{\mathrm{SSM}}{\mathrm{DFM}} = \frac{\sum\left(\hat{y}-\bar{y}\right)^2}{p-1} \quad (3.29)

This accounting is easy to remember. The model variance, SSM, is the
variance over and above the mean; the residual variance (SSR) is the total
variance (SST) minus the model variance (SSR = SST − SSM). If we add these
two contributions (SSM + SSR), we obtain the total variance (SST = SSM +
SSR) — the variance of the actual data over and above the mean. Perhaps
vector notation is more straightforward and easier to remember:

    \mathrm{SSM} = \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}} - \bar{\mathbf{y}}^\mathsf{T}\bar{\mathbf{y}} \qquad (\text{model} - \text{mean}) \quad (3.30)

    \mathrm{SSR} = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}} \qquad (\text{actual} - \text{model}) \quad (3.31)

    \mathrm{SST} = \mathbf{y}^\mathsf{T}\mathbf{y} - \bar{\mathbf{y}}^\mathsf{T}\bar{\mathbf{y}} \qquad (\text{actual} - \text{mean}) \quad (3.32)

These relations are easy to prove. Note that \left(\mathbf{y}-\hat{\mathbf{y}}\right)^\mathsf{T}\left(\mathbf{y}-\hat{\mathbf{y}}\right) = \sum_k^n \left(y_k-\hat{y}_k\right)^2.
Expanding, this yields \left(\mathbf{y}-\hat{\mathbf{y}}\right)^\mathsf{T}\left(\mathbf{y}-\hat{\mathbf{y}}\right) = \mathbf{y}^\mathsf{T}\mathbf{y} - \mathbf{y}^\mathsf{T}\hat{\mathbf{y}} - \hat{\mathbf{y}}^\mathsf{T}\mathbf{y} + \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}}, but for least
squares, \mathbf{y}^\mathsf{T}\hat{\mathbf{y}} = \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}} (see Equation 1.96), so \left(\mathbf{y}-\hat{\mathbf{y}}\right)^\mathsf{T}\left(\mathbf{y}-\hat{\mathbf{y}}\right) = \mathbf{y}^\mathsf{T}\mathbf{y} - \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}}. One may
also substitute \hat{\mathbf{y}} = \mathbf{X}\mathbf{a}, giving \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}} = \mathbf{a}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\mathbf{a} if desired. Expanding SST gives:

    \mathrm{SST} = \sum_k^n \left(y_k-\bar{y}\right)^2 = \sum y^2 - 2\bar{y}\sum y + n\bar{y}^2 = \sum y^2 - 2n\bar{y}^2 + n\bar{y}^2
Therefore,

    \mathrm{SST} = \sum y^2 - n\bar{y}^2 = \mathbf{y}^\mathsf{T}\mathbf{y} - \bar{\mathbf{y}}^\mathsf{T}\bar{\mathbf{y}}

Finally,

    \mathrm{SSM} = \sum\left(\hat{y}-\bar{y}\right)^2

which we may expand as

    \sum\left(\hat{y}-\bar{y}\right)^2 = \sum \hat{y}^2 - 2\bar{y}\sum \hat{y} + n\bar{y}^2

But note that

    \sum \hat{y} = \sum y

because

    \mathbf{y} = \hat{\mathbf{y}} + \mathbf{e} \quad \text{and} \quad \sum e = 0, \;\text{so}\; \sum y = \sum \hat{y} + \sum e = \sum \hat{y}

Thus,

    \mathrm{SSM} = \sum \hat{y}^2 - n\bar{y}^2 = \hat{\mathbf{y}}^\mathsf{T}\hat{\mathbf{y}} - \bar{\mathbf{y}}^\mathsf{T}\bar{\mathbf{y}} = \mathbf{a}^\mathsf{T}\mathbf{X}^\mathsf{T}\mathbf{X}\mathbf{a} - \bar{\mathbf{y}}^\mathsf{T}\bar{\mathbf{y}}

We may use either the summation or matrix equations to determine SSM,
SSR, and SST.
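The decomposition above can be verified numerically for any least-squares line fit. A Python sketch with hypothetical data:

```python
# Least-squares fit of y = a0 + a1*x, then the Equation 3.27 decomposition
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical observations

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
a1 = sxy / sxx
a0 = ybar - a1 * xbar
yhat = [a0 + a1 * xi for xi in x]

SST = sum((yi - ybar) ** 2 for yi in y)
SSM = sum((yh - ybar) ** 2 for yh in yhat)
SSR = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
cross = sum((yi - yh) * (yh - ybar) for yi, yh in zip(y, yhat))
```

For a least-squares fit with an intercept, the cross term vanishes (to rounding), so SST = SSM + SSR.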
3.2.1 Use of the F Distribution
The proper measure of random error depends on knowing whether or not
the null hypothesis is true. But how can we know? Recall that a ratio of
variances distributes as an F distribution. Now if MSM/MSR ~ 1, then the
null hypothesis is true, and if MSM/MSR >> 1, the null hypothesis is false,
leaving the alternative hypothesis. But what if the ratio is 1.5? Should we
accept the null hypothesis then? A ratio greater than 1 could occur merely
by chance. Now we are 100% confident that if our ratio were infinitely large,
then MSM would be significant. If we were willing to be less confident, say
95% confident, then our theoretical ratio would not need to be as large as
infinity.

A normal distribution needs μ and σ to determine its shape. An F distribution
needs three things: the degrees of freedom in the numerator of the
variance ratio, the degrees of freedom in the denominator of the variance
ratio, and the confidence with which we desire to say that the ratio differs
from chance. Here is the general procedure.

Once we have the sum-of-squares relations, we calculate the mean sums of
squares by dividing each sum of squares by its degrees of freedom. Thus,
MSM = SSM/DFM and MSR = SSR/DFR. The mean squares are variances,
and they distribute as chi-squared variables. Therefore, the ratio of mean
squares distributes as an F distribution. Again, to determine an F
distribution we need three things:

1. The degrees of freedom used to determine MSM (i.e., DFM)
2. The degrees of freedom used to determine MSR (i.e., DFR)
3. The probability (P) we will use in judging that MSM/MSR ≠ 1;
   that is, that the ratio differs from 1 by more than chance alone. We
   shall call C = 1 − P the confidence. Thus, if P ≤ 0.05, then we have C
   ≥ 95% confidence that MSM/MSR > F. (Most texts use a lowercase p to
   indicate probability. However, this text uses a lowercase p to indicate
   the number of model parameters. Therefore, we use an uppercase P so
   as not to confuse the two.)
Step 1: Specify P. We must do this before beginning any analysis. We
will certainly want P to be less than 0.10, indicating that we have
90% confidence or more (1 − 0.10 = 0.90) that our correlation is not
due to chance. The typical test is P ≤ 0.05 (95% confidence), denoted
F₉₅(m, n), where the subscript indicates the percent confidence, and
m and n are the DFM and DFR, respectively.

Step 2: Determine MSM/MSR, m, and n.

Step 3: Compare MSM/MSR with the minimum F ratio. If MSM/
MSR > F_C(m, n), then reject the null hypothesis and accept the alternative
hypothesis — that the model is significant at the C × 100%
confidence level.
Appendix E, Table E.4 gives the minimum ratio for which this is so, depending on C (or alternatively P), m, and n. For example, suppose we want to be sure that a term belongs in the model with 95% confidence. Let us say that m = DFM = 3 and n = DFR = 2 and C = 95% (P = 0.05). Then according to the table, FC(m, n) = 19.2. (We may also find this from the Excel function FINV(0.05,3,2) = 19.2.) If MSM/MSR > F95(3, 2) = 19.2, then we shall reject the null hypothesis and conclude that MSM is a significant effect. Statistical programs will give the P value directly and obviate the need for the table. Spreadsheets will also do the same. In Excel, the command FDIST(F,m,n) gives the P value for F1–P. Let us consider an example.
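The three-step procedure can be sketched in code. This is only an illustration: the helper name `f_test` and the numeric inputs are ours, not the text's; the critical value 19.2 is the F95(3, 2) entry quoted above.

```python
# Illustrative sketch of the three-step F test (names and data are ours).
def f_test(ssm, dfm, ssr, dfr, f_crit):
    """Return (F, reject): reject the null hypothesis when F > FC(m, n)."""
    msm = ssm / dfm          # Step 2: form the mean squares ...
    msr = ssr / dfr
    F = msm / msr            # ... and their ratio
    return F, F > f_crit     # Step 3: compare with the tabulated minimum

# Hypothetical sums of squares with m = 3, n = 2; F95(3, 2) = 19.2
# (from Appendix E, Table E.4 or FINV(0.05,3,2)).
F, reject = f_test(115.2, 3, 3.84, 2, 19.2)
```

Here F = 38.4/1.92 = 20.0 > 19.2, so the (hypothetical) model would be judged significant at 95% confidence.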
Example 3.1 ANOVA for a Single-Factor Investigation

Problem statement: Derive the ANOVA for the following hypothetical data of Table 3.2. Compare the MSM/MSR ratio with that of an F95 distribution having the same degrees of freedom and determine whether the null hypothesis is valid at the given confidence level. Use a spreadsheet to assess the value of P.

TABLE 3.2
A Single-Factor Example

 x      y
–3   –19.5
–2   –17.9
–1   –12.7
 0    –6.7
 1     0.5
 2     2.4
 3    10.2

Solution: We solve for the model ŷ = a0 + a1x using least squares and generate

(a0, a1)ᵀ = (–6.24, 5.10)ᵀ

and the fitted values

ŷ = (–21.6, –16.4, –11.3, –6.2, –1.1, 4.0, 9.1)ᵀ

From these and Equations 3.30 to 3.32, we determine the following entries to the ANOVA table: SSM = 729.30, SSR = 14.78, and SST = 744.08. Now DFM = 1 (a1), DFT = 6 (7 – 1 DF for the mean), leaving DFR = 5 (6 – 1). This gives the following mean squares: MSM = 729.30/1 = 729.30, MSR = 14.78/5 = 2.96. Finally, we have F = MSM/MSR = 729.30/2.96 = 246.77. This gives Table 3.3.

Now the ratio F(1, 5) = 246.77 far exceeds the minimum F95(1, 5) ratio given in Appendix E, Table E.4. Therefore, we conclude that the effect is significant. Using the Excel function we have FDIST(246.77,1,5) = 1.9E-05, or P < 0.0001. At any rate, the model ŷ = a0 + a1x is statistically significant and differs from the alternative (null) hypothesis that ŷ = ȳ.
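The arithmetic of Example 3.1 can be reproduced with a short script. This is a sketch using only the Table 3.2 data; the variable names are ours. Because x is centered (it sums to zero), the normal equations decouple into a0 = mean(y) and a1 = Sxy/Sxx.

```python
# Reproduce the Example 3.1 fit and ANOVA from the Table 3.2 data.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [-19.5, -17.9, -12.7, -6.7, 0.5, 2.4, 10.2]
n = len(x)

# Least squares for y = a0 + a1*x; x is centered, so the normal
# equations decouple: a0 = mean(y), a1 = Sxy/Sxx.
ybar = sum(y) / n
Sxx = sum(xi * xi for xi in x)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
a0, a1 = ybar, Sxy / Sxx

# ANOVA about the mean: SST = SSM + SSR.
yhat = [a0 + a1 * xi for xi in x]
SST = sum((yi - ybar) ** 2 for yi in y)
SSR = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
SSM = SST - SSR
DFM, DFR = 1, n - 2              # 1 model DF (a1); 7 - 1 (mean) - 1 = 5
MSM, MSR = SSM / DFM, SSR / DFR
F = MSM / MSR                    # compare with F95(1, 5)
```

Running this recovers a0 ≈ –6.24, a1 ≈ 5.10, SSM ≈ 729.30, SSR ≈ 14.78, and F ≈ 246.77, matching Table 3.3.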
3.3 Two-Level Factorial Designs

A two-level factorial is an experimental design that investigates each factor at two levels: high and low. Factorial designs generate orthogonal matrices and allow us to assess each effect separately in a partitioned ANOVA. To begin, let us consider the effect of three factors on NOx: excess oxygen, O2 (x1); air preheat temperature, APH (x2); and furnace bridgewall temperature, BWT (x3). To investigate every possible combination of high and low factors requires a minimum of 2^f points. For example, consider Table 3.4.

Table 3.4 gives all possible combinations of high and low factor levels along with the response ln(NOx). We use + and – to signify high and low levels in the table. Actually, we can let + and – refer to +1 and –1, respectively, by coding all values as follows:
xk = (ξk – ξ̄k)/ξ̂k    (3.33)

where k is an index from 1 to f, xk is the kth coded factor, ξk is the kth factor in the original metric, ξ̄k is the average value defined by Equation 3.34, and ξ̂k is the half range, defined by Equation 3.35:

TABLE 3.3
ANOVA for Example 3.1

Term          SS      DF  MS      F(1, 5)  P
Model, M      729.30  1   729.30  246.77   <0.0001
Residual, R   14.78   5   2.96
Total, T      744.08  6

TABLE 3.4
A Factorial Design in Three Factors

Point  x1  x2  x3  y
1      –   –   –   1.81
2      –   –   +   2.67
3      –   +   –   2.47
4      –   +   +   3.34
5      +   –   –   3.92
6      +   –   +   5.58
7      +   +   –   4.59
8      +   +   +   6.25
ξ̄k = (ξk⁺ + ξk⁻)/2    (3.34)

ξ̂k = (ξk⁺ – ξk⁻)/2    (3.35)

where ξk⁺ is the high level of the kth factor in the original metric and ξk⁻ is the low value. Equivalently, we may combine Equations 3.33 through 3.35 in a single equation:

xk = (2ξk – ξk⁺ – ξk⁻)/(ξk⁺ – ξk⁻)    (3.36)

They are linear transforms that make xk dimensionless, with zero mean and unit half range in either direction. They give convenient numerical properties, help minimize round-off error in later matrix calculations, and establish a uniform dimensionless metric. One may easily invert them:

ξk = ξ̄k + ξ̂k xk    (3.37)

These are the typical coding relations for experimental design. However, two other single-factor linear transforms may be useful in some situations: 0/1 coding and deviation-normalized coding:

xk,0/1 = (ξk – ξk⁻)/(2ξ̂k)    (0/1 coding) (3.38)

ξk = ξk⁻ + 2ξ̂k xk,0/1    (inverse for 0/1 coding) (3.39)

xk,s = (ξk – ξ̄k)/sk, where sk = √[Σ(ξk – ξ̄k)²/(n – 1)]    (deviation-normalized coding) (3.40)

ξk = ξ̄k + sk xk,s    (inverse for deviation-normalized coding) (3.41)

It follows that the linear transforms have the following relations:

ξk = ξ̄k + sk xk,s = ξ̄k + ξ̂k xk = ξk⁻ + 2ξ̂k xk,0/1    (3.42)
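The ±1 coding and its inverse are easy to sketch in code. The function names `code` and `decode` are illustrative, not from the text; they implement Equations 3.33 through 3.37 for a single factor.

```python
# Sketch of the +/-1 coding relations (Equations 3.33 to 3.37);
# the function names are ours, not the text's.
def code(xi, lo, hi):
    """Equation 3.33: map a natural value on [lo, hi] to [-1, +1]."""
    center = (hi + lo) / 2       # Equation 3.34: the average
    half = (hi - lo) / 2         # Equation 3.35: the half range
    return (xi - center) / half

def decode(x, lo, hi):
    """Equation 3.37: invert the coding back to the original metric."""
    return (hi + lo) / 2 + (hi - lo) / 2 * x

# For instance, a factor running from 1 to 5 codes its endpoints
# to -1 and +1, and its midpoint (3) to 0.
x_low, x_high = code(1.0, 1.0, 5.0), code(5.0, 1.0, 5.0)
```

Round-tripping any value through `code` and then `decode` returns it unchanged, which is a quick check that the transform and its inverse agree.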
One may use Equation 3.42 to convert among transforms. For factorial designs, we will generally use ±1 coding. For the regression of historical data sets from unplanned experiments, deviation-normalized coding may be preferable. 0/1 coding will not generate orthogonal matrices, but sometimes the investigator desires a model where the low level corresponds to zero. For example, in classical experimentation, 0/1 coding better highlights that we are exploring along a single-factor axis at a time. Also, 0/1 coding has some advantage for representing a categorical factor with multiple levels (see Chapter 4).
Example 3.2 Factor Coding

Problem statement: If the factors have the following ranges, give the equations to code their values to ±1: oxygen, 1% to 5%; air preheat temperature, 25 to 325°C; and furnace temperature, 800 to 1100°C. Give the inverse relations also.

Solution: From Equations 3.33 through 3.35 we have

x1 = (ξ1 – 3%)/2%,  x2 = (ξ2 – 175°C)/150°C,  x3 = (ξ3 – 950°C)/150°C

and the inverse relations are

ξ1 = 3% + 2% x1,  ξ2 = 175°C + 150°C x2,  ξ3 = 950°C + 150°C x3

3.3.1 ANOVA for Several Model Effects

If X is orthogonal, we can assess each effect separately in the ANOVA table. In Table 3.4, the model we are attempting to fit is

y = a0 + a1x1 + a2x2 + a3x3

and it has the following normal matrix equation:

(30.63)   (8 0 0 0)(a0)       (a0)   (3.83)
(10.05) = (0 8 0 0)(a1)   ;   (a1) = (1.26)
( 2.67)   (0 0 8 0)(a2)       (a2)   (0.33)
( 5.05)   (0 0 0 8)(a3)       (a3)   (0.63)
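Because X'X = 8I for this orthogonal design, each coefficient is just the dot product of its design column with y, divided by 8. A sketch, assuming the Table 3.4 responses in standard order with x1 varying slowest:

```python
from itertools import product

# Table 3.4 responses, ln(NOx), points 1 through 8 in standard order.
y = [1.81, 2.67, 2.47, 3.34, 3.92, 5.58, 4.59, 6.25]

# Design matrix rows: intercept, x1, x2, x3 (x1 varies slowest).
X = [(1, x1, x2, x3) for x1, x2, x3 in product((-1, 1), repeat=3)]

# With orthogonal +/-1 columns, X'X = 8I, so the normal equations
# reduce to a_j = (column_j . y) / n.
n = len(y)
a = [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(4)]
```

This recovers (a0, a1, a2, a3) ≈ (3.83, 1.26, 0.33, 0.63), matching the solved normal equation above.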
From the ANOVA, we may also derive a statistic to measure overall goodness of fit. We shall call it the coefficient of determination and represent it with the symbol r². It has the following definition:

r² = SSM/SST = 1 – SSR/SST    (3.43)

If we desire, we may augment the ANOVA table with r², though strictly speaking it is not part of the accounting. However, the table has some empty space, so why not? For that matter, we can also show s = √MSR. Table 3.5 gives the consolidated form of ANOVA that we will use.
However, because each effect in a factorial design is orthogonal to every
other, we may partition SSM and DFM further, as shown in Table 3.6.
We see now that the SSM in Table 3.5 partitions as 12.63 + 0.89 + 3.19 = 16.70, where the sums are for x1, x2, and x3, respectively. We can also see that
though Table 3.5 showed that the model as a whole was significant at greater
than 99.9% confidence (P = 0.0006), not all effects are significant at exactly
the same confidence level. In this particular case, it happens that all effects
are significant with greater than 95% confidence (P < 0.05). However, this
need not be the case. Even if the model is very significant, some individual
effects may not be.
3.3.2 General Features of Factorial Designs
A factorial design uses all possible high- and low-factor combinations of f factors and comprises 2^f experimental points. Therefore, the factorial design can fit at most 2^f terms according to
TABLE 3.5
Basic ANOVA for Table 3.4

Term  SS     DF  MS    F     P
M     16.70  3   5.57  70.5  0.0006
R     0.32   4   0.08             r² = 98.1%
T     17.02  7                    s  = 0.28

TABLE 3.6
Partitioned ANOVA for Table 3.4

Term   SS     DF  MS     F      P
Model
  a1   12.63  1   12.63  159.8  0.0002
  a2   0.89   1   0.89   11.3   0.0283
  a3   3.19   1   3.19   40.3   0.0031
R      0.32   4   0.08              r² = 98.1%
T      17.02  7                     s  = 0.28
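One can check the partition directly: for a ±1 column, the effect's sum of squares is (contrast)²/n, and the effect sums of squares add to SSM. A sketch with the Table 3.4 data (column patterns written out explicitly; variable names are ours):

```python
# Partition SSM by effect for the orthogonal design of Table 3.4.
y = [1.81, 2.67, 2.47, 3.34, 3.92, 5.58, 4.59, 6.25]
cols = {
    "a1": [-1, -1, -1, -1,  1,  1,  1,  1],
    "a2": [-1, -1,  1,  1, -1, -1,  1,  1],
    "a3": [-1,  1, -1,  1, -1,  1, -1,  1],
}
n = len(y)
ybar = sum(y) / n

# For a +/-1 orthogonal column, the effect sum of squares is
# (contrast)**2 / n; orthogonality makes these add up to SSM.
SS = {name: sum(c * yi for c, yi in zip(col, y)) ** 2 / n
      for name, col in cols.items()}
SSM = sum(SS.values())
SST = sum((yi - ybar) ** 2 for yi in y)
SSR = SST - SSM
r2 = SSM / SST
```

This reproduces the Table 3.6 entries: SS of 12.63, 0.89, and 3.19 summing to SSM = 16.70, with SST = 17.02 and r² = 98.1%.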
y = a0 + Σk ak xk + ΣΣj<k ajk xj xk + ΣΣΣh<j<k ahjk xh xj xk + …    (3.44)

The last term in Equation 3.44 will be the f-factor interaction. For example, if f = 3, the last interaction (the eighth term, 2³ = 8) will be a123 x1 x2 x3. Also, in Equation 3.44, the number of summands indicates the overall order of the term. For example, the triple sum ΣΣΣh<j<k ahjk xh xj xk comprises all the third-order interaction terms for a given factorial design, presuming f = 3 or greater.
To construct the X matrix for all terms, we may make use of Equation 3.44. We may also use Figure 3.3 to determine the number of terms that are first, second, and third order, etc. Using Equation 3.11, the value of the mth parameter indicates the number of terms (m) of a given overall order. For example, for f = 2 we have entries (1, 2, 1), indicating that there is 1 zero-order term (k = 0), 2 first-order terms (k = 1), and 1 second-order (overall) term (k = 2). Since the factorial has only high and low values of each factor, no term may contain factors having an individual order above 1. Therefore, factorial terms that overall are second order are of the form xj xk, terms that overall are third order have the form xh xj xk, and so forth. If we want to know the number of third-order terms for the 2^5 factorial design, we can use Equation 3.11 directly to find

m = 5!/(3!(5 – 3)!) = 10

or we can view Figure 3.3 and note that for f = 5 and p = 3, m = 10.
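This count is simply a binomial coefficient, which the Python standard library computes directly:

```python
import math

# Number of third-order terms in a 2**5 factorial: "5 choose 3".
m = math.comb(5, 3)   # 5!/(3!(5 - 3)!)
```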
3.3.3 Construction Details of the Two-Level Factorial
To construct the X matrix for a two-level factorial design, we use the follow-
ing rules.
1. There will be n = 2^f design points, where f is the number of factors.
2. Construct factor columns by alternating between low (–) and high (+) factor values in blocks of 2^(f–k), where k is the factor subscript.
3. Continue for f factors. The resulting matrix comprises the design
matrix of factor coordinates.
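The rules above can be sketched as a short function. The name `factorial_design` is ours; it exploits the fact that blockwise sign alternation in blocks of 2^(f–k) is equivalent to reading off bit (f – k) of the point number, which is the binary-counting shortcut discussed below.

```python
# Sketch of the construction rules for a two-level factorial design.
def factorial_design(f):
    """Return the 2**f design matrix as rows of +1/-1 in binary order."""
    n = 2 ** f
    design = []
    for point in range(n):                    # points 0 .. 2**f - 1
        # Factor k alternates in blocks of 2**(f - k); equivalently,
        # bit (f - k) of the point number gives that factor's sign.
        row = [1 if (point >> (f - k)) & 1 else -1
               for k in range(1, f + 1)]
        design.append(row)
    return design
```

For f = 3 this reproduces the eight rows of Example 3.3, with the first column in blocks of four, the second in blocks of two, and the third in blocks of one.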
Example 3.3 Construction of a 2³ Factorial Design

Problem statement: Construct a 2³ factorial design using the foregoing procedure.

Solution: As this is a 2³ factorial design, f = 3 and n = 2³ = 8. For the first factor, k = 1; therefore, we alternate the first column in blocks of 2^(3–1) = 4:

Pt  x1
1   –
2   –
3   –
4   –
5   +
6   +
7   +
8   +

For the second factor, k = 2, and we alternate signs in blocks of 2^(3–2) = 2:

Pt  x1  x2
1   –   –
2   –   –
3   –   +
4   –   +
5   +   –
6   +   –
7   +   +
8   +   +

Finally, we add the third factor in blocks of 2^(3–3) = 1:

Pt  x1  x2  x3
1   –   –   –
2   –   –   +
3   –   +   –
4   –   +   +
5   +   –   –
6   +   –   +
7   +   +   –
8   +   +   +

This is the 2³ factorial design having eight data points comprising all possible high/low combinations of three factors.
One may also derive the factorial matrix from binary counting. (For readers unfamiliar with binary and related bases, see Appendix F.) For full factorial designs, conversion of the whole numbers from 0 to 2^f – 1 to binary gives the sign pattern directly. Binary order is the sign pattern of the whole numbers expressed in binary. The method also allows one to jump immediately to the nth point's sign pattern by expressing n – 1 as a binary whole number. Some authors refer to this as standard order; others use the phrase to refer to the reverse sign pattern. On occasion, we will renumber our points starting from 1 rather than zero.
Example 3.4 Construction of a 2³ Factorial Design, Binary Counting Method

Problem statement: Construct a 2³ factorial design using binary counting. For a 2⁶ design in binary order, what is the sign pattern of the 40th entry?

Solution: This is a 2³ factorial design. Therefore, the whole numbers from 0 to 2³ – 1 are 0, 1, 2, 3, 4, 5, 6, and 7. Converting these to binary we have

0 = 000 = – – –
1 = 001 = – – +
2 = 010 = – + –
3 = 011 = – + +
4 = 100 = + – –
5 = 101 = + – +
6 = 110 = + + –
7 = 111 = + + +
If we prefer to number our points starting from 1 rather than 0,
we add one to the above entries.
In matrix form (annotated), we would have