CHAPTER 57

Applications of GLS with Nonspherical Covariance Matrix
In most cases in which the covariance matrix is nonspherical, Ψ contains unknown parameters, which must be estimated before formula (26.0.2) can be applied. Of course, if all entries of Ψ are unknown, such estimation is impossible, since one needs n(n + 1)/2 − 1 parameters to specify a symmetric matrix up to a multiplicative factor, but with n observations only n unrelated parameters can be estimated consistently. Only in a few exceptional cases is Ψ known, and in some even more exceptional cases there are unknown parameters in Ψ but (26.0.2) does not depend on them. We will discuss such examples first: heteroskedastic disturbances with known relative variances, and some examples involving equicorrelated disturbances.
57.1. Cases when OLS and GLS are identical
Problem 498. From y = Xβ + ε with ε ∼ (o, σ²I) follows Py = PXβ + Pε with Pε ∼ (o, σ²PP⊤). Which conditions must P satisfy so that the generalized least squares regression of Py on PX with covariance matrix PP⊤ gives the same result as the original regression?
Problem 499. We are in the model y = Xβ + ε, ε ∼ (o, σ²Ψ). As always, we assume X has full column rank, and Ψ is nonsingular. We will discuss the special situation here in which X and Ψ are such that ΨX = XA for some A.
• a. 3 points Show that the requirement ΨX = XA is equivalent to the requirement that R[ΨX] = R[X]. Here R[B] is the range space of a matrix B, i.e., it is the vector space consisting of all vectors that can be written in the form Bc for some c. Hint: For ⇒ show first that R[ΨX] ⊂ R[X], and then show that R[ΨX] has the same dimension as R[X].
Answer. ⇒: Clearly R[ΨX] ⊂ R[X] since ΨX = XA and every XAc has the form Xd with d = Ac. And since Ψ is nonsingular, and the range space is the space spanned by the column vectors, and the columns of ΨX are the columns of X premultiplied by Ψ, it follows that the range space of ΨX has the same dimension as that of X. ⇐: The ith column of ΨX lies in R[X], i.e., it can be written in the form Xaᵢ for some aᵢ. A is the matrix whose columns are all the aᵢ. □
• b. 2 points Show that A is nonsingular.
Answer. A is square, since XA = ΨX, i.e., XA has as many columns as X. Now assume Ac = o. Then XAc = o or ΨXc = o, and since Ψ is nonsingular this gives Xc = o, and since X has full column rank, this gives c = o. □
• c. 2 points Show that XA⁻¹ = Ψ⁻¹X.
Answer. X = Ψ⁻¹ΨX = Ψ⁻¹XA, and now postmultiply by A⁻¹. □
• d. 2 points Show that in this case (X⊤Ψ⁻¹X)⁻¹X⊤Ψ⁻¹ = (X⊤X)⁻¹X⊤, i.e., the OLS is BLUE (“Kruskal’s theorem”).
Answer. Since Ψ⁻¹X = XA⁻¹ and Ψ is symmetric, X⊤Ψ⁻¹ = (A⁻¹)⊤X⊤, therefore

(X⊤Ψ⁻¹X)⁻¹X⊤Ψ⁻¹ = ((A⁻¹)⊤X⊤X)⁻¹(A⁻¹)⊤X⊤ = (X⊤X)⁻¹A⊤(A⁻¹)⊤X⊤ = (X⊤X)⁻¹X⊤. □
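Kruskal’s theorem is easy to verify numerically. The following sketch (assuming only numpy; the construction of Ψ is hypothetical, not from the text) builds a positive definite Ψ whose eigenvectors include an orthonormal basis of R[X], so that ΨX = XA holds by part a, and checks that the GLS and OLS formulas coincide:

import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
X = rng.standard_normal((n, k))

# Orthonormal basis whose first k columns span R[X]; a positive definite
# Psi with these eigenvectors maps R[X] into itself, hence Psi X = X A.
Q, _ = np.linalg.qr(np.column_stack([X, rng.standard_normal((n, n - k))]))
Psi = Q @ np.diag(rng.uniform(0.5, 3.0, n)) @ Q.T

A = np.linalg.lstsq(X, Psi @ X, rcond=None)[0]
assert np.allclose(Psi @ X, X @ A)                   # hypothesis of Problem 499

Pinv = np.linalg.inv(Psi)
gls = np.linalg.solve(X.T @ Pinv @ X, X.T @ Pinv)    # (X'Psi^-1 X)^-1 X'Psi^-1
ols = np.linalg.solve(X.T @ X, X.T)                  # (X'X)^-1 X'
assert np.allclose(gls, ols)                         # Kruskal's theorem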
57.2. Heteroskedastic Disturbances
Heteroskedasticity means: error terms are independent, but their variances are
not equal. Ψ is diagonal, with positive diagonal elements. In a few rare cases the
relative variances are known. The main example is that the observations are means
of samples from a homoskedastic population with varying but known sizes.
This is a plausible example of a situation in which the relative variances are
known to be proportional to an observed (positive) nonrandom variable z (which may
or may not be one of the explanatory variables in the regression). Here V[ε] = σ²Ψ with the known diagonal matrix

(57.2.1) Ψ = diag(z₁, z₂, . . . , zₙ). Therefore P = diag(1/√z₁, 1/√z₂, . . . , 1/√zₙ),

i.e., one divides every observation by the appropriate factor so that after the division the standard deviations are equal. Note: this means that this transformed regression usually no longer has a constant term, and therefore also R² loses its meaning.
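As a concrete illustration, here is a minimal sketch of this transformation, assuming numpy; the data and parameter values are simulated and hypothetical:

import numpy as np

rng = np.random.default_rng(1)
n = 200
z = rng.uniform(0.5, 4.0, n)                  # known relative variances
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + np.sqrt(z) * rng.standard_normal(n)

w = 1.0 / np.sqrt(z)                          # the diagonal entries of P
beta_gls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(beta_gls)                               # GLS estimate, close to (1, 2)

Note that the transformed regressor matrix X * w[:, None] contains the column w instead of a column of ones, which is the loss of the constant term mentioned above.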
Problem 500. 3 points The specification is

(57.2.2) yₜ = β₁ + β₂xₜ + β₃xₜ² + εₜ,

with E[εₜ] = 0, var[εₜ] = σ²xₜ² for some unknown σ² > 0, and the errors are uncorrelated. Someone runs the OLS regression

(57.2.3) yₜ/xₜ = γ₁ + γ₂(1/xₜ) + γ₃xₜ + vₜ

and you have the estimates γ̂₁, γ̂₂, and γ̂₃ from this regression. Compute estimates of β₁, β₂, and β₃ using the γ̂ᵢ. What properties do your estimates of the βᵢ have?
Answer. Divide the original specification by xₜ to get

(57.2.4) yₜ/xₜ = β₂ + β₁(1/xₜ) + β₃xₜ + εₜ/xₜ.

Therefore γ̂₂ is the BLUE of β₁, γ̂₁ that of β₂, and γ̂₃ that of β₃. Note that the constant terms of the old and new regression switch places! □
Now let us look at a random parameter model yₜ = xₜγₜ, or in vector notation, using ∗ for element-by-element multiplication of two vectors, y = x ∗ γ. Here γₜ ∼ IID(β, σ²); one can also write it γₜ = β + δₜ or γ = ιβ + δ with δ ∼ (o, σ²I). This model can be converted into a heteroskedastic Least Squares model if one defines ε = x ∗ δ. Then y = xβ + ε with ε ∼ (o, σ²Ψ) where

(57.2.5) Ψ = diag(x₁², x₂², . . . , xₙ²).

Since x⊤Ψ⁻¹ = (x⁻¹)⊤ (taking the inverse element by element), and therefore x⊤Ψ⁻¹x = n, one gets β̂ = (1/n) Σₜ yₜ/xₜ and var[β̂] = σ²/n. On the other hand, x⊤Ψx = Σx⁴, therefore var[β̂_OLS] = σ² Σx⁴/(Σx²)². Assuming that the xₜ are independent drawings
of a random variable x with zero mean and finite fourth moments, it follows

(57.2.6) plim var[β̂_OLS]/var[β̂] = plim n Σx⁴/(Σx²)² = plim((1/n)Σx⁴)/(plim (1/n)Σx²)² = E[x⁴]/(E[x²])²

This is the kurtosis (without subtracting the 3). Theoretically it can be anything ≥ 1; the Normal distribution has kurtosis 3, and economic time series usually have a kurtosis between 2 and 4.
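A short simulation, assuming numpy, illustrates (57.2.6): for a normal regressor the efficiency loss of OLS relative to β̂ approaches the kurtosis 3.

import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.standard_normal(n)                    # normal x has kurtosis 3

ratio = n * np.sum(x**4) / np.sum(x**2)**2    # var[beta_OLS]/var[beta_hat]
print(ratio)                                  # close to 3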
57.3. Equicorrelated Covariance Matrix
Problem 501. Assume yᵢ = µ + εᵢ, where µ is nonrandom, E[εᵢ] = 0, var[εᵢ] = σ², and cov[εᵢ, εⱼ] = ρσ² for i ≠ j (i.e., the εᵢ are equicorrelated).

(57.3.1) V[ε] = σ²((1 − ρ)I + ριι⊤),

i.e., the covariance matrix has 1 in every diagonal and ρ in every off-diagonal position (times σ²). If ρ ≥ 0, then these error terms could have been obtained as follows: ε = z + ιu where z ∼ (o, τ²I) and u ∼ (0, ω²) independent of z.
• a. 1 point Show that the covariance matrix of ε is V[ε] = τ²I + ω²ιι⊤.

Answer. V[ιu] = ι var[u]ι⊤; add this to V[z]. □
• b. 1 point What are the values of τ² and ω² so that ε has the above covariance structure?

Answer. To write it in the desired form, the following identities must hold: for the off-diagonal elements σ²ρ = ω², which gives the desired formula for ω², and for the diagonal elements σ² = τ² + ω². Solving this for τ² and plugging in the formula for ω² gives τ² = σ² − ω² = σ²(1 − ρ). □
• c. 3 points Using matrix identity (A.8.20) (for ordinary inverses, not for g-inverses) show that the generalized least squares formula for the BLUE in this model is equivalent to the ordinary least squares formula. In other words, show that the sample mean ȳ is the BLUE of µ.

Answer. Setting γ = τ²/ω², we want to show that

(57.3.2) (ι⊤(I + ιι⊤/γ)⁻¹ι)⁻¹ι⊤(I + ιι⊤/γ)⁻¹y = (ι⊤I⁻¹ι)⁻¹ι⊤I⁻¹y.
This is even true for arbitrary h and A:

(57.3.3) h⊤(A + hh⊤/γ)⁻¹ = h⊤A⁻¹ · γ/(γ + h⊤A⁻¹h);

(57.3.4) (h⊤(A + hh⊤/γ)⁻¹h)⁻¹ = (γ + h⊤A⁻¹h)/(γ h⊤A⁻¹h) = 1/(h⊤A⁻¹h) + 1/γ;
Now multiply the left sides and the right-hand sides (use the middle term in (57.3.4)):

(57.3.5) (h⊤(A + hh⊤/γ)⁻¹h)⁻¹h⊤(A + hh⊤/γ)⁻¹ = (h⊤A⁻¹h)⁻¹h⊤A⁻¹. □
• d. 3 points [Gre97, Example 11.1 on pp. 499/500]: Show that var[ȳ] does not converge to zero as n → ∞ while ρ remains constant.

Answer. By (57.3.4),

(57.3.6) var[ȳ] = τ²(1/n + 1/γ) = σ²((1 − ρ)/n + ρ) = τ²/n + ω²

As n → ∞ this converges towards ω², not to 0. □
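The non-vanishing limit in (57.3.6) shows up clearly in a simulation. The sketch below (assuming numpy, with hypothetical parameter values) generates ε = z + ιu and compares the empirical variance of ȳ with τ²/n + ω²:

import numpy as np

rng = np.random.default_rng(3)
rho, sigma2, reps = 0.3, 1.0, 20_000
tau2, omega2 = sigma2 * (1 - rho), sigma2 * rho

for n in (10, 100, 1000):
    z = np.sqrt(tau2) * rng.standard_normal((reps, n))
    u = np.sqrt(omega2) * rng.standard_normal((reps, 1))
    ybar = (z + u).mean(axis=1)                # sample means of eps (mu = 0)
    print(n, ybar.var(), tau2 / n + omega2)    # empirical vs. theoretical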
Problem 502. [Chr87, pp. 361–363] Assume there are 1000 families in a certain town, and denote the income of family k by zₖ. Let µ = (1/1000) Σ_{k=1}^{1000} zₖ be the population average of all 1000 incomes in this finite population, and let σ² = (1/1000) Σ_{k=1}^{1000} (zₖ − µ)² be the population variance of the incomes. For the purposes of this question, the zₖ are nonrandom, therefore µ and σ² are nonrandom as well.

You pick at random 20 families without replacement, ask them what their income is, and you want to compute the BLUE of µ on the basis of this random sample. Call
the incomes in the sample y₁, . . . , y₂₀. We are using the letters yᵢ instead of zᵢ for this sample, because y₁ is not necessarily z₁, i.e., the income of family 1, but it may be, e.g., z₂₅₈. The yᵢ are random. The process of taking the sample of yᵢ is represented by a 20 × 1000 matrix of random variables qᵢₖ (i = 1, . . . , 20, k = 1, . . . , 1000) with: qᵢₖ = 1 if family k has been picked as ith family in the sample, and 0 otherwise. In other words, yᵢ = Σ_{k=1}^{1000} qᵢₖzₖ or y = Qz.
• a. Let i ≠ j and k ≠ l. Is qᵢₖ independent of qᵢₗ? Is qᵢₖ independent of qⱼₖ? Is qᵢₖ independent of qⱼₗ?

Answer. qᵢₖ is not independent of qᵢₗ: if qᵢₖ = 1, this means that family k has been selected as the ith family in the sample. Since only one family can be selected as the ith family in the sample, this implies qᵢₗ = 0 for all l ≠ k. qᵢₖ is dependent on qⱼₖ, because sampling is without replacement: if family k has been selected as the ith family in the sample, then it cannot be selected again as the jth family of the sample. Is qᵢₖ independent of qⱼₗ? It is not: by part b, E[qᵢₖqⱼₗ] = 1/(1000 · 999), which differs (slightly) from E[qᵢₖ] E[qⱼₗ] = 1/1000², so the two are weakly dependent. □
• b. Show that the first and second moments are

(57.3.7) E[qᵢₖ] = 1/1000, and E[qᵢₖqⱼₗ] = 1/1000 if i = j and k = l; 1/(1000 · 999) if i ≠ j and k ≠ l; and 0 otherwise.

For these formulas you need the rules how to take expected values of discrete random variables.
Answer. Since qᵢₖ is a zero-one variable, E[qᵢₖ] = Pr[qᵢₖ = 1] = 1/1000. This is obvious if i = 1, and one can use a symmetry argument that it should not depend on i. And since for a zero-one variable qᵢₖ² = qᵢₖ, it follows E[qᵢₖ²] = 1/1000 too. Now for i ≠ j, k ≠ l, E[qᵢₖqⱼₗ] = Pr[qᵢₖ = 1 ∩ qⱼₗ = 1] = (1/1000)(1/999). Again this is obvious for i = 1 and j = 2, and can be extended by symmetry to arbitrary pairs i ≠ j. For i ≠ j, E[qᵢₖqⱼₖ] = 0 since zₖ cannot be chosen twice, and for k ≠ l, E[qᵢₖqᵢₗ] = 0 since only one zₖ can be chosen as the ith element in the sample. □
• c. Since Σ_{k=1}^{1000} qᵢₖ = 1 for all i, one can write

(57.3.8) yᵢ = µ + Σ_{k=1}^{1000} qᵢₖ(zₖ − µ) = µ + εᵢ

where εᵢ = Σ_{k=1}^{1000} qᵢₖ(zₖ − µ). Show that

(57.3.9) E[εᵢ] = 0, var[εᵢ] = σ², cov[εᵢ, εⱼ] = −σ²/999 for i ≠ j

Hint: For the covariance note that from 0 = Σ_{k=1}^{1000} (zₖ − µ) follows

(57.3.10) 0 = Σ_{k=1}^{1000} (zₖ − µ) Σ_{l=1}^{1000} (zₗ − µ) = Σ_{k≠l} (zₖ − µ)(zₗ − µ) + Σ_{k=1}^{1000} (zₖ − µ)² = Σ_{k≠l} (zₖ − µ)(zₗ − µ) + 1000σ².
Answer.

(57.3.11) E[εᵢ] = Σ_{k=1}^{1000} (zₖ − µ) E[qᵢₖ] = Σ_{k=1}^{1000} (zₖ − µ)/1000 = 0

(57.3.12) var[εᵢ] = E[εᵢ²] = Σ_{k,l=1}^{1000} (zₖ − µ)(zₗ − µ) E[qᵢₖqᵢₗ] = Σ_{k=1}^{1000} (zₖ − µ)²/1000 = σ²

and for i ≠ j follows, using the hint for the last equal-sign,

(57.3.13) cov[εᵢ, εⱼ] = E[εᵢεⱼ] = Σ_{k,l=1}^{1000} (zₖ − µ)(zₗ − µ) E[qᵢₖqⱼₗ] = Σ_{k≠l} (zₖ − µ)(zₗ − µ)/(1000 · 999) = −σ²/999. □
With ι₂₀ being the 20 × 1 column vector consisting of ones, one can therefore write in matrix notation

y = ι₂₀µ + ε, E[ε] = o, V[ε] = σ²Ψ

where

(57.3.14) Ψ = (1 + 1/999)I − (1/999)ιι⊤,

i.e., Ψ has 1 in every diagonal and −1/999 in every off-diagonal position. From what we know about GLS with equicorrelated errors (Problem 501) it follows therefore that the sample mean ȳ is the BLUE of µ. (This last part was an explanation of the relevance of the question; you are not required to prove it.)
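A simulation sketch of this sampling scheme (assuming numpy; the population of incomes is hypothetical) reproduces the covariance −σ²/999 from (57.3.9):

import numpy as np

rng = np.random.default_rng(4)
z = rng.lognormal(10.0, 0.5, 1000)      # hypothetical fixed population incomes
mu, sigma2 = z.mean(), z.var()

draws = np.array([rng.choice(z, 20, replace=False) for _ in range(50_000)])
eps = draws - mu
print(np.mean(eps[:, 0] * eps[:, 1]))   # empirical cov[eps_1, eps_2]
print(-sigma2 / 999)                    # theoretical value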
CHAPTER 58
Unknown Parameters in the Covariance Matrix
If Ψ depends on certain unknown parameters which are not, at the same time, components of β or functions thereof, and if a consistent estimate of these parameters is available, then GLS with this estimated covariance matrix, called “feasible GLS,” is usually asymptotically efficient. This is an important result: one does not need an efficient estimate of the covariance matrix to get efficient estimates of β! In this case, all the results are asymptotically valid, with Ψ̂ in the formulas instead of Ψ. These estimates are sometimes even unbiased!
58.1. Heteroskedasticity
Heteroskedasticity means: error terms are independent, but do not have equal
variances. There are not enough data to get consistent estimates of all error variances,
therefore we need additional information.
The simplest kind of additional information is that the sample can be partitioned
into two different subsets, each subset corresponding to a different error variance,
with the relative variances known. Write the model as
(58.1.1) [y₁; y₂] = [X₁; X₂]β + [ε₁; ε₂]; V[[ε₁; ε₂]] = σ² [κ₁²I, O; O, κ₂²I] = Φ,

where [·; ·] stacks blocks vertically.
Assume y₁ has n₁ and y₂ has n₂ observations. The GLSE is

(58.1.2) β̂ = (X⊤Φ⁻¹X)⁻¹X⊤Φ⁻¹y = (X₁⊤X₁/κ₁² + X₂⊤X₂/κ₂²)⁻¹ (X₁⊤y₁/κ₁² + X₂⊤y₂/κ₂²).
To make this formula operational, we have to replace the κᵢ² by estimates. The simplest way (if each subset has at least k + 1 observations) is to use the unbiased estimates sᵢ² (i = 1, 2) from the OLS regressions on the two subsets separately. Associated with this estimation is also an easy test, the Goldfeld-Quandt test [Gre97, 551/2]: simply use an F-test on the ratio s₂²/s₁², but reject if it is too big or too small. If we don’t have the lower significance points, check s₁²/s₂² if it is > 1 and s₂²/s₁² otherwise.
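A minimal sketch of this two-group feasible GLSE (58.1.2), assuming numpy and hypothetical data arrays X1, y1, X2, y2 for the two subsets:

import numpy as np

def ols_s2(X, y):
    # Unbiased OLS variance estimate s^2 = SSE/(n - k).
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return (e @ e) / (len(y) - X.shape[1])

def feasible_gls(X1, y1, X2, y2):
    s21, s22 = ols_s2(X1, y1), ols_s2(X2, y2)   # estimates of kappa_i^2
    A = X1.T @ X1 / s21 + X2.T @ X2 / s22
    b = X1.T @ y1 / s21 + X2.T @ y2 / s22
    return np.linalg.solve(A, b)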
Problem 503. 3 points In the model

(58.1.3) [y₁; y₂] = [X₁; X₂]β + [ε₁; ε₂]; V[[ε₁; ε₂]] = [σ₁²I, O; O, σ₂²I]

in which X₁ is a 10 × 5 and X₂ a 20 × 5 matrix, you run the two regressions separately and you get s₁² = 500 and s₂² = 100. Can you reject at the 5% significance level that these variances are equal? Can you reject it at the 1% level? The enclosed tables are from [Sch59, pp. 424–33].
Answer. The distribution of the ratio of estimated variances is s₂²/s₁² ∼ F(15,5), but since its observed value is smaller than 1, use instead s₁²/s₂² ∼ F(5,15). The upper significance point for 0.5% is F(5,15;0.005) = 5.37 (which gives a two-sided 1% significance level), for 1% it is F(5,15;0.01) = 4.56 (which gives a two-sided 2% significance level), for 2.5% it is F(5,15;0.025) = 3.58 (which gives a two-sided 5% significance level), and for 5% it is F(5,15;0.05) = 2.90 (which gives a two-sided 10% significance level). A table can be found for instance in [Sch59, pp. 428/9]. To get the upper 2.5% point one can also use the Splus command qf(1-5/200,5,15). One can also get the lower significance points simply by the command qf(5/200,5,15). The observed ratio is 500/100 = 5; the test is therefore significant at the 5% level but not significant at the 1% level. □
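The same quantiles can be checked with scipy, whose f.ppf plays the role of the Splus qf quantile function; a quick sketch:

from scipy.stats import f

ratio = 500 / 100                      # s_1^2/s_2^2 ~ F(5,15) under H0
print(f.ppf(1 - 0.025, 5, 15))         # upper 2.5% point, about 3.58
print(f.ppf(1 - 0.005, 5, 15))         # upper 0.5% point, about 5.37
print(2 * min(f.sf(ratio, 5, 15), f.cdf(ratio, 5, 15)))   # two-sided p-value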
Since the so-called Kmenta-Oberhofer conditions are satisfied, i.e., since Ψ does not depend on β, the following iterative procedure converges to the maximum likelihood estimator (a sketch of it in code follows below):

(1) Start with some initial estimate of κ₁² and κ₂². [Gre97, p. 516] proposes to start with the assumption of homoskedasticity, i.e., κ₁² = κ₂² = 1, but if each group has enough observations to make separate estimates then I think a better starting point would be the sᵢ² of the separate regressions.
(2) Use those κᵢ² to get the feasible GLSE.
(3) Use this feasible GLSE to get a new set κᵢ² = sᵢ² (but divide by nᵢ, not nᵢ − k).
(4) Go back to (2).
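A minimal sketch of this iteration, assuming numpy and hypothetical arrays X1, y1, X2, y2; each pass recomputes the κᵢ² from the current residuals (dividing by nᵢ, the maximum likelihood convention) and re-solves the GLS system:

import numpy as np

def iterated_fgls(X1, y1, X2, y2, iters=50):
    k1 = k2 = 1.0                        # step (1): homoskedastic start
    for _ in range(iters):
        A = X1.T @ X1 / k1 + X2.T @ X2 / k2
        rhs = X1.T @ y1 / k1 + X2.T @ y2 / k2
        b = np.linalg.solve(A, rhs)      # step (2): feasible GLSE
        e1, e2 = y1 - X1 @ b, y2 - X2 @ b
        k1, k2 = e1 @ e1 / len(y1), e2 @ e2 / len(y2)    # step (3)
    return b, k1, k2                     # approaches the MLE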
Once the maximum likelihood estimates of β, σ², and κᵢ² are computed (actually σ² and κᵢ² cannot be identified separately, therefore one conventionally imposes a condition like σ² = 1 or Σᵢ κᵢ² = n to identify them), then it is easy to test for homoskedasticity by the LR test. In order to get the maximum value of the likelihood function it saves us some work to start with the concentrated likelihood function, therefore we start with (35.0.17):
(58.1.4) log f_y(y; β, Ψ) = −(n/2)(1 + ln 2π − ln n) − (n/2) ln((y − Xβ)⊤Ψ⁻¹(y − Xβ)) − (1/2) ln det[Ψ]
Since σ̂² = (1/n)(y − Xβ)⊤Ψ⁻¹(y − Xβ) and det[kΨ] = kⁿ det[Ψ], one can rewrite (35.0.17) as

(58.1.5) log f_y(y; β, Ψ) = −(n/2)(1 + ln 2π) − (1/2) ln det[σ̂²Ψ]
Now in the constrained case, with homoskedasticity assumed, Ψ = I and we will write the OLS estimator as β̂̂ (double hat) and σ̂̂² = (ε̂̂⊤ε̂̂)/n. Then ln det[σ̂̂²I] = n ln σ̂̂². Let β̂ be the unconstrained MLE, and

(58.1.6) Ψ̂ = [σ̂₁²I, O; O, σ̂₂²I]

where σ̂ᵢ² = ε̂ᵢ⊤ε̂ᵢ/nᵢ. The LR statistic is therefore (compare [Gre97, p. 516])

(58.1.7) λ = 2(log f_unconstrained − log f_constrained) = n ln σ̂̂² − Σᵢ nᵢ ln σ̂ᵢ²
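A sketch of this LR test, assuming numpy arrays X1, y1, X2, y2 and the hypothetical iterated_fgls sketch above; under homoskedasticity the statistic is asymptotically χ² with one degree of freedom:

import numpy as np

def lr_homoskedasticity(X1, y1, X2, y2):
    n1, n2 = len(y1), len(y2)
    X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)   # constrained: pooled OLS
    s2_pooled = np.sum((y - X @ b0) ** 2) / (n1 + n2)
    b, k1, k2 = iterated_fgls(X1, y1, X2, y2)    # unconstrained MLE
    return (n1 + n2) * np.log(s2_pooled) - n1 * np.log(k1) - n2 * np.log(k2)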
In this particular case, the feasible GLSE is so simple that its finite sample properties are known. Therefore [JHG+88] use it as a showcase example to study the question: Should one use the feasible GLSE always, or should one use a pre-test estimator, i.e., test whether the variances are equal, and use the feasible GLS only if this test rejects, otherwise use OLS? [JHG+88, figure 9.2 on p. 364] gives the trace of the MSE-matrix for several possibilities.
58.1.1. Logarithm of Error Variances Proportional to Unknown Linear Combination of Explanatory Variables. When we discussed heteroskedasticity with known relative variances, the main example was the prior knowledge that the error variances were proportional to some observed z. To generalize this procedure, [Har76] proposes the following specification:

(58.1.8) ln σₜ² = zₜ⊤α,

where α is a vector of unknown nonrandom parameters, and Z, the matrix with rows z₁⊤, . . . , zₙ⊤, consists of observations of m nonrandom explanatory variables which include the constant “variable” ι. The variables in Z are often functions of certain variables in X, but this is not necessary for the derivation that follows.

A special case of this specification is σₜ² = σ²xₜᵖ or, after taking logarithms, ln σₜ² = ln σ² + p ln xₜ. Here Z = [ι, ln x] and α⊤ = [ln σ², p].
Write (58.1.8) as 0 = zₜ⊤α − ln σₜ² and add ln εₜ² to both sides to get

(58.1.9) ln εₜ² = zₜ⊤α + ln(εₜ²/σₜ²).

This can be considered a regression equation with ln(εₜ²/σₜ²) as the disturbance term. The assumption is that var[ln(εₜ²/σₜ²)] does not depend on t, which is the case if the
εₜ/σₜ are i.i.d. The lefthand side of (58.1.9) is not observed, but one can take the OLS residuals ε̂ₜ; usually ln ε̂ₜ² → ln εₜ² in the probability limit.

There is only one hitch: the disturbances in regression (58.1.9) do not have zero expected value. Their expected value is an unknown constant. If one ignores that and runs a regression on (58.1.9), one gets an inconsistent estimate of the element of α which is the coefficient of the constant term in Z. This estimate really estimates the sum of the constant term plus the expected value of the disturbance. As a consequence of this inconsistency, the vector exp(Zα̂) estimates the vector of variances only up to a joint multiplicative constant. I.e., this inconsistency is such that the plim of the variance estimates is not equal but nevertheless proportional to the true variances. But proportionality is all one needs for GLS; the missing multiplicative constant is then the s² provided by the least squares formalism. Therefore all one has to do is: run the regression (58.1.9) (if the F test does not reject, then homoskedasticity cannot be rejected), get the (inconsistent but proportional) estimates σ̂ₜ² = exp(zₜ⊤α̂), divide the tth observation of the original regression by σ̂ₜ, and re-run the original regression on the transformed data. Consistent estimates of σₜ² are then the s² from this transformed regression times the inconsistent estimates σ̂ₜ².
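A sketch of this two-step procedure, assuming numpy and hypothetical arrays y, X (the original regression) and Z (with a constant column):

import numpy as np

def harvey_fgls(y, X, Z):
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    a, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)   # regression (58.1.9)
    s = np.exp(Z @ a / 2.0)          # proportional standard deviations
    w = 1.0 / s
    b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b_fgls                    # re-run on the transformed data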
58.1.2. Testing for heteroskedasticity. One test is the F-test in the procedure just described. Then there is the Goldfeld-Quandt test: if it is possible to order the observations in order of increasing error variance, run separate regressions on the portion of the data with low variance and that with high variance, perhaps leaving out some in the middle to increase the power of the test, and then just make an F-test with (SSE_high/d.f.)/(SSE_low/d.f.).
Problem 504. Why does the Goldfeld-Quandt test not use SSE_high − SSE_low in the numerator?
58.1.3. Heteroskedasticity with Unknown Pattern. For consistency of OLS one needs

(58.1.10) plim (1/n)X⊤ε = o
(58.1.11) Q = plim (1/n)X⊤X exists and is nonsingular
(58.1.12) Q* = plim (1/n)X⊤ΨX exists and is nonsingular

Proof:

(58.1.13) V[β̂_OLS] = (σ²/n)((1/n)X⊤X)⁻¹((1/n)X⊤ΨX)((1/n)X⊤X)⁻¹, therefore plim V[β̂_OLS] = (σ²/n)Q⁻¹Q*Q⁻¹.
Look at the following simple example from [Gre97, fn. 3 on p. 547]: y = xβ + ε with var[εᵢ] = σ²zᵢ². For the variance of the OLS estimator we need

(58.1.14) x⊤Ψx = [x₁ ⋯ xₙ] diag(z₁², . . . , zₙ²) [x₁; . . . ; xₙ] = Σ_{i=1}^{n} xᵢ²zᵢ².

Then by (58.1.13), var[β̂_OLS] = σ² Σᵢ xᵢ²zᵢ²/(Σᵢ xᵢ²)². Now assume that xᵢ and zᵢ are independent observations of the random variables x and z with E[z²] = 1 and cov[x², z²] = 0. In this case the naive regression output for the variance of β̂, which is s²_N = s²/Σx², is indeed a consistent estimate of the variance.

(58.1.15) plim var[β̂_OLS]/s²_N = plim σ²Σx²z²/(s²Σx²) = plim (σ²/s²) · ((1/n)Σᵢ xᵢ²zᵢ²)/((1/n)Σᵢ xᵢ²) = E[x²z²]/E[x²] = (cov[x², z²] + E[x²] E[z²])/E[x²] = 1

I.e., if one simply runs OLS in this model, then the regression printout is not misleading. On the other hand, it is clear that always var[β̂_OLS] ≥ var[β̂]; therefore if z is observed, then one can do better than this.
Problem 505. Someone says: the formula

(58.1.16) V[β̂_OLS] = σ²(X⊤X)⁻¹X⊤ΨX(X⊤X)⁻¹

is useless; if one knows Ψ then one will use GLS, and if one does not know Ψ then there are not enough data to estimate it. Comment on this.
Answer: This is a fallacy. In the above formula one does not need Ψ but X⊤ΨX, which is a k × k symmetric matrix, i.e., it has k(k + 1)/2 different elements. And even an inconsistent estimate of Ψ can lead to a consistent estimate of X⊤ΨX. Which inconsistent estimate of Ψ shall we use? Of course Ψ̂ = diag(ε̂₁², . . . , ε̂ₙ²). Now since

(58.1.17) X⊤ΨX = [x₁ ⋯ xₙ] diag(σ₁², . . . , σₙ²) [x₁⊤; . . . ; xₙ⊤] = Σᵢ σᵢ² xᵢxᵢ⊤

(where xᵢ⊤ is the ith row of X), one gets White’s heteroskedastic-consistent estimator.
(58.1.18) Est.Var[β̂_OLS] = (ε̂⊤ε̂/n)(X⊤X)⁻¹(Σᵢ ε̂ᵢ² xᵢxᵢ⊤)(X⊤X)⁻¹
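A sketch, assuming numpy arrays X (n × k) and y. It computes the standard HC0 form (X⊤X)⁻¹(Σᵢ ε̂ᵢ² xᵢxᵢ⊤)(X⊤X)⁻¹; note that (58.1.18) above carries an additional factor ε̂⊤ε̂/n from plugging Ψ̂ into (58.1.16):

import numpy as np

def white_cov(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (e**2)[:, None]).T @ X       # sum_i e_i^2 x_i x_i'
    return XtX_inv @ meat @ XtX_inv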
This estimator has become very fashionable, since one does not have to bother with estimating the covariance structure, and since OLS is not too inefficient in these situations.

It has been observed, however, that this estimator gives too small confidence intervals in small samples. Therefore it is recommended in small samples to multiply the estimated variance by the factor n/(n − k) or to use ε̂ᵢ²/mᵢᵢ as the estimates of σᵢ², where the mᵢᵢ are the diagonal elements of M = I − X(X⊤X)⁻¹X⊤. See [DM93, p. 554].
58.2. Autocorrelation
While heteroskedasticity is most often found with cross-sectional data, autocorrelation is more common with time series.

Properties of OLS in the presence of autocorrelation: If the correlation between the observations dies off sufficiently rapidly as the observations become further apart in time, OLS is consistent and asymptotically normal, but inefficient. There is one important exception to this rule: if the regression includes lagged dependent variables and there is autocorrelation, then OLS and also GLS is inconsistent.
Problem 506. [JHG+88, p. 577] and [Gre97, 13.4.1]. Assume

(58.2.1) yₜ = α + βyₜ₋₁ + εₜ
(58.2.2) εₜ = ρεₜ₋₁ + vₜ

where vₜ ∼ IID(0, σᵥ²) and all vₜ are independent of ε₀ and y₀, and |ρ| < 1 and |β| < 1.
• a. 2 points Show that vₜ is independent of all εₛ and yₛ for 0 ≤ s < t.

Answer. Both proofs by induction. First, independence of vₜ of εₛ: By induction assumption, vₜ is independent of εₛ₋₁, and since t > s, i.e., t ≠ s, vₜ is also independent of vₛ; therefore vₜ is independent of εₛ = ρεₛ₋₁ + vₛ. Now independence of vₜ of yₛ: By induction assumption, vₜ is independent of yₛ₋₁, and since t > s, vₜ is also independent of εₛ; therefore vₜ is independent of yₛ = α + βyₛ₋₁ + εₛ. □
• b. 3 points Show that var[εₜ] = ρ²ᵗ var[ε₀] + (1 − ρ²ᵗ)σᵥ²/(1 − ρ²). (Hint: use induction.) I.e., since |ρ| < 1, var[εₜ] converges towards σ_ε² = σᵥ²/(1 − ρ²).

Answer. Here is the induction step. Assume that var[εₜ₋₁] = ρ²⁽ᵗ⁻¹⁾ var[ε₀] + (1 − ρ²⁽ᵗ⁻¹⁾)σᵥ²/(1 − ρ²). Since εₜ = ρεₜ₋₁ + vₜ and vₜ is independent of εₜ₋₁, it follows

(58.2.3) var[εₜ] = ρ² var[εₜ₋₁] + var[vₜ] = ρ²ᵗ var[ε₀] + ρ²(1 − ρ²⁽ᵗ⁻¹⁾)σᵥ²/(1 − ρ²) + σᵥ² = ρ²ᵗ var[ε₀] + (1 − ρ²ᵗ)σᵥ²/(1 − ρ²). □
• c. 2 points Show that cov[εₜ, yₜ₋₁] = ρβ cov[εₜ₋₁, yₜ₋₂] + ρ var[εₜ₋₁].

Answer.

(58.2.4) cov[εₜ, yₜ₋₁] = cov[ρεₜ₋₁ + vₜ, α + βyₜ₋₂ + εₜ₋₁]
(58.2.5) = ρβ cov[εₜ₋₁, yₜ₋₂] + ρ var[εₜ₋₁] □
• d. 1 point Show that, if the process has had enough time to become stationary, it follows

(58.2.6) cov[εₜ, yₜ₋₁] = ρσ_ε²/(1 − ρβ)

Answer. Do not yet compute var[εₜ₋₁] at this point, just call it σ_ε². Assuming stationarity, i.e., cov[εₜ, yₜ₋₁] = cov[εₜ₋₁, yₜ₋₂], it follows

(58.2.7) cov[εₜ, yₜ₋₁](1 − ρβ) = ρσ_ε²
(58.2.8) cov[εₜ, yₜ₋₁] = ρσ_ε²/(1 − ρβ) □
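A closing simulation sketch (assuming numpy, with hypothetical parameter values) of the inconsistency claimed at the beginning of this section: with a lagged dependent variable and AR(1) errors, the OLS estimate of β does not approach the truth even in very large samples, precisely because cov[εₜ, yₜ₋₁] = ρσ_ε²/(1 − ρβ) ≠ 0.

import numpy as np

rng = np.random.default_rng(5)
n, alpha, beta, rho = 100_000, 1.0, 0.5, 0.6

y, eps = np.empty(n), np.empty(n)
y_prev = eps_prev = 0.0
for t in range(n):
    eps[t] = rho * eps_prev + rng.standard_normal()   # AR(1) disturbance
    y[t] = alpha + beta * y_prev + eps[t]
    y_prev, eps_prev = y[t], eps[t]

X = np.column_stack([np.ones(n - 1), y[:-1]])
b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(b[1], beta)        # OLS slope stays well above the true beta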