A dissertation submitted in partial satisfaction of the requirements for the degree of doctor of philosophy

Essays in the Economics of Education

by
Jesse Morris Rothstein

A.B. (Harvard University) 1995

A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Economics
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:
Professor David Card, Chair
Professor John M. Quigley
Professor Steven Raphael

Spring 2003

UMI Number: 3183857

Copyright 2003 by
Rothstein, Jesse Morris
All rights reserved.

UMI Microform 3183857
Copyright 2005 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

Essays in the Economics of Education
Copyright 2003
by
Jesse Morris Rothstein

Abstract
Essays in the Economics of Education
by
Jesse Morris Rothstein
Doctor of Philosophy in Economics
University of California, Berkeley
Professor David Card, Chair

Three essays consider implications of the strong association between student
background characteristics and academic performance.

Chapter One considers the incentives that school choice policies might create for the
efficient management of schools. These incentives would be diluted if parents prefer
schools with desirable peer groups to those with inferior peers but better policies and
instruction. I model a “Tiebout choice” housing market in which schools differ in both peer
group and effectiveness. If parental preferences depend primarily on school effectiveness,
we should expect both that wealthy parents purchase houses near effective schools and that
decentralization of educational governance facilitates this residential sorting. On the other
hand, if the peer group dominates effectiveness in parental preferences, wealthy families will
still cluster together in equilibrium but not necessarily at effective schools. I use a large
sample of SAT-takers to examine the distribution of student outcomes across schools within
metropolitan areas that differ in the structure of educational governance, and find little
evidence that parents choose schools for characteristics other than peer groups.

This result suggests that competition may not induce improvements in educational
productivity, and indeed I do not obtain Hoxby’s (2000a) claimed relationship between
school decentralization and student performance. I address this discrepancy in Chapter
Two. Using Hoxby’s own data and specification, as described in her published paper, I am
unable to replicate her positive estimate, and I find several reasons for concern about the
validity of her conclusions.

Chapter Three considers the role of admissions tests in predictions of student
collegiate performance. Traditional predictive validity studies suffer from two important
shortcomings. First, they do not adequately account for issues of sample selection. Second,
they ignore a wide class of student background variables that covary with both test scores
and collegiate success. I propose an omitted variables estimator that is consistent under
restrictive but sometimes plausible sample selection assumptions. Using this estimator and
data from the University of California, I find that school-level demographic characteristics
account for a large portion of the SAT’s apparent predictive power. This result casts doubt
on the meritocratic foundations of exam-based admissions rules.

To Joanie, for everything.

Contents

List of Figures
List of Tables
Preface
Acknowledgements

1. Good Principals or Good Peers? Parental Valuation of School Characteristics,
Tiebout Equilibrium, and the Incentive Effects of Competition among
Jurisdictions
  1.1. Introduction
  1.2. Tiebout Sorting and the Role of Peer Groups: Intuition
  1.3. A Model of Tiebout Sorting on Exogenous Community Attributes
    1.3.1. Graphical illustration of market equilibrium
    1.3.2. Simulation of expanding choice
    1.3.3. Allocative implications and endogenous school effectiveness
  1.4. Data
    1.4.1. Measuring market concentration
    1.4.2. Does district structure matter to school-level choice?
    1.4.3. SAT data
  1.5. Empirical Results: Choice and Effectiveness Sorting
    1.5.1. Nonparametric estimates
    1.5.2. Regression estimates of linear models
  1.6. Empirical Results: Choice and Average SAT Scores
  1.7. Conclusion
  Tables and Figures for Chapter 1

2. Does Competition Among Public Schools Really Benefit Students? A
Reappraisal of Hoxby (2000)
  2.1. Introduction
  2.2. Data and Methods
    2.2.1. Econometric framework
  2.3. Replication
  2.4. Sensitivity to Geographic Match
  2.5. Are Estimates From the Public Sector Biased?
  2.6. Improved Estimation of Appropriate Standard Errors
  2.7. Conclusion
  Tables and Figures for Chapter 2

3. College Performance Predictions and the SAT
  3.1. Introduction
  3.2. The Validity Model
    3.2.1. Restriction of range corrections
    3.2.2. The logical inconsistency of range corrections
  3.3. Data
    3.3.1. UC admissions processes and eligible subsample construction
  3.4. Validity Estimates: Sparse Model
  3.5. Possible Endogeneity of Matriculation, Campus, and Major
  3.6. Decomposing the SAT’s Predictive Power
  3.7. Discussion
  Tables and Figures for Chapter 3

References

Appendices
  Appendix A. Choice and School-Level Stratification
  Appendix B. Potential Endogeneity of Market Structure
  Appendix C. Selection into SAT-Taking
  Appendix D. Proofs of Results in Chapter 1, Section 3
  Tables and Figures for Appendices

List of Figures

1.1  Schematic: Illustrative allocations of effective schools in Tiebout
     equilibrium, by size of peer effect and number of districts
1.2  Simulations: Average effectiveness of equilibrium schools in 3- and
     10-district markets, by income and importance of peer group
1.3  Simulations: Slope of effectiveness with respect to average income in
     Tiebout equilibrium, by market structure and importance of peer group
1.4  Distribution of district-level choice indices across 318 U.S.
     metropolitan areas
1.5  Student characteristics and average SAT scores, school level
1.6  Nonparametric estimates of the school-level SAT score-peer group
     relationship, by choice quartile
1.7  “Upper limit” effect of fully decentralizing Miami’s school governance
     on the across-school distribution of SAT scores
3.1  Conditional expectation of SAT given HSGPA, three samples
B1   Number of school districts over time
C1   SAT-taking rates and average SAT scores across MSAs
D1   Illustration of single-crossing: Indifference curves in q-h space

List of Tables

1.1  Summary statistics for U.S. MSAs
1.2  Effect of district-level choice index on income and racial stratification
1.3  Summary statistics for SAT sample
1.4  Effect of Tiebout choice on the school-level SAT score-peer group gradient
1.5  Effect of Tiebout choice on the school-level SAT score-peer group
     gradient: Alternative specifications
1.6  Effect of Tiebout choice on the school-level SAT score-peer group
     gradient: Evidence from the NELS and the CCD
1.7  Effect of Tiebout choice on average SAT scores across MSAs
2.1  First-stage models for the district-level choice index
2.2  Basic models for NELS 8th grade reading score, Hoxby (2000b)
     and replication
2.3  Effect of varying the sample definition on the estimated choice effect
2.4  Models that control for the MSA private enrollment share
2.5  Estimated choice effect when sample includes private schools
2.6  Alternative estimators of the choice effect sampling error, base
     replication sample
2.7  Estimates of Hoxby’s specification on SAT data
3.1  Summary statistics for UC matriculant and SAT-taker samples
3.2  Basic validity models, traditional and proposed models
3.3  Specification checks
3.4  Individual and school characteristics as determinants of SAT scores
     and GPAs
3.5  Accounting for individual and school characteristics in FGPA prediction
A1   Evidence on choice-stratification relationship: Additional measures
A2   Alternative measures of Tiebout choice: Effects on segregation and
     stratification
A3   Effect of district-level choice on tract-level income and racial stratification
B1   First-stage models for MSA choice index
B2   2SLS estimates of effect of Tiebout choice
C1   Sensitivity of individual and school average SAT variation to
     assumed selection parameter
C2   Stability of school mean SAT score and peer group background
     characteristics over time
C3   Effect of Tiebout choice on the school-level SAT score-peer group
     gradient: Estimates from class rank-reweighted sample

Preface

It is a well-established fact that students’ socioeconomic background has substantial
predictive power for their educational outcomes. Children whose parents are highly
educated, whose households are stable, and whose families have high incomes substantially
outperform their less advantaged peers on every measure of educational output.

With nearly as long a pedigree is the idea that these family background effects may
operate above the individual level. The school-level association between average student
background and average performance is typically much stronger than is the same association
at the individual level. The interpretation of school-level correlations is nevertheless
controversial: They may arise because academic outcome measures are noisy, implying that
group means are more reliable than are individual scores; because students with
unobservably attentive parents disproportionately attend schools that enroll observably
advantaged students; because the system of education funding assigns greater resources to
schools in wealthy neighborhoods; or because there really are peer effects in educational
production.

For many purposes, however, one need not know why it is that schools with
advantaged students outscore those with disadvantaged students; the fact that they do is
itself of substantial importance. This dissertation focuses on two such topics: The
competitive impacts of school choice programs, and the design of college admissions rules.
In each case, when I incorporate into the standard analysis the key fact that student
composition may function as a signal of student performance (and vice versa), I obtain new
insights into the underlying processes and new ways of thinking about the available policy
options.

The first two chapters consider parents’ choice of schools for their children. The
claim that parental choice can create incentives for schools to become more productive is a
tenet of the neoclassical analysis of education. It relies crucially on the assumption that
parents will choose effective, productive schools. This is far from obvious: if peer effects
are important, parents may be perfectly rational in preferring wealthy, ineffective schools to
competitors that are less advantaged but more effective, and even if there are no peer effects,
the strong association between school average test scores and student composition may
make it difficult for parents to assess a school’s effectiveness. But if parents, in practice
even if not by intent, choose schools primarily on the basis of their student composition
rather than for their effectiveness, the incentives created for school administrators will be
diluted.

Chapter One develops this idea and implements tests of the hypothesis that school
effectiveness is an important determinant of residential choices among local-monopoly
school districts. I model a “Tiebout”-style housing market in which house prices ration
access to desirable schools, which may be desirable either because they are particularly
effective or because they enroll a desirable set of students. I develop observable implications
of these two hypotheses for the degree of stratification of student test scores across schools,
and I look for evidence of these implications in data on the joint distribution of student
characteristics and SAT scores. I find strong evidence that schools are an important
component of the residential choice and that housing markets create sorting by family
income across schools. Tests of the hypothesis that this sorting is driven by parental pursuit
of effective schools, however, come up empty. This suggests that residential choice
processes (and possibly, although the analogy is not particularly strong, non-residential
choice programs like vouchers) are unlikely to create incentives for schools to become
more effective.

This result conflicts with a well-known recent result from Hoxby (2000a), who
argues that metropolitan areas with less centralized educational governance, and therefore
more competition among local school districts, produce better student outcomes at lower
cost. In Chapter Two, I attempt to get to the bottom of the discrepancy. I reanalyze a
portion of Hoxby’s data, and find reason to suspect the validity of her conclusions. I am
unable to reproduce her results, which appear to be quite sensitive to the exact sample and
specification used. I find suggestive evidence, however, that her estimates, from a sample of
public school students, are upward biased by selection into private schools. Moreover, an
investigation of the sampling variability of Hoxby’s estimates leads to the conclusion that her
standard errors are understated, and that even her own point estimates of the competitive
effect are not significantly different from zero.

Chapter Three turns to a wholly different, but not unrelated, topic, the role of
admissions exam scores in the identification of well-prepared students in the college
admissions process. The case for using such exams is often made with “validity” studies,
which estimate the correlation between test scores and eventual collegiate grades, both with
and without controls for high school grade point average. I argue that there are two
fundamental problems with these studies as they are often carried out. First, they do not
adequately account for the biases created by estimation from a selected sample of students
whose collegiate grades are observable because they were granted admission. I propose and
implement an omitted variables estimator that is unbiased under restrictive, but sometimes
plausible, assumptions about the selection process.

A second shortcoming of the validity literature is more fundamental. In a world in
which student background characteristics are known to be correlated with academic success
(i.e. with both SAT scores and collegiate grades), it is quite difficult to interpret validity
estimates that fail to take account of these background characteristics. A study can identify a
test as predictively valid without being informative about whether the test provides an
independent measure of academic preparedness or simply proxies for the excluded
background characteristics.

In University of California data, I find evidence that observable background
characteristics (particularly those describing the composition of the school, rather than the
individual’s own background) are strong predictors of both SAT scores and collegiate
performance, and that much of the SAT’s apparent predictive power derives from its
association with these background characteristics. This suggests that the SAT may not be a
crucial part of the performance-maximizing admissions rule, as the background variables
themselves provide nearly all the information contained in SAT scores. It also suggests that
existing predictive validity evidence does not establish the frequent claim that the SAT is a
meritocratic admissions tool, unless demographic characteristics are seen as measures of
student merit.
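The concern about validity estimates that omit background characteristics can be illustrated with a small simulation. This is only a minimal sketch under invented assumptions, not the estimator or the data used in Chapter Three: the variable names (`background`, `sat`, `fgpa`), the coefficients `beta` and `gamma`, and the normal distributions are all hypothetical. It compares the SAT coefficient from a regression of collegiate grades on the test score alone with the coefficient obtained once the background variable is also controlled.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def simulate(n=20000, beta=0.2, gamma=0.6, seed=0):
    """Return (short_slope, long_slope) for the coefficient on the test score.

    All values are hypothetical: beta is the score's own effect on grades,
    gamma is the effect of the (normally omitted) background variable.
    """
    rng = random.Random(seed)
    background = [rng.gauss(0, 1) for _ in range(n)]
    # The test score is correlated with background by construction.
    sat = [b + rng.gauss(0, 1) for b in background]
    fgpa = [beta * s + gamma * b + rng.gauss(0, 1)
            for s, b in zip(sat, background)]

    # Short regression: grades on the test score alone.
    short_slope = cov(sat, fgpa) / cov(sat, sat)

    # Long regression: grades on the test score and background,
    # solved from the two-regressor normal equations.
    s_ss, s_bb, s_sb = cov(sat, sat), cov(background, background), cov(sat, background)
    s_sy, s_by = cov(sat, fgpa), cov(background, fgpa)
    det = s_ss * s_bb - s_sb ** 2
    long_slope = (s_sy * s_bb - s_by * s_sb) / det
    return short_slope, long_slope


if __name__ == "__main__":
    short_slope, long_slope = simulate()
    print("score coefficient, background omitted:   ", round(short_slope, 3))
    print("score coefficient, background controlled:", round(long_slope, 3))
```

Under these assumed parameters the short-regression slope converges to beta + gamma * Cov(sat, background) / Var(sat) = 0.2 + 0.6 * 0.5 = 0.5, while the long-regression slope converges to beta = 0.2, so more than half of the simulated score's apparent predictive power reflects its association with background.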

Acknowledgements

I am very much indebted to David Card, for limitless advice and support throughout
my graduate school career. The research here has benefited in innumerable ways from his
many suggestions, as have I. It is hard to imagine a better advisor.

I am grateful to the members of my various committees (Alan Auerbach, John
Quigley, Steve Raphael, Emmanuel Saez, and Eugene Smolensky) for reading drafts that
were far too long and too unpolished, and for nevertheless finding many errors and
omissions.

I have benefited from discussions with David Autor, Jared Bernstein, Ken Chay,
Tom Davidoff, John DiNardo, Nada Eissa, Jonah Gelbach, Alan Krueger, David Lee,
Darren Lubotsky, Rob McMillan, Jack Porter, and Diane Whitmore, and from participants at
several seminars where I have presented versions of the work contained here. I also thank
my various officemates over the last five years, particularly Liz Cascio, Justin McCrary, Till
von Wachter, and Eric Verhoogen, for many helpful conversations. All of the research
contained here has been much improved by my interactions with those mentioned above,
and with others who I have surely neglected here.

One must live while conducting research. I thank my family and friends for putting
up with me these last five years and for helping me to stay sane throughout. I hope that I
have not been too unbearable.

Much of my graduate career was supported under a National Science Foundation
Graduate Research Fellowship. In addition, the research in Chapters 1 and 2 was partially
supported by the Fisher Center for Real Estate and Urban Economics at U.C. Berkeley and
that in Chapter 3 by the Center for Studies in Higher Education. David Card and Alan
Krueger provided the SAT data used throughout. Cecilia Rouse provided the hard-to-obtain
School District Data Book used in Chapters 1 and 2. Saul Geiser and Roger Studley of the
University of California Office of the President provided the student records that permitted
the research in Chapter 3. The usual disclaimer applies: Any opinions, findings,
conclusions or recommendations expressed are my own and do not necessarily reflect the
views of the National Science Foundation, the Fisher Center, the Center for Studies in
Higher Education, the College Board, the UC Office of the President, or any of my
advisors.

Last, but not least, there is a sense in which Larry Mishel deserves substantial credit
for my Ph.D., as without his determined efforts at persuasion, I would never have pursued it
in the first place.

Chapter 1.

Good Principals or Good Peers? Parental
Valuation of School Characteristics, Tiebout
Equilibrium, and the Incentive Effects of
Competition among Jurisdictions

1.1. Introduction

Many analysts have identified principal-agent problems as a major source of
underperformance in public education. Public school administrators need not compete for
customers and are therefore free of the market discipline that aligns producer incentives with
consumer demand in private markets. Chubb and Moe, for example, argue that the interests
of parents and students “tend to be far outweighed by teachers’ unions, professional
organizations, and other entrenched interests that, in practice, have traditionally dominated
the politics of education,” (1990, p. 31).1 One proposed solution, advocated by Friedman
(1962) and others, is to allow dissatisfied parents to choose another school, and to link
school administrators’ compensation to parents’ revealed demand. This would strengthen
parents relative to other actors, and might “encourage competition among schools, forcing
them into higher productivity,” (Hoxby, 1994, p. 1).

1 Chubb and Moe also identify the school characteristics that parents would presumably choose, given more
influence: “strong leadership, clear and ambitious goals, strong academic programs, teacher professionalism,
shared influence, and staff harmony,” (p. 187). See also Hanushek (1986) and Hanushek and Raymond
(2001).

The potential effects of school choice programs depend critically on what
characteristics parents value in schools. Hanushek, for example, notes that parents might
not choose effective schools over others that are less effective but offer “pleasant
surroundings, athletic facilities, [and] cultural advantages,” (1981, p. 34). To the extent that
parents choose productive schools, market discipline can induce greater productivity from
school administrators and teachers. If parents primarily value other features, however,
market discipline may be less successful. Hanushek cautions: “If the efficiency of our school
systems is due to poor incentives for teachers and administrators coupled with poor
decision-making by consumers, it would be unwise to expect much from programs that seek to
strengthen ‘market forces’ in the selection of schools,” (1981, pp. 34-35; emphasis added).

Moreover, if students’ outcomes depend importantly on the characteristics of their
classmates (i.e. if so-called “peer effects” are important components of educational
production), even rational, fully informed, test-score-maximizing parents may prefer schools
with poor management but desirable peer groups to better managed competitors that enroll
less desirable students, and administrators may be more reliably rewarded for enrolling the
right peer group than for offering effective instruction.

The mechanisms typically proposed to increase parental choice (vouchers, charter
schools, etc.) are not at present sufficiently widespread to permit decisive empirical tests
either of parental revealed preferences or of their ultimate effects on school productivity.2
Economists have long argued, however, that housing markets represent a long established,
potentially informative form of school choice (Tiebout, 1956; Brennan and Buchanan, 1980;
Oates, 1985; Hoxby, 2000a). Parents exert some control over their children’s school
assignment via their residential location decisions, and can exit undesirable schools by
moving to a neighborhood served by a different school district. As U.S. metropolitan areas
vary dramatically in the amount of control over children’s school assignment that the
residential decision affords to parents, one can hope to infer the effect of so-called Tiebout
choice by comparing student outcomes across metropolitan housing markets (Borland and
Howsen, 1992; Hoxby, 2000a).3

2 Hsieh and Urquiola (2002) study a large-scale voucher program in Chile, but argue that effects on school
productivity cannot be distinguished from the allocative efficiency effects of student stratification.

3 Hoxby argues that this sort of analysis can “demonstrate general properties of school choice that are helpful
for thinking about reforms,” (2000a, p. 1209). Belfield and Levin (2001) review other, similar studies.

In this chapter, I use data on school assignments and outcomes of students across
schools within different metropolitan housing markets to assess parents’ revealed
preferences. To preview the results, I find little evidence that parents use Tiebout choice to
select effective schools over those with desirable peers, or that schools are on average more
effective in markets that offer more choice.

In modeling the effects of parental preferences on equilibrium outcomes under
Tiebout choice, it is important to account for two key issues that do not arise under choice
programs like vouchers. The first is that residential choice rations access to highly-demanded
schools by willingness-to-pay for local housing.4 As a result, both schools and
districts in high-choice markets (those with many competing school districts) are more
stratified than in low-choice markets. Increased stratification can have allocative efficiency
consequences that confound estimates of the effect of choice on productive efficiency.

4 Small-scale voucher programs may not have to ration desired schools, or may be able to use lotteries for this
purpose. One imagines that broader programs will use some form of price system, perhaps by allowing
parents to “top up” their vouchers (Epple and Romano, 1998).

A second issue is that there is little or no threat of market entry when competition is
among geographically-based school districts. In the absence of entry, administrators of
undesirable districts are not likely to face substantial declines in enrollment. Indeed, a
reasonable first approximation is that total (public) school and district enrollments are
invariant to schools’ relative desirability.5 Instead, Tiebout choice works by rewarding the
administrator of a preferred school with a better student body and with wealthier and more
motivated parents. There are obvious benefits for educational personnel in attracting an
advantaged population, and I assume throughout this chapter that the promise of such
rewards can create meaningful incentives for school administrators.

5 Poor school management can, of course, lead parents to choose private schools, lowering public enrollment.
Similarly, areas with bad schools may disproportionately attract childless families. These are likely
second-order effects. The private option, in any case, is not the mechanism by which residential choice works but an
alternative to it: Inter-jurisdictional competition has been found to lower private enrollment rates (Urquiola,
1999; Hoxby, 2000a).

My analysis of parental choices focuses on the possibility that parents may choose
schools partly on the basis of the peer group offered. Although existing research does not
conclusively establish the causal contribution of peer group characteristics to student
outcomes (see, e.g., Coleman et al., 1966; Hanushek, Kain, and Rivkin, 2001; Katz, Kling,
and Liebman, 2001), anecdotal evidence suggests that parents may place substantial weight
on the peer group in their assessments of schools and neighborhoods. Realtor.com, a web
site for house hunters, offers reports on several neighborhood characteristics that parents
apparently value. These include a few variables that may be interpreted as measures of
school resources or effectiveness (e.g. class size and the number of computers); detailed
socioeconomic data (e.g. educational attainment and income); and the average SAT score at
the local high school. Given similar average scores, test-score maximizers should prefer
demographically unfavorable schools, as these must add more value to attain the same
outcomes as their competitors with more advantaged students.6 While it is possible that
parents use the demographic data in this way, it seems more likely that home buyers prefer
wealthier neighborhoods, even conditional on average student performance (Downes and
Zabel, 1997).7

With several school characteristics over which parents may choose, understanding
which schools are chosen and which administrators are rewarded requires a model of
residential choice. I build on the framework of so-called multicommunity models in the
local public finance literature (Ross and Yinger, 1999), but I introduce a component of
school desirability that is exogenous to parental decisions, “effectiveness,” which is thought
of as the portion of schools’ effects on student performance that does not depend on the
characteristics of enrolled students. Parental preferences among districts depend on both
peer group and effectiveness, and I consider the implications of varying the relative weights
of these characteristics for the rewards that accrue in equilibrium to administrators of
effective schools.

Hoxby (1999b) also models Tiebout choice of schools, but she assumes a discrete
distribution of student types and allows parents to choose only among schools offering
identical peer groups. I allow a continuous distribution of student characteristics, which
forces parents to trade off peer group against effectiveness in their school choices. This
seems a more accurate characterization of Tiebout markets, as the median U.S. metropolitan
area has fewer than a dozen school districts from which to choose. It leads to a substantially
different understanding of the market dynamics, as Hoxby’s assumption of competing schools
with identical peer groups eliminates the “stickiness” that concern for peer group can create
and that is the primary focus here.

As in other multicommunity models, equilibrium in my model exhibits complete
stratification: High-income families live in districts that are preferred to (and have higher
housing prices than) those where low-income families live. That this must hold regardless of
what parents value points to a fundamental identification problem in housing price-based
estimates of parental valuations:8 Peer group and, by extension, average student
performance are endogenous to unobserved determinants of housing prices. One
estimation strategy that accommodates this endogeneity is that taken by Bayer, McMillan,
and Reuben (2002), who estimate a structural model for housing prices and community
composition in San Francisco.

I adopt a different strategy: I compare housing markets that differ in the strength of
the residential location-school assignment link, and I develop simple reduced-form

This does not rely on assumptions about the peer effect: The effect of individual characteristics on own test
scores, distinct from any spillover effects, is not attributable to the school, and test-score-maximizing parents
should penalize the average test scores of schools with advantaged students to remove this effect (Kain,
Staiger, and Samms, 2002).
7 Postsecondary education offers additional evidence of strong preferences over the peer group: Colleges
frequently trumpet the SAT scores of their incoming students—the peer group—while data on graduates’
achievements relative to others with similar initial qualifications, which would arguably be more informative
about the college’s contribution, are essentially non-existent. Along these lines, Tracy and Waldfogel (1997)
find that popular press rankings of business schools reflect the quality of incoming students more than the
schools’ contributions to students’ eventual salaries (but see also Dale and Krueger, 1999, who obtain

somewhat conflicting results at the undergraduate level).
6

implications of parental valuations for the across-school distribution of student
characteristics and educational outcomes as a function of the strength of this link. This
across-market approach has the advantage that it does not rely on strong exclusion
restrictions or distributional assumptions. My primary assumptions are that the causal effect
8

5

Shepard (1999) reviews hedonic studies of housing markets

6

of individual and peer characteristics on student outcomes does not vary systematically with
the structure of educational governance; that the peer effect can be summarized with a small
number of moments of the within-school distribution of student characteristics; and that
school effectiveness acts to shift the average student outcome independent of the set of
students enrolled.

Like Bayer, McMillan, and Reuben (2002), I identify parental valuations by the
location of clusters of high income families: If parental preferences over communities depend
exclusively on the effectiveness of the local schools, the most desirable (and therefore
wealthiest) communities are necessarily those with the most effective schools. If peer
group matters at all to parents, however, there can be “unsorted” equilibria in which
communities with ineffective schools have the wealthiest residents and are the most
preferred. These equilibria result from coordination failures: The wealthy families in
ineffective districts would collectively have the highest bids for houses assigned to more
effective schools, but no individual family is willing to move alone to a district with
undesirable peers.

The more importance that parents attach to school effectiveness, the more likely we
are to observe equilibria in which wealthy students attend more effective schools than do
lower-income students. Moreover, if parental concern for peer group is not too large, the
model predicts that this equilibrium effectiveness sorting will tend to be more complete in
high-choice markets, those with many small school districts, than in markets with more
centralized governance. This is because higher choice markets divide the income
distribution into smaller bins, which reduces the cost (in peer quality) that families pay for
moving to the next lower peer group district and thus reduces the probability that wealthy
families will be trapped in districts with ineffective schools.

Effectiveness sorting should be observable as a magnification of the causal peer
effect, as it creates a positive correlation between the peer group and an omitted variable,
school effectiveness, in regression models for student outcomes.9 This provides my
identification: I look for evidence that the apparent peer effect, the reduced-form gradient
of school average test scores with respect to student characteristics, is larger in high-choice
than in low-choice markets. If parents select schools for effectiveness, wealthy parents
should be better able to obtain effective schools in markets where decentralized governance
facilitates the choice of schools through residential location, and student performance should
be more tightly associated with peer characteristics in these markets. If parents instead select
schools primarily for the peer group, there is no expectation that wealthy students will attend
effective schools in equilibrium, regardless of market structure, and the peer group-student
performance relationship should not vary systematically with Tiebout choice.

9 Willms and Echols (1992, 1993) are the first authors of whom I am aware to note the importance of the
distinction between preferences for peer group and for effective schools. They use hierarchical linear
modeling techniques (Raudenbush and Willms, 1995; Raudenbush and Bryk, 2002), and estimate school
effectiveness as the residual from a regression of total school effects on peer group. This is appropriate if
there is no effectiveness sorting; otherwise, it may understate the importance of effectiveness in output and in
parental choices.

I use a unique data set consisting of observations on more than 300,000
metropolitan SAT takers from the 1994 cohort, matched to the high schools that students
attended. The size of this sample permits accurate estimation of both peer quality and
average performance for the great majority of high schools in each of 177 metropolitan
housing markets. I find no evidence that the association between peer group and student
performance is stronger in high-choice than in low-choice markets. This result is robust to

nonlinearity in the causal effects of the peer group as well as to several specifications of the
educational production function. Moreover, although there is no other suitable data set with
nearly the coverage of the SAT sample, the basic conclusions are supported by models
estimated both on administrative data measuring high school completion rates and on the
National Education Longitudinal Study (NELS) sample.

This result calls the incentive effects of Tiebout choice into question, as it indicates
that administrators of effective schools are no more likely to be rewarded with high demand
for local housing in high-choice than in low-choice markets. To explore this further, I
estimate models for the effect of Tiebout choice on mean scores across metropolitan areas.
Consistent with the earlier results, I find no evidence that high-choice markets produce
higher average SAT scores. Together with the within-market estimates, this calls into
question Hoxby’s (1999a, 2000a) conclusion that Tiebout choice induces higher productivity
from school administrators.10

There are three plausible explanations for the pattern of findings presented here.
First, it may be that school and district policies are not responsible for a large share of the
extant across-school variation in student performance. We would not then expect to
observe effectiveness sorting, regardless of its extent, in the distribution of student SAT
scores. Second, the number of school districts may not capture variation in parents’ ability
to exercise Tiebout choice. Results presented in Section 1.4.2 offer suggestive evidence
against this interpretation, but do not rule it out. A final explanation is that effectiveness
does matter for student performance, but that it does not matter greatly to parental
residential choices.11 This could be because effectiveness is swamped by the peer group in
parental preferences or because it is difficult to observe directly. In either case,
administrators who pursue unproductive policies are unlikely to be disciplined by parental
exit and Tiebout choice can create only weak incentives for productive school management.

1.2. Tiebout Sorting and the Role of Peer Groups: Intuition

In this section I describe the Tiebout choice process and its observable implications
in the context of a very simple educational technology with peer effects. Let

$$t_{ij} = x_{ij}\beta + \bar{x}_j\gamma + \mu_j + \varepsilon_{ij} \qquad (1)$$

be a reduced-form representation of the production function, where $t_{ij}$ is the test score (or
other outcome measure) of student $i$ when he or she attends school $j$; $x_{ij}$ is an index of the
student’s background characteristics; $\bar{x}_j$ is the average background index among students at
school $j$; and $\mu_j$, which need not be orthogonal to $\bar{x}_j$, measures the “effectiveness” of
school $j$, its policies and practices that contribute to student performance.12

10 Hoxby (2000a) argues that market structure is endogenous to school quality. Instrumenting for it and using
relatively sparse data from the NELS and the National Longitudinal Survey of Youth, she finds a positive
effect of choice on mean scores across markets. I discuss the endogeneity issue in Appendix B, and consider
several instrumentation strategies. As none indicate substantial bias in OLS results, the main discussion here
treats market structure as exogenous. Chapter 2 investigates Hoxby’s results in greater detail.
11 In fact, the main empirical approach cannot well distinguish between the case where parents value
effectiveness to the exclusion of all else and that where they ignore effectiveness entirely, as in either case
effectiveness sorting may not depend on the market structure. The former hypothesis seems implausible on
prior grounds, however.
12 In the empirical application in Section 1.5, I allow for more general technologies in which the effects of
individual or peer characteristics are arbitrarily nonlinear or higher moments of the peer group distribution
enter the production function. The key assumption is that all families agree on the relative importance of
peer group and school effectiveness. This rules out some forms of interactions between $x_{ij}$ and $(\bar{x}_j, \mu_j)$
in (1). The assumption of similar preference structures is common in studies of consumer demand, and in
particular underlies both the multicommunity and hedonic literatures. If it is violated, of course, the
motivating question of whether parents prefer good principals or good peers is not well posed.
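The omitted-variable logic behind equation (1) can be illustrated with a short simulation. The sketch below is an illustration only, not the estimation used in this chapter: the function names `ols_slope` and `simulate_gradient`, the parameter values (beta = 1, gamma = 0.5), the normal distributions, and the sample sizes are all assumptions made for the example. It draws schools whose effectiveness is partly correlated with the school's typical background index, generates scores from (1), and regresses school mean scores on school mean background. Under these assumptions the estimated gradient should sit close to beta + gamma plus the slope of effectiveness on the school mean background.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_gradient(sorting, n_schools=300, n_students=100,
                      beta=1.0, gamma=0.5, seed=0):
    """Simulate equation (1) and return two school-level slopes.

    `sorting` (hypothetical parameter) sets how strongly effectiveness
    tracks the school's typical background index.
    Returns (slope of mean score on mean background,
             slope of effectiveness on mean background).
    """
    rng = random.Random(seed)
    xbar, mu, tbar = [], [], []
    for _ in range(n_schools):
        centre = rng.gauss(0.0, 1.0)                    # typical background at school j
        mu_j = sorting * centre + rng.gauss(0.0, 0.5)   # school effectiveness
        x = [centre + rng.gauss(0.0, 1.0) for _ in range(n_students)]
        xbar_j = sum(x) / n_students
        # Equation (1): t_ij = x_ij*beta + xbar_j*gamma + mu_j + eps_ij
        t = [x_ij * beta + xbar_j * gamma + mu_j + rng.gauss(0.0, 1.0)
             for x_ij in x]
        xbar.append(xbar_j)
        mu.append(mu_j)
        tbar.append(sum(t) / n_students)
    return ols_slope(xbar, tbar), ols_slope(xbar, mu)
```

Calling `simulate_gradient(0.0)` and `simulate_gradient(0.3)` with the same seed should return gradients that differ by roughly the difference in sorting, which is the sense in which effectiveness sorting magnifies the apparent peer effect.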

In view of the vast literature documenting the important role of family background
characteristics (e.g. ethnicity, parental income and education) in student achievement
(Coleman et al., 1966; Phillips et al., 1998; Bowen and Bok, 1998), I assume that $x_{ij}$ is
positively correlated with willingness-to-pay for educational quality. In the empirical analysis
below, I also estimate specifications that allow willingness-to-pay to depend on family
income while other characteristics have direct effects on student achievement.

Since model (1) excludes school resources, the term $\bar{x}_j\gamma$ potentially captures both
conventional peer group effects and other indirect effects associated with the family
background characteristics of students at school $j$. For example, wealthy parents may be
more likely to volunteer in their children’s schools, or to vote for increased tax rates to
support education. They may also be more effective at exerting “voice” to manage agent
behavior, even without the exit option that school choice policies provide (Hirschman,
1970). Finally, student composition may operate as an employment amenity for teachers and
administrators, reducing the salaries that the school must pay and increasing the quality of
teachers that can be hired for any fixed salary (Antos and Rosen, 1975).13

The effectiveness parameter in (1), $\mu_j$, encompasses the effects of any differences
across schools that do not depend on the characteristics of students that they enroll. It may
include, for example, the ability and effort levels of local administrators, their choice of
curricula, or their effectiveness in resisting the demands of bureaucrats and teachers’
unions.14 It is worth noting that the relative magnitude of $\mu_j$ may be quite modest. Family
background variables typically explain the vast majority of the differences in average student
test scores across schools, potentially leaving relatively little room for efficiency (or school
“value added”) effects.15 Nevertheless, most observers believe that public school efficiency
is important, that it plays a non-trivial role in the educational outcomes of students, and
that it varies substantially across schools.

13 The distinction between direct and indirect effects of school composition is not always clear in discussions of
peer effects. Studies that use transitory within-school variation in the composition of the peer group (Hoxby,
2000b; Angrist and Lang, 2002; Hanushek, Kain, and Rivkin, 2001) likely estimate only the direct peer effect,
while those that use the assignment of students to schools (Evans, Oates, and Schwab, 1992; Katz, Kling, and
Liebman, 2001) likely estimate something closer to the full reduced-form effect of school composition.
14 More precisely, ability and effort of school personnel are included in $\mu$ only to the extent that a good peer
group does not enable a school to bid the best employees away from low-$x$ schools. A wealthy, involved
population may not ensure high-quality, high-effort staff if agency problems produce district hiring policies
that do not reflect parents’ preferences (Chubb and Moe, 1990), or if it is difficult to enforce contracts over
unobservable components of administrator actions (Hoxby, 1999b).
15 In the SAT data used here, a regression of school mean scores on average student characteristics has an $R^2$ of
0.74. The correlation is substantially stronger in California’s school accountability data (Technical Design
Group, 2000). Of course, these raw correlations may overstate the causal importance of peer group if there is
effectiveness sorting.

The potential efficiency-enhancing effects of increased Tiebout choice operate
through the assumption that parents prefer schools with $\mu_j$-promoting policies. To the
extent that this is true, Tiebout choice induces a positive correlation between $\mu_j$ and $\bar{x}_j$,
since high-$x_i$ families will outbid lower-$x_i$ families for homes near the most preferred
schools. Thus, active Tiebout choice can magnify the apparent impact of peer groups on
student outcomes in analyses that neglect administrative quality. Formally,

$$E[\bar{t}_j \mid \bar{x}_j] = \bar{x}_j(\beta + \gamma) + E[\mu_j \mid \bar{x}_j], \qquad (2)$$

or, simplifying to a linear projection,

$$E^*[\bar{t}_j \mid \bar{x}_j] = \bar{x}_j(\beta + \gamma + \theta^*), \qquad (3)$$

where $\theta^* \equiv \operatorname{cov}(\bar{x}_j, \mu_j) / \operatorname{var}(\bar{x}_j)$ represents the degree of effectiveness sorting in the local
market. (For notational simplicity, I neglect the intercept in both test scores and school
effectiveness.) The stronger are parental preferences for effective schools (relative to
schools with other desired attributes), the more actively will high-$x_i$ families seek out
neighborhoods in effective districts, and the larger will $\theta^*$ tend to be in Tiebout equilibrium.
The weaker are parental preferences for $\mu_j$ relative to other factors, the smaller will $\theta^*$
tend to be.

Importantly, one would expect the degree of local competition in public schooling
(i.e. the number of school districts in the local area among which parents can choose) to
affect the magnitude of $\theta^*$ whenever parents care both about peer groups and school
effectiveness. The reasoning is simple: If there are only a small number of local districts and
parents value the peer group, they may be “stuck” with a high-$x$/low-$\mu$ school, even in
housing market equilibrium, by their unwillingness to sacrifice peer group in a move to a
more effective school district. These coordination failures are less likely in markets with
more interjurisdictional competition, as in these markets there are always alternative districts
that are relatively similar in the peer group offered, and parents are able to select effective
schools without paying a steep price in reduced peer quality.16

When parental concern for peer group is moderate, then, a high degree of public
school choice is needed to ensure that high-$\mu$ schools attract high-$x$ families, and $\theta^*$ tends
to be larger in high-choice than in low-choice markets. On the other hand, when parents are
concerned only with school effectiveness, high-$\mu$ schools attract high-$x$ families regardless
of the market structure, and $\theta^*$ need not vary with local competition. Similarly, when
parental concern for peer group is large enough, even in highly competitive markets high-$x$
families are not drawn to high-$\mu$ schools, and again $\theta^*$ is largely independent of market
structure.

16 In the high choice limit, this is analogous to Hoxby’s (1999b) model of choice among schools with identical
peers.
17 $\theta^*(c, \delta)$ is treated as a random variable, as there can be multiple equilibria in these markets. My empirical
strategy assumes that $\delta$ is constant across markets, and that a sample of markets with the same $c$ parameter
will trace out the distribution of $\theta^*$. An equilibrium selection model in which families could somehow
coordinate on the most efficient equilibrium would violate this assumption.

This idea forms the basis of my empirical strategy. In essence, I compare the sorting
parameter $\theta^*$ in equation (3) across metropolitan housing markets with greater and lesser
degrees of residential school choice. Let $\theta = \theta(c, \delta) = E[\theta^* \mid c, \delta]$ be the average
effectiveness sorting of markets characterized by the parameters $c$ and $\delta$, where $c$ is the
degree of jurisdictional competition (i.e. the number of competing districts from which
parents can choose, adjusted for their relative sizes) and $\delta$ is the importance that parents
place on peer group relative to effectiveness.17 The argument above, supported by the
theoretical model developed in the next section, predicts that $\partial\theta/\partial c > 0$ for moderate values
of $\delta$ but that $\partial\theta/\partial c = 0$ when $\delta$ is zero or large (i.e. when parents care only about
effectiveness or only about peer group). To the extent that $\theta$ tends to increase with choice,
then, we can infer that parents’ peer group preferences are small enough to prevent a
breakdown in high-choice markets of the sorting mechanism that rewards high-$\mu$
administrators with high-$x$ students. On the other hand, if $\theta$ is no larger in high-choice

than in low-choice cities it is more difficult to draw inferences about parental valuations,
which may be characterized either by very small or very large $\delta$. In either case, however, we
can expect little effect of expansions of Tiebout choice on school efficiency, as in the former
even markets with only a few districts can provide market discipline and in the latter no
plausible amount of governmental fragmentation will create efficiency-enhancing incentives
for school administrators.

1.3. A Model of Tiebout Sorting on Exogenous Community Attributes

In this section, I build a formal model of the Tiebout sorting process described
above. As my interest is in the demand side of the market under full information, I treat the
distribution of school effectiveness as exogenous and known to all market participants.18 I
demonstrate that Tiebout equilibrium must be stratified as much as the market structure
allows: Wealthy families always attend schools that are preferred to those attended by
low-income families. There can be multiple equilibria, however, and the allocation of effective
schools is not uniquely determined by the model’s parameters. Conventional comparative
statics analysis is not meaningful when equilibrium is non-unique, as the parental valuation
parameter affects the set of possible equilibria rather than altering a particular equilibrium.
To better understand the relationships between parental valuations, market concentration,
and the equilibrium allocation, the formal exposition of the model is followed by simulations
of markets under illustrative parameter values.

My model is a much simplified version of so-called “multicommunity” models. I
maintain the usual assumptions that the number of communities is fixed and finite, and that
access to desirable communities is rationed through the real estate market.19 There is no
private sector that would de-link school quality from residential location. Although some
authors (e.g. Epple and Zelenitz, 1981) include a supply side of the housing market, I assume
that communities are endowed with perfectly inelastic stocks of identical houses.20
Communities differ in three dimensions: The average income of their residents and the
rental price of housing, both endogenous, and the effectiveness of the local schools.21

18 This does not rule out administrative responses to the incentives created by parental choices, as these are a
higher order phenomenon, deriving from competition among schools to attract students rather than from
reactions of school administrators to the realized desirability of their schools. My discussion presumes,
however, that competition does not serve to reduce variation in school effectiveness.
19 Where most models incorporate within-community voting processes for public good provision (Fernandez
and Rogerson, 1996; Epple and Romano 1996; Epple, Filimon and Romer, 1993), income redistribution
(Epple and Romer, 1991; Epple and Platt, 1998), or zoning rules (Fernandez and Rogerson, 1997; Hamilton,
1975), I simply allow for preferences over the mean income of one’s neighbors. These preferences might
derive either from the effects of community composition on voting outcomes or from reduced-form peer
effects in education.
20 Tiebout equilibria must evolve quickly to provide discipline to school administrators, whose careers are much
shorter than the lifespan of houses. Inelastic supply is probably realistic in the short term, except possibly at
the urban fringe. Nechyba (1997) points out that it is much easier to establish existence of equilibrium with
fixed supply.
21 The inclusion of any exogenous component of community desirability is not standard in multicommunity
models, which, beginning with Tiebout’s (1956) seminal paper, have typically treated communities as ex ante
interchangeable. This leaves no room for managerial effort or quality except as a deterministic function of
community composition, so is inappropriate for analyses of the incentives that the threat of mobility creates
for public-sector administrators.
22 Amenities might draw wealthy families to low-peer-group districts, improving those districts’ peer groups
and reducing the costs borne by other families living there. This could increase effectiveness sorting,
although the effect would be weakened if there were a private school sector. Offsetting this, amenities might
also prevent families from exiting localities with ineffective schools, reducing effectiveness sorting just as
does concern for peer group.

An important omission is that of all non-school exogenous amenities like beaches, parks,
views, and air quality. I develop here a “best case” for Tiebout choice, where schools are the
only factors in neighborhood desirability. Amenities could either increase or reduce the
extent of effectiveness sorting relative to this pure case, though the latter seems more likely.22
If, as the hedonics literature implies, schools are one of the more important determinants of
neighborhood desirability (see, e.g., Reback, 2001; Bogart and Cromwell, 2000; Figlio and

Lucas, 2000; Black, 1999), the existence of relatively unimportant amenities should not much
alter the trends identified here.

Turning to the formal exposition, assume that a local housing market (a
metropolitan area) contains a finite number of jurisdictions, $J$, and a population of $N$
families, $N \gg J$. Each jurisdiction, indexed by $j$, contains $n$ identical houses and is
endowed with an exogenous effectiveness parameter, $\mu_j$. No two jurisdictions have
identical effectiveness.

Each family must rent a house. There are enough houses to go around but not so
many that there can be empty communities: $n(J - 1) < N < nJ$.23 All homes are owned by
absentee landlords, perhaps a previous generation of parents, who have no current use for
them. These owners will rent for any nonnegative price, although they will charge positive
prices if the market will support them. There is no possibility for collusion among landlords.
Housing supply in each community is thus perfectly inelastic: In quantity-price space, it is a
vertical line extending upward from $(n, 0)$.

Family $i$’s exogenous income is $x_i > 0$; the income distribution is bounded and has
distribution function $F$, with $F'(x) > 0$ whenever $0 < F(x) < 1$.24 Families derive utility
from school quality and from numeraire consumption, and take community composition
and housing prices as given. Let $\bar{x}_j$ denote the mean income of families in community $j$,
and let $h_j$ be the rental price of local housing. The utility that family $i$ would obtain in
jurisdiction $j$ is $U_{ij} = U(x_i - h_j, \bar{x}_j\delta + \mu_j)$, where $U$ is twice differentiable everywhere with
$U_1$ and $U_2$ both positive.25 I make the usual assumption about the utility function:

Single Crossing Property: $U_{12}U_1 - U_{11}U_2 > 0$ everywhere.

Single crossing ensures that if any family prefers one school quality-price
combination to another with lower quality (where quality is $q_j \equiv \bar{x}_j\delta + \mu_j$), all
higher-income families do as well; if any family prefers a district to another offering higher quality
education, all lower-income families do also. (This is proved in Appendix D.) As in other
multicommunity models, the single crossing assumption drives the stratification results
outlined below.

23 The model is a “musical chairs” game, and the upper constraint serves to tie prices down, while the lower
constraint avoids the need to define the peer group offered by a community with no residents.
24 Of course, the income distribution cannot be continuous for finite $N$. Relaxing the treatment to allow a
discrete distribution would add notational complexity and introduce some indeterminacy in equilibrium
housing prices, but would not change the basic sorting results.
25 I might allow $U_{ij} = U(x_i - h_j, Q(\bar{x}_j, \mu_j))$, with $Q_1 \geq 0$ and $Q_2 > 0$, without changing the basic
results; $\delta$ then corresponds to $Q_1 / Q_2$. The key assumption is that all families share the same $U$ and $Q$
functions, with all differences in their behavior resulting from differences in their budget constraints (i.e.
from $x_i$).

Market equilibrium is defined as a set of housing prices and a rule assigning families
to districts on the basis of their income that is consistent with individual family preferences,
taking all other families’ decisions as fixed:

Definition: An equilibrium for a market defined by $\delta$; $J$; $\{\mu_1, \ldots, \mu_J\}$; and $F$
consists of a set of nonnegative housing prices $\{h_1, \ldots, h_J\}$ and an allocation rule
$G : \mathbb{R}^+ \to \mathbb{Z}_J$ that satisfy the following conditions (where
$\bar{x}_j \equiv \int 1(G(x) = j)\, x \, dF(x) \big/ \int 1(G(x) = j)\, dF(x)$):

EQ1 No district is over-full. For each $j$, $\int 1(G(x) = j)\, dF(x) \leq n/N$.

EQ2 Nash equilibrium. At the specified prices and with the current distribution of
peer groups, no family would prefer a district other than the one to which it

is assigned: $U(x_i - h_{G(x_i)}, \bar{x}_{G(x_i)}\delta + \mu_{G(x_i)}) \geq U(x_i - h_k, \bar{x}_k\delta + \mu_k)$ for all $i$
and all $k$.

EQ3 Normalization of housing prices. $h_j = 0$ whenever $\int 1(G(x) = j)\, dF(x) < n/N$.

EQ4 No ties in realized quality. For any $j \neq k$, $\bar{x}_j\delta + \mu_j \neq \bar{x}_k\delta + \mu_k$.26

The following results are proved in Appendix D:

Theorem 1. Equilibrium exists.

Theorem 2. Any equilibrium is perfectly stratified, in the sense that no family lives
in a higher-quality, higher-price, or higher-peer-group district than does any
higher-income family.

Corollary 2.1. In any equilibrium, the $n$ families with incomes greater than
$F^{-1}(1 - n/N)$ live in the same community, which has higher quality ($\bar{x}\delta + \mu$) than
any other. The next $n$ families, with incomes in $\left(F^{-1}(1 - 2n/N),\ F^{-1}(1 - n/N)\right)$, live in
the community ranked second in quality. This continues down the distribution: For
each $j \leq J$, the families with incomes in
$\left(F^{-1}(\max\{1 - jn/N,\ 0\}),\ F^{-1}(1 - (j-1)n/N)\right)$
live in the community with the $j$th ranked schools.27

Corollary 2.2. If $\delta = 0$, equilibrium is unique.

Note that Theorem 2 does not rule out equilibria in which some families live in
lower-$\mu$ districts than do some higher-income families. I refer to these as unsorted (or imperfectly
sorted) equilibria. They arise when the peer group advantage of high-income communities
over low-income communities is large enough to overcome deficits in school effectiveness.28
For fixed income and effectiveness distributions, unsorted equilibria become harder to
maintain as the weight that families place on peer group relative to school quality falls:

Corollary 2.3. Let $G$ be an assignment rule satisfying Corollary 2.1 under which
there exist communities $j$ and $k$ satisfying $\mu_j < \mu_k$ but $\bar{x}_j > \bar{x}_k$. Then for

$$C \equiv \max_{j,k \,:\, \bar{x}_j > \bar{x}_k} \frac{\mu_k - \mu_j}{\bar{x}_j - \bar{x}_k} > 0,$$

i. Whenever $\delta > C$, $G$ is an equilibrium allocation (i.e. there exist
housing prices with which $G$ is an equilibrium).

ii. Whenever $\delta < C$, $G$ is not an equilibrium allocation.

iii. If $\delta = C$, $G$ can satisfy requirements EQ1-EQ3 for equilibrium, but
violates EQ4.

26 Condition EQ4 corresponds to the “stability” notion of Fernandez and Rogerson (1996; 1997).
Arrangements that satisfy EQ1 through EQ3 but not EQ4 are unstable, and perturbations in one of the tied
communities’ effectiveness or peer group would lead to non-negligible differences between the communities
as families adjust. With EQ4, equilibria are locally stable.
27 I neglect families precisely at the boundary between income bins (i.e. those with incomes satisfying
$F(x) = 1 - jn/N$ for some $j$). I demonstrate in the Appendix that families at boundary points are
indifferent between the two communities in equilibrium. As the income distribution approaches continuity,
the potential importance of boundary families declines to zero.
28 It need not be true that unsorted equilibria are less efficient than the perfectly sorted equilibrium: If the
marginal utility of school quality declines quickly enough, it can be more efficient to assign effective schools
to low-income bins than to the wealthiest students. In any case, concern for peer group amounts to an
externality, and there is no assurance that the efficient assignment of families to districts is an equilibrium at
all. It may be efficient to have heterogeneous income distributions at each school, for example, but this is
never a decentralized equilibrium.

I do not present formal results on the implications of increases in $J$ for effectiveness
sorting, as much depends on the $\mu_j$’s assigned to the new districts. Informally, however,
Corollary 2.3 suggests that for a stable $\mu$ distribution, increasing the number of districts

constrains the possibility of unsorted equilibria: With more districts, the distance between the average incomes of districts that are adjacent in the quality distribution is smaller. As $C$ depends on this distance, a higher $J$ reduces the amount by which a low-income district’s effectiveness parameter can exceed that of the next-wealthier district before the wealthier families will bid away houses in the more effective district.

This tendency is at the core of my empirical strategy. To clarify it, I present next a simulation exercise that demonstrates the impact of market structure ($J$) on effectiveness sorting under different assumptions about the importance of peer group to parental preferences ($\delta$), and thus about the “stickiness” of residential assignments. I begin by describing the allocation of effectiveness in illustrative equilibria, then describe the simulation and its results. Finally, at the end of this section I return to the basic model to discuss its allocative implications and the likely effects of endogenizing school effectiveness.

1.3.1. Graphical illustration of market equilibrium

From Theorem 2 and its corollaries, the income distribution in any equilibrium is divided into $J$ quantiles, with wealthier quantiles living in more preferred (higher $x_j\delta + \mu_j$) districts. In Appendix D, I show that this necessary condition is also sufficient for an assignment rule to be an equilibrium allocation. Here, I use these results to construct possible equilibria under different $(\delta, J)$ combinations.

It is helpful to begin by considering a Tiebout market that approximates perfect competition. Assume that there are as many districts as there are families, with only a single house in each district, and suppose that both family income and school effectiveness are uniformly distributed on [0, 1]. There is no peer group externality, as families that move to another house-district take their “peer group” with them. Regardless of parental valuations, then, families always prefer a high-$\mu$ house to one with lower $\mu$. Because willingness-to-pay for a preferred school is increasing in $x$, equilibrium is unique, with the ranking of districts by effectiveness identical to that by the income of the resident family. Panels A and B of Figure 1.1 graph the equilibrium allocations of effectiveness ($\mu_j$) and district desirability ($x_j\delta + \mu_j$) as functions of family income when parents have no concern for peer group ($\delta = 0$, Panel A) and when concern for peer group is moderate ($\delta = 1.5$, Panel B).

The competitive case serves as a baseline, but it is not a realistic description of choice in the presence of peer group externalities. I next consider a market with ten equally-sized districts, a degree of Tiebout choice that, as is discussed below in Section 1.4, corresponds roughly to the 80th percentile U.S. metropolitan area. Assume that $J = 10$, $n = N/10$, and $\mu_j = j/10$, $j = 1, \ldots, 10$. Panel C of Figure 1.1 displays the unique, perfectly sorted equilibrium when $\delta = 0$. Families in the $j$th decile of the income distribution live in the district with the $j$th most effective schools.

When parental concern for peer group is introduced, the perfectly sorted equilibrium is no longer unique. It is now possible for ineffective districts to retain wealthy peer groups in equilibrium, as long as they are not so ineffective that families would prefer a lower-$x$, higher-$\mu$ district. One imperfectly sorted equilibrium is displayed in Panel D. Note that district desirability is monotonically increasing in district average income, as Theorem 2 requires that the desirability and income rankings be identical in equilibrium. Effectiveness is not monotonic in family income, however: Some families live in districts that are less
effective than those where some poorer families live. Effectiveness sorting nevertheless remains substantial, and effectiveness is highly correlated with peer group average income.

Finally, we consider the case where the housing market gives parents few options, with only three equally-sized districts ($J = 3$, $n = N/3$). This corresponds roughly to the 40th percentile of the U.S. distribution. Suppose here that $\mu_j = j/3$, $j = 1, 2, 3$. When there are no peer effects (Panel E), equilibrium is again unique and is perfectly sorted on effectiveness.

When we add concern for peer group to the three-district market, there is substantially more potential for mis-sortings than even in the ten-district case. The gap in peer quality between adjacent districts has grown substantially, and families therefore require a much larger $\mu$ return to justify a move from one district to another whose current residents are lower in the $x$ distribution. Indeed, with the parameter values used here, there is no allocation of $x$ terciles to districts in which any family would willingly move to a lower-$x$ district; all six of the possible permutations are equilibria. Panel F illustrates one possibility. Here, the most effective district is rewarded with the wealthiest students, but the two remaining districts are mis-sorted.

Recall equation (3), which suggested that a naïve estimate of the peer effect is magnified by effectiveness sorting, with the degree of magnification being $\theta^* \equiv \operatorname{cov}(x_j, \mu_j) / \operatorname{var}(x_j)$, the coefficient from a regression of $\mu_j$ on $x_j$ across all districts in the market. $\theta^* = 1$ in the perfectly sorted markets displayed in Panels A, B, C, and E of Figure 1.1, indicating that the slope of school-level average test scores with respect to student characteristics in these markets will overstate the contribution of individual and peer characteristics to student performance by one. In the imperfectly sorted markets displayed in Panels D and F, however, the magnification effect is smaller: $\theta^* = 0.9$ in D and 0.5 in F. The simulations below suggest that this tendency for effectiveness sorting and magnification to depend on the number of districts when parents care about both peer group and effectiveness holds generally, as long as concern for peer group ($\delta$) is moderate. When $\delta$ is large, however, even markets with many districts can have unsorted equilibria, and there is no tendency for $E[\theta^* \mid \delta, J]$ to increase with $J$, at least in the ranges considered here.29

1.3.2. Simulation of expanding choice

In this subsection, I describe simulations of a hypothetical regional economy under several combinations of $(\delta, J)$. As $\delta$ grows, the relative importance of school effectiveness diminishes and the likelihood of unsorted equilibria expands. By the logic above, for any fixed $\delta$ we might expect unsorted equilibria to be less prominent with many districts than with few.

Where Figure 1.1 used uniform, nonstochastic distributions for both income and effectiveness, here I adopt the slightly more realistic assumption that income has a normal distribution and I draw random effectiveness parameters from the same distribution.30 For each market type, I conducted 5,000 draws, first choosing effectiveness parameters for each district and then permuting the assignment of income bins to districts until I obtained an equilibrium assignment (i.e. one in which no low-income district was preferable to any high-income district).31

Figure 1.2 displays the average allocation of school effectiveness in markets with three and ten equally-sized districts. Panel A depicts the case where parents are unconcerned about the peer group, as in the left-hand panels of Figure 1.1. Here, families must be perfectly sorted on school effectiveness in equilibrium, and the average $\mu$’s depicted in the figure are simply order statistics from the standard normal distribution. The remaining panels show progressively higher valuations for the peer group: $\delta = 0.5$, 1.5, and 3. As $\delta$ grows, progressively less complete sortings become equilibria and average $\mu_j$ values collapse toward the overall mean.32 Moreover, the collapse happens more quickly for three-district markets than for those with ten districts. This means that when $\delta$ is moderate in Panel C, the gradient of school effectiveness with respect to family income is steeper for $J = 10$ than for $J = 3$. As $\delta$ grows, however, Panel D indicates that the differences between the two sorts of markets shrink toward zero.

It is clear from Figure 1.2 that effectiveness sorting tends to decrease with $\delta$ and, for moderate values like that shown in Panel C, to increase with $J$. The simulation results can be used to understand the magnification bias in naïve estimates of the peer effect like (3). For each $(\delta, J)$ combination, I estimated a regression of $\mu_j$ on $x_j$ analogous to those estimated from actual data in Section 1.5, pooling all 5,000 simulated markets and including a fixed effect for each. The resulting estimates of $\theta(\delta, J) = \operatorname{cov}(x_j, \mu_j) / \operatorname{var}(x_j)$ are displayed in Figure 1.3. The trends identified in Figures 1.1 and 1.2 are again clear. First, $\theta$ is well above zero when $\delta$ is small, indicating that the residential housing market mechanism rewards administrators of effective schools with the wealthiest students when parents primarily assess schools by their effectiveness. When $\delta$ is large, $\theta$ is close to zero for all $J$, as no district structure creates the desired rewards when parents are largely unconcerned with school effectiveness.

The moderate $\delta$ case is the most interesting. Here, we observe more perfect sorting on $\mu$, and therefore larger slopes of $\mu$ with respect to $x_j$, when there are many districts than when there are few. That is, $\partial\theta / \partial J > 0$ for moderate $\delta$.33 If both peer group and school effectiveness are important to parents, then, the Tiebout mechanism rewards effective administrators only when there are many districts. Model (3) suggests that in this case the test score gap between high- and low-income schools will tend to be larger in markets with a great deal of interdistrict competition than in those with less Tiebout choice. I test for this in the empirical analysis below.

29 For any $\delta$, there is some $J$ for which effectiveness sorting will increase: The perfectly competitive case in Panels A and B would be perfectly sorted for any $\delta$. I simulate only markets with $J \le 10$ (the computational burden increases with the factorial of $J$), though this is easily enough to reveal the general trend.

30 Analysis of varying $\delta$ subsumes the variance of the $\mu_j$’s: Increased variation in school effectiveness is equivalent, for the purpose of the sorting process, to increased parental valuation of a district with high effectiveness relative to one with a desirable peer group (i.e. to a reduction in $\delta$). A normal (rather than log-normal) income distribution was chosen to avoid complications from the log-normal distribution’s skew, and because the $x$ index that I use in the empirical analysis is approximately normally distributed.

31 This strategy treats all possible equilibria as equally likely. It might be more realistic to attach higher probability to equilibria that are attracting points for larger ranges of initial assignments under some adjustment process, but this is left for future work.

32 The nonmonotonicity of the $\delta = 3$, $J = 10$ case arises because parental valuations depend on average income rather than on the average income rank; peer group differences between income quantiles are thus larger near the tails. This is not central to the analysis.

33 Figure 1.3 reveals a small effect of Tiebout choice on the effectiveness gradient even when $\delta = 0$, but this is sensitive to the simulation assumptions (in particular, to the distribution of effectiveness as the number of districts grows). The simulations for positive $\delta$, in which equilibrium need not be unique, so that averages are determined both by the distribution of effectiveness and by the set of equilibria, are much less sensitive.
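The simulation logic of Section 1.3.2 can be illustrated with a minimal sketch. It is not the code used for Figures 1.2 and 1.3, and it rests on several assumptions that go beyond the text: income is treated as a standard normal index, each district's peer group $x_j$ is the mean of that index within its quantile bin, equilibria are found by brute-force enumeration of all permutations (practical only for small $J$), one equilibrium is picked uniformly at random per market, and only 200 draws are used by default. All function names are hypothetical. An assignment is counted as an equilibrium when desirability $x_j\delta + \mu_j$ rises strictly with $x_j$, the condition described in Section 1.3.1.

```python
import itertools
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # assumption: income index is standard normal


def bin_mean_incomes(J):
    """Mean income index in each of J equal-sized quantile bins, poorest first."""
    cuts = [STD_NORMAL.inv_cdf(k / J) for k in range(1, J)]
    # The normal density is zero at minus and plus infinity.
    pdf_at = [0.0] + [STD_NORMAL.pdf(c) for c in cuts] + [0.0]
    # For a standard normal, E[x | a < x < b] = (pdf(a) - pdf(b)) / P(a < x < b),
    # and every bin has probability 1 / J.
    return [J * (pdf_at[k] - pdf_at[k + 1]) for k in range(J)]


def is_equilibrium(x, mu, delta):
    """True if desirability x*delta + mu is strictly increasing in peer income x."""
    order = sorted(range(len(x)), key=lambda j: x[j])
    desirability = [x[j] * delta + mu[j] for j in order]
    return all(lo < hi for lo, hi in zip(desirability, desirability[1:]))


def theta(x, mu):
    """Slope from a regression of mu on x across districts: cov(x, mu) / var(x)."""
    x_bar, mu_bar = mean(x), mean(mu)
    cov = sum((a - x_bar) * (b - mu_bar) for a, b in zip(x, mu))
    var = sum((a - x_bar) ** 2 for a in x)
    return cov / var


def equilibria(x, mu, delta):
    """Every assignment of the effectiveness draws to income bins that is an equilibrium."""
    return [p for p in itertools.permutations(mu) if is_equilibrium(x, p, delta)]


def average_theta(delta, J, draws=200, seed=0):
    """Average theta over simulated markets, one randomly chosen equilibrium each.

    Every simulated market shares the same x values, so this average equals the
    slope of a pooled regression with market fixed effects.
    """
    rng = random.Random(seed)
    x = bin_mean_incomes(J)
    slopes = []
    for _ in range(draws):
        mu = [rng.gauss(0.0, 1.0) for _ in range(J)]
        slopes.append(theta(x, rng.choice(equilibria(x, mu, delta))))
    return mean(slopes)


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.5, 3.0):
        row = [round(average_theta(delta, J), 2) for J in (3, 5)]
        print(f"delta={delta}: average theta for J=3, 5 -> {row}")
```

The `theta` helper can also be checked against Figure 1.1. With uniform income on [0, 1], three districts have mean incomes 1/6, 1/2 and 5/6; the perfectly sorted allocation of Panel E gives a slope of 1, and the Panel F allocation, in which only the top district is correctly sorted, gives 0.5, matching the values reported in the text.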

1.3.3. Allocative implications and endogenous school effectiveness

In the model presented above, Tiebout choice hurts low-income students in two ways. First, it permits increased stratification of students. Because total peer group is in fixed supply, stratification necessarily offers better peers to wealthy students and worse peers to low-income students. Second, if the market mechanism functions and families sort on effectiveness, it assigns low-income students to schools that are below-average in their effectiveness. This is an unavoidable effect of the Tiebout mechanism, as the flip side of rewarding effective schools with wealthy students is punishing poor students with relatively ineffective schools.

The model stacks the deck, however, by holding the distribution of effectiveness fixed. If school administrators respond to incentives, effectiveness sorting will also induce higher effort and greater effectiveness. This will tend to raise scores for everyone, and the productivity benefits may offset the allocative costs that Tiebout choice imposes on poor students.34

My empirical analysis thus has two components. In Section 1.5, I look for evidence that effectiveness sorting is more complete in high-choice than in low-choice markets, as the simulations above suggest it should be if parental valuations attach substantial weight to school effectiveness. In that section, I identify effectiveness sorting from the distribution of student performance within markets, using fixed effects to absorb any differences in average effectiveness across markets. Then, in Section 1.6, I examine the distribution of average test scores across markets, looking for evidence that interdistrict competition leads to increases in the average effectiveness of local administrators.

1.4. Data

My test of parental valuations requires data describing the distribution of peer groups and outcomes across schools within housing markets that differ in the amount of Tiebout choice. I describe first my measure of market structure, defined over district-level enrollment. I then present evidence that this measure represents a binding constraint on parents’ ability to exercise Tiebout choice. Finally, I discuss the SAT data that are the primary source of information on student outcomes across schools.

1.4.1. Measuring market concentration

I define local housing markets as Metropolitan Statistical Areas (MSAs), Census Bureau approximations of local housing markets defined by observed commuting patterns.35 The SAT data that I use to measure student outcomes are taken from the early 1990s. Consequently, I use 1990 MSA definitions and draw demographic characteristics of each MSA from the 1990 Census.

34 There is great need for a model of the supply side of Tiebout choice markets that describes the distribution of administrators’ responses to incentives. Does competition force the worst districts to catch up to the average, induce the best districts to pull away from the average, or lead all districts to improve effectiveness equally? A Mirrlees-type argument suggests that the first is unlikely without market entry, as a district that enrolls the lowest-income students faces little sanction for further reductions in effectiveness. If this intuition holds, administrative responses would not offset the inequality-increasing effects of Tiebout choice identified here.

35 The Census Bureau classifies the largest urbanizations as Consolidated MSAs (CMSAs), and subdivides them into several component parts, Primary MSAs (PMSAs). I treat several PMSAs within a larger area as distinct markets, reasoning that a move from, for example, Riverside to Ventura (both cities within the Los Angeles CMSA, but separated by about 125 miles) is more akin to a migration across metropolitan areas than to a within-market move. Most MSAs and PMSAs are defined along county boundaries; in New England, where town boundaries define MSAs, I use the alternative, and slightly larger, New England County Metropolitan Areas. For reasons of data availability and comparability, the Honolulu and Anchorage MSAs are excluded from all analyses.

MSAs differ substantially in their educational governance structures. While the median MSA has 9 school districts, there are 25 markets with only a single district each. (Thirteen of these, including Miami and Fort Lauderdale, by far the largest, are in Florida, which has large counties and only one district per county.) Boston, with 132 districts, represents the other extreme; seventeen additional markets have fifty districts or more.36

The raw count of districts is a crude measure of market concentration, as it does not distinguish between the New York PMSA, where the three largest districts have 87 percent of enrollment and the remaining 53 districts combine for 13 percent, and the Dallas PMSA, with the same number of districts but only 44 percent of enrollment in the three largest. Following Hoxby (2000a), I calculate a more appropriate index of Tiebout choice as one minus the Herfindahl Index, a concentration measure used by the Federal Trade Commission (FTC) in antitrust deliberations and defined as the sum of firms’ squared market shares. Districts’ “market shares” are their enrollments in grades 9-12 divided by the total over all public school districts in the MSA, calculated using data from the 1990 Common Core of Data (CCD), an annual census of public schools and districts. Letting $n_{jm}$ be the relevant enrollment of district $j$ in market $m$ and $N_m$ the total relevant enrollment in the market, the choice index is $c_m \equiv 1 - \sum_j (n_{jm} / N_m)^2$.

Figure 1.4 displays the index’s distribution. Nearly all U.S. markets are highly concentrated by private market standards: Vertical lines on the figure indicate the FTC’s thresholds for “concentrated” and “highly concentrated” markets (choice indices below 0.9 and 0.82, respectively); four-fifths of MSAs are concentrated and three-fifths highly concentrated by these definitions.

Table 1.1 displays summary statistics for several metropolitan-level demographic measures, calculated from county-level tabulations of the 1990 Decennial Census (from the STF-3C file) aggregated to the MSA level. Means of each variable are presented both for the full sample of 318 MSAs and within each quartile of the choice distribution. There are substantial differences across quartiles: Low-choice markets tend to be located in the South, to be smaller, and to have more Blacks and Hispanics. They are also more likely to be located in states with “Minimum Foundation Plan” financing schemes, a mechanism used by 37 states to reduce inequality in school resources.37

1.4.2. Does district structure matter to school-level choice?

36 All district counts and enrollment figures are calculated for grades 9-12 only (Urquiola, 1999).

37 Categorizations of state finance plans as of the early 1990s are drawn from Card and Payne (2002).

Most of the existing literature, while recognizing that there is heterogeneity across schools within any given school district, has assumed that public school districts are the relevant units that compete for students in a Tiebout choice framework (Borland and Howsen, 1992; Hoxby, 2000a). There are two main reasons for this. First, any local tax and spending decisions are made at the district level, and this is also where many key education policies (curriculum, teacher pay scales, etc.) are set. Second, for reasons relating to the jurisprudence of school desegregation and to mechanisms like “open enrollment” and magnet schools, there are not always stable, well-defined catchment areas within districts that link neighborhoods to individual schools, so residential location may not be an important
determinant of within-district school assignment.38 Nevertheless, many districts limit the ability of parents to choose from among the schools in the district except by their location decisions, and even when parents can choose, distance is often a major factor. Thus, Tiebout choice may operate across neighborhood schools within a large district as well as across districts. To the extent that peer groups and school-level policies, rather than policies set at the district level, are the primary objects of parental choice, neighborhood sorting within school districts may be a relatively effective form of choice.

In view of this possibility, it is important to ask whether inter-district competition matters to the way that students are assigned to neighborhoods and schools in Tiebout equilibrium. Panel B of Table 1.1 displays measures of the extent of school-level choice by quartile of the district-level index. MSAs with more district-level choice have more schools, on average, than do low-choice MSAs, but this is largely a function of population; average school size is only weakly correlated with district-level choice. Nevertheless, a school-level choice index is strongly positively correlated with the district-level index: In MSAs in the lowest quartile of district choice, the average school-level index is 0.82, versus 0.96 in MSAs in the highest district-level quartile. This relationship is robust to controls for the demographic characteristics shown in Panel A of Table 1.1, although I do not report the regression model here.

The multicommunity model developed above, in which families stratify across jurisdictions, suggests a useful test of the hypothesis that district boundaries are important constraints on the Tiebout choice process. If school districts are a unit over which parents exercise residential choice, we should expect greater stratification by family income in markets with high district choice indices than in markets with more concentrated school governance, and this effect should be robust to the inclusion of the school-level choice index.39 Table 1.2 presents evidence on the relationship between the district-level choice index and two measures of within-MSA stratification, based on the distribution of household income across districts and the racial composition of schools.

The first three columns present regression models for the across-district share of variance of household income, calculated separately for each MSA with at least two districts.40 All three models include as explanatory variables the district-level choice index, fixed effects for nine Census-defined geographic divisions, and controls for several MSA-level variables that might have independent effects on measured sorting. The second column adds to these a control for the school-level choice index, while the third column also controls for several measures of census-tract-level stratification.41 All three estimates indicate a strong relationship between district-level choice and income stratification across districts.

38 On desegregation remedies, see Welch and Light (1987), Orfield (1983), and Milliken v. Bradley 418 U.S. 717, 1974.

39 Eberts and Gronberg (1981) and Epple and Sieg (1999) propose similar stratification tests of Tiebout-style models.

40 District-level income distributions are drawn from the School District Data Book (SDDB), a tabulation of 1990 Census data at the school district level. I am grateful to Cecilia Rouse for providing access to the SDDB data.

41 Tract-level data come from the 1990 Census STF-3A files. Census tracts are much smaller than school districts, with 4,000 residents on average. Tiebout models do not speak to within-jurisdiction sorting, and invariance of the choice coefficient to tract-level controls offers reassurance that the relationships observed in Table 1.2 do not derive from a spurious correlation between district structure and MSA residents’ tastes for micro-neighborhood segregation.

There may be a mechanical relationship, however, between measures of across-district sorting and the district structure. To see this, note that areas with more districts, conditional on market size, necessarily have smaller districts, and random distribution of
populations would produce higher measures of segregation across these smaller areas. To avoid the bias that this produces, one would ideally estimate the same regressions for measures of across-school stratification. Unfortunately, income data are not available at the school level. Instead, I use data on the racial composition of each school, collected in both the CCD and the Private School Survey (PSS; National Center for Education Statistics, 2000), a census of private schools. I compute from these data a dissimilarity index (Cutler, Glaeser, and Vigdor, 1999) based on the distribution of white and non-white students across both public and private schools in each MSA.42 Columns D, E, and F of Table 1.2 report models using this dissimilarity index as the dependent variable. Again, the coefficient on the district-level choice index is large, significant, and not much changed by the inclusion of the school-level choice index and the tract-level segregation measures.

The estimates in Table 1.2 are repeated using several additional stratification measures and alternative specifications in Appendix A. The basic result is clear: There is a strong, robust relationship between the structure of an MSA’s educational governance (at the district level) and the degree of student stratification across schools and districts within that MSA. District-level market concentration evidently captures real variation in parents’ ability to sort themselves across schools, and it is therefore reasonable to expect markets with less concentration of district governance to have better-functioning Tiebout marketplaces.

1.4.3. SAT data

Neither of the most commonly used datasets with observations on student outcomes, the National Education Longitudinal Study (NELS) and the National Longitudinal Survey of Youth (NLSY), is suitable for my analysis of the distribution of student outcomes across schools within each MSA. The NELS uses a multi-stage sampling procedure and draws data from only three schools in the average MSA.43 The NLSY uses a neighborhood-based sampling design, so may include more schools, but students cannot be matched to the schools that they attended and in any case are not representative of those schools.

As an alternative, I use a restricted-access data set consisting of 462,424 observations on metropolitan SAT-takers from the cohort that graduated from high school in 1994. The sample includes about one third of SAT-takers from that cohort, and represents nearly 20 percent of 1994 high school graduates.44 As students in this sample generally entered high school in 1990, the MSA demographic data and choice measures discussed above should accurately describe the environment in which students’ parents made their locational decisions.

42 The earliest year for which I have been able to obtain electronic PSS data is 1997-1998, so they do not line up perfectly with the CCD data. Both the CCD and PSS datasets describe the racial composition of the entire school; when schools include both elementary and secondary grades, I assume that the racial composition of students in grades 9-12 is the same as that for the school as a whole. The 29 MSAs in which the CCD is missing racial composition for schools with more than 20% of MSA enrollment are excluded from the calculations.

43 I nevertheless present estimates for my basic model using the NELS data as a specification test in Section 1.5.

44 SAT-takers who report their ethnicity were sampled with probability one if they were Black or Hispanic, or if they were from California or Texas, and with probability one-quarter otherwise. Due to an apparent error in the College Board’s processing of the file, students who did not report an ethnicity are excluded from the sample. In data for 1999, in which I have a complete version of the file, these students comprise about 12% of SAT-takers.

The SAT data are rich, but have a serious limitation: Students self-select into taking the SAT, and there is evidence that at large geographic scales the SAT-taking rate is negatively correlated with average performance (Dynarski, 1987). A key source of variation
in SAT-taking rates is the state university system’s preference for the SAT versus its

including a nonresponse category) and the interactions of six ethnicity indicators with two

competitor, the ACT. In “ACT states,” only students who are applying to out-of-state

gender categories (eleven parameters) and with twelve family income bins (66 additional

colleges need take the SAT, inducing significant positive selection into the sample of

parameters).47 The sample is large enough to permit relatively precise estimation of even this

observed SAT scores. To partially remedy this, I discard all observations from the 27 states

flexible model, and effect standard errors are generally below ten SAT points. An index of

with SAT-taking rates below one third.45 The remaining sample consists of 329,205 SAT-

peer quality was constructed by averaging the fitted values (excluding the estimated school

takers from 177 MSAs in “SAT states.” This sample is likely representative of the college-

bound population within the areas under consideration, and I do not further adjust for
sample selection.46 All analyses of the SAT data, however, control for the MSA SAT-taking
rate. Exploratory analyses with more involved selection corrections (reported in Appendix
C) suggest that the resulting estimates are not seriously biased by within-school selection
into SAT-taking.

The size of the SAT database permits precise estimation of school-level measures: I
have at least ten observations per school from schools with 77 percent of enrollment in the
MSAs studied. Only 22 percent of schools (enrolling 10 percent of sample students) in the
SAT data are private.

It is helpful to have a one-dimensional index of peer group quality at each school.
To construct this, I estimated a flexible regression of individual SAT scores on student
characteristics, controlling for school fixed effects. The model included effects for 100
parental education categories (ten for mother’s education by ten for father’s education, each

effect) of this regression over all students at each school.48 This index can be interpreted as
the peer group’s predicted average SAT performance at a nationally representative school.

By using SAT data to describe each school’s peer group, I necessarily exclude the
characteristics of students who do not take the SAT. The average characteristics of SAT-takers
are arguably a more accurate measure of the peer group for college-bound students
than would be averages over the entire student population, as students at many schools are
tracked into college-preparatory and non-college-preparatory courses with little interaction
between students in the two groups, and it seems plausible that parents distinguish between
the groups in their evaluations of schools. Absent microdata for non-SAT-taking students,
however, I am unable to test this restriction.

Table 1.3 lists summary statistics for the SAT sample and for that portion of the
sample in MSAs in each of the four choice quartiles. High-choice MSAs have substantially
higher SAT-taking rates and scores than do low-choice MSAs. The differences in average

45 SAT-taking rates use 12th-grade enrollment at schools which successfully match to the SAT data as the
denominator, although other definitions produce the same sample. The selection rule is insensitive to the
exact cutoff used: The marginal states, Colorado and Oregon, have rates of 23% and 38%, respectively.
Among states above the cutoff, average scores offer no evidence of differential selection into SAT-taking; see
Appendix C.
46 Roughly 45% of the relevant national cohort enrolled in college after graduation, although only about
two-thirds of enrollment is at four-year institutions (National Center for Education Statistics, 1999, Tables 101,
173 and 184).
47 The model explains 33 percent of the cross-sectional variance in individual SAT scores (as compared with 22
percent explained by school effects alone).
48 The individual characteristics coefficients may be biased by endogenous selection into schools. This is not a
problem for my estimation strategy as long as the bias affects all background variables equally: The only role
for these coefficients is to assign relative weights to the individual variables, and the scale of the background
index is irrelevant. Tests reported in Appendix C indicate that the school-level index is quite reliable, and in
any case specification checks reported in Section 1.5 indicate that the results are not particularly sensitive to
the particular peer group measure used.
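The construction of the peer group index described on these pages (a within-school regression of individual scores on background characteristics, followed by a school-level average of the fitted values with the school effect excluded) can be illustrated with a minimal Python sketch. It is not the estimation code behind the reported results. It assumes a single numeric background characteristic in place of the flexible set of category effects, uses synthetic noise-free data, omits the constant that would place the index on the scale of a nationally representative school, and all function names are hypothetical.

```python
from collections import defaultdict


def within_school_slope(rows):
    """Fixed-effects (within-school) slope of score on one characteristic.

    rows: list of (school_id, characteristic, score) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for school, x, score in rows:
        acc = sums[school]
        acc[0] += x
        acc[1] += score
        acc[2] += 1
    means = {s: (sx / n, sy / n) for s, (sx, sy, n) in sums.items()}

    num = den = 0.0
    for school, x, score in rows:
        mean_x, mean_y = means[school]
        # Demeaning within school removes the school fixed effect.
        num += (x - mean_x) * (score - mean_y)
        den += (x - mean_x) ** 2
    return num / den


def peer_group_index(rows):
    """Average the fitted values, excluding the school effect, by school."""
    beta = within_school_slope(rows)
    fitted = defaultdict(list)
    for school, x, _ in rows:
        fitted[school].append(beta * x)
    return beta, {s: sum(v) / len(v) for s, v in fitted.items()}


# Synthetic data (an assumption): score = school effect + 20 * characteristic.
rows = [("A", 1, 520), ("A", 3, 560), ("B", 2, 500), ("B", 4, 540)]
beta, index = peer_group_index(rows)
print(beta, index)
```

In this toy example school B has the stronger peer group but the lower school effect, which is the kind of separation between peer composition and school effects that the index is meant to allow.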

scores, however, are entirely accounted for by differences in students’ background
characteristics.49

Figure 1.5 displays the scatterplot of school average SAT scores against the peer
group index for a one-quarter subsample of the schools in the data. Circle sizes indicate the
number of (weighted) observations entering the school-level averages. The figure also
displays the regression of average SAT score on the peer group, controlling for MSA fixed
effects, which has slope 1.74. The peer index is scaled so that the effect of individual
characteristics on own scores (i.e. β in equation 3) accounts for exactly 1 of this, with the
remaining 0.74 deriving from the slope of school effects with respect to peer group (i.e.
from γ + θ, the combination of reduced-form peer effects and effectiveness sorting). In the
next section, I look for evidence that the slope of this line is steeper in high-choice than in
low-choice MSAs; under the assumption that β and γ do not vary systematically with
choice, variation in the overall slope is informative about $\partial\theta/\partial c$, the effect of choice on
effectiveness sorting. In Section 1.6, I estimate a different potential effect of Tiebout choice
on the line in Figure 1.5. There, I look for evidence that choice affects its intercept, as it
might if choice is correlated with average effectiveness (i.e. if $\partial E[\mu \mid c]/\partial c \neq 0$).

1.5. Empirical Results: Choice and Effectiveness Sorting

The sorting model in Section 1.3 predicts that if parents choose neighborhoods
largely for the effectiveness of the local schools, equilibrium effectiveness sorting will
depend on the educational market structure. Specifically, the gap in effectiveness between
the schools attended by advantaged and disadvantaged students will tend to be larger when
Tiebout choice makes it easier for wealthy parents to select effective schools without
accepting unwanted peers. As a result, naïve estimates of the peer effect should be larger in
high-choice markets than in low-choice markets.

My first test of this prediction in the SAT data uses nonparametric techniques to
allow for a nonlinear educational production function. These offer no evidence of
substantial nonlinearity, and I next turn to regression estimates of several linear
specifications. I also present estimates from alternative data sets; these are imprecise but
completely consistent with those derived from the SAT data. None of the data sets or
specifications studied here supports the hypothesis that effective schools are more likely to
attract advantaged students in markets where the Tiebout choice index is high.

1.5.1. Nonparametric estimates

If neither the effect of individual characteristics on own scores (β) nor the reduced-form
peer effect (γ) varies systematically with the structure of local school governance, and
if sorting on effectiveness is more complete in high-choice than in low-choice markets, a
version of Figure 1.5 which included data only from markets with high choice indices should
exhibit a steeper slope than that shown, while a version estimated only from low-choice
markets should be less steep. In a linear model, this may be confounded if there are
nonlinearities in the causal peer effect (i.e. in $\partial t/\partial x$), as market structure influences the
dispersion of schools’ peer groups around the MSA average.

The median MSA contains only 19 high schools, not nearly enough to permit
separate nonparametric estimation for each market. As an alternative, I grouped MSAs into

49 Recall that low-choice MSAs are disproportionately Black, Hispanic, and in the South.
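The nonparametric comparison in Section 1.5.1 rests on school-level kernel regressions estimated separately for each choice quartile, with an Epanechnikov kernel and a bandwidth of five index points reported for Figure 1.6. The sketch below shows one way such a comparison could be set up. It assumes a Nadaraya-Watson (locally constant) estimator, which the text does not specify, and the function names and synthetic data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on |u| < 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_regression(x, y, grid, bandwidth=5.0, weights=None):
    """Kernel-weighted local average of y at each grid point.

    Returns None at grid points with no observation within one bandwidth.
    """
    if weights is None:
        weights = [1.0] * len(x)
    fitted = []
    for g in grid:
        num = den = 0.0
        for xi, yi, wi in zip(x, y, weights):
            k = wi * epanechnikov((xi - g) / bandwidth)
            num += k * yi
            den += k
        fitted.append(num / den if den > 0.0 else None)
    return fitted


def fit_by_quartile(schools, grid, bandwidth=5.0):
    """schools: list of (choice_quartile, peer_index, avg_score, weight)."""
    curves = {}
    for q in sorted({s[0] for s in schools}):
        group = [s for s in schools if s[0] == q]
        curves[q] = kernel_regression(
            [s[1] for s in group], [s[2] for s in group], grid,
            bandwidth, [s[3] for s in group])
    return curves


# Synthetic schools (an assumption): both quartiles lie on score = 2 * index.
schools = [(q, x, 2.0 * x, 1.0) for q in (1, 4) for x in (0, 2, 4, 6, 8)]
curves = fit_by_quartile(schools, grid=[4, 100])
print(curves)
```

Comparing the fitted curves across quartiles on a common grid is the analogue of overlaying the quartile functions in one figure: similar intercepts and slopes across quartiles would show up as nearly coincident curves.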

quartiles by the choice index and estimated separate school-level kernel regressions of test
scores on student characteristics for each quartile. Figure 1.6 displays the estimated
functions, which use an Epanechnikov kernel and a bandwidth of five, about one-tenth of a
school-level standard deviation. The figure offers little evidence of any differences in
reduced-form educational production functions between the high-choice and low-choice
quartiles, as the quartile functions are quite similar in both their intercepts and slopes.

1.5.2. Regression estimates of linear models

The quartile analysis in Figure 1.6 offers no natural way to control for MSA variables
that might have independent effects on the housing market or on the causal importance of
the peer group. Here, I develop and estimate a more parametric version of the hypothesis of
interest. Drawing on the indication in Figure 1.6 that there is no substantial nonlinearity in
the peer effect, I revert to the earlier linear model, letting m index housing markets:

\[ t_{jm} = x_{jm}(\beta + \gamma) + \mu_{jm} + \varepsilon_{jm}, \quad \text{with} \tag{4} \]

\[ E[\mu_{jm} \mid x_{jm}] = \psi_m + x_{jm}\theta_m^{*}. \tag{5} \]

A well-sorted market assigns high-$x_{jm}$ students to high-$\mu_{jm}$ schools, and corresponds to a
high value of $\theta_m^{*}$. In general, for fixed parental valuations, δ, the expected sort may vary
both with choice and with other metropolitan characteristics, $Z_m$:

\[ \theta(c_m, Z_m; \delta) = E[\theta_m^{*} \mid c_m, Z_m; \delta] = \varphi_0 + c_m\varphi_1 + Z_m\varphi_2. \tag{6} \]

The discussion in Section 1.3 suggests that if the peer group is not too important to parents,
effectiveness sorting will be more complete when there are more jurisdictions, so $\varphi_1 > 0$.

Combining (4), (5), and (6), we obtain an estimable equation:

\[
\begin{aligned}
E[t_{jm} \mid x_{jm}] ={} & (\alpha + \psi_m) + x_{jm}(\beta + \gamma + \varphi_0) + x_{jm} c_m \varphi_1 + x_{jm} Z_m \varphi_2 \\
& + \bigl( x_{jm}\omega_m + ( \mu_{jm} - E[\mu_{jm} \mid x_{jm}] ) + \varepsilon_{jm} \bigr),
\end{aligned}
\tag{7}
\]

where $\omega_m \equiv \theta_m^{*} - \theta_m(c_m, Z_m)$ is the residual from (6), which I assume is independent of the
stratification of peer groups (i.e. of the distribution of $x_{jm} - \bar{x}_m$).

The effect of choice on the extent of effectiveness sorting can thus be estimated as
the coefficient on the interaction of peer group ($x_{jm}$) with the choice index ($c_m$) in a
regression for school average test scores. The terms on the second line of (7) are
unobserved residuals, and standard errors must be adjusted to account for their nonclassical
structure.

Basic results

Table 1.4 contains the main empirical results of the chapter. It presents OLS
estimates of model (7), using MSA fixed effects to absorb the effect of variations in $\psi_m$.
Standard errors permit arbitrary heteroskedasticity and are clustered at the MSA level to
accommodate the within-MSA autocorrelation implied by the random coefficient $\omega_m$.
Schools are weighted by the sum of individual SAT-taker observations’ inverse sampling
probabilities, with an adjustment at the MSA level to weight MSAs in proportion to their
17-year-old populations.

Column A displays a very restricted version of model (7) that excludes all
interactions between the peer quality index and metropolitan area characteristics. (That is, it
forces $\varphi_1 = \varphi_2 = 0$; this is the model depicted in Figure 1.5.) It indicates that when all MSAs
in the sample are pooled, the gradient of school average SAT scores with respect to the
characteristics of SAT-takers is 1.74. One standard deviation of school-average student

background is 48 points. This corresponds to an 84 point difference in expected average
SAT scores, 0.88 standard deviations of this variable. This, of course, reflects the combined
influence of individual characteristics (β), peer effects (γ) and an average of the $\theta^{*}$’s, the
within-MSA gradients of school effectiveness with respect to peer group.

Column B adds a single interaction of the peer group with a choice index. The
estimate of $\varphi_1$ is small and indistinguishable from zero. The remaining columns add
additional interactions of $x_{jm}$ with several metropolitan-level controls that might capture
other determinants of the sorting process, the distribution of school quality, the reduced-form
peer effect, or the sample selection process. Moving from left to right, these controls
include the MSA-level SAT-taking rate and indicators for six census divisions; the log of the
MSA population; and two combinations of additional demographic, income distribution, and
institutional controls. In each specification, the $\varphi_1$ point estimate is negative, although it is
only significantly different from zero in columns C and D.

All of the models in Table 1.4 are based on a particular specification of the
educational production function, (7), which may not be correct. Table 1.5 reports the results
of several alternative specifications, each using the control variables from Column E of
Table 1.4. Column A repeats the relevant coefficients from that specification. In Column B,
the peer effect is allowed to depend on the standard deviation of student characteristics as
well as on their average level. The standard deviation term enters significantly, indicating
that heterogeneous schools produce substantially higher scores than do homogenous schools
with the same average student background. The choice-peer group average interaction is
slightly more negative than in Column A.

Column C allows the racial and ethnic composition of SAT-takers to have an
independent effect on average SAT scores. If there are cultural biases in SAT scores, for
example, individual ethnicity may have a different effect than does the composition of the
peer group. The coefficients on racial composition variables are large and significant, but
again their inclusion has essentially no effect on the parameter of interest, the interaction of
average peer quality with Tiebout choice.

Column D tests a different aspect of the specification, the assumption that the
background characteristics predicting SAT scores are identical to those indexing
willingness-to-pay for desirable schools. To test this, I allow willingness-to-pay to depend on students’
self-reported family income, estimating the interaction between income and Tiebout choice
while including the peer quality index to absorb peer effects. The interaction coefficient
here is again negative and insignificant.

Columns E and F explore the impact of varying the sample definition. In Column
E, the basic model is estimated on public schools only, while in Column F the 18 MSAs that
have only a single district are excluded. The choice-peer group interaction coefficient is
again negative in each of these specifications, significantly so (and with a substantially larger
point estimate than in the basic specifications) in the latter case.

Although results are not presented here, I have estimated several additional
specifications of the basic empirical test. The absence of a positive choice effect does not
seem to derive from the particular weighting of the data used here (one might prefer to
weight MSAs equally, or by the number of SAT-takers, rather than by their high-school-age
populations) nor from the inclusion in the sample of schools with too few SAT-takers to
permit accurate estimation of the school mean. In addition, Appendix B presents several

instrumental variables estimates of (7); there is no indication that endogeneity of the choice
index biases the estimates presented here.

Evidence from the NELS and from high school completion rates

The SAT data are uniquely valuable for my empirical strategy, both because they
span a large fraction of metropolitan high schools and because they describe an outcome
that is an important factor in families’ evaluations of schools. Nevertheless, it remains
possible that selection into SAT-taking biases the above results. To assess their validity, I
estimate the basic model using test score data from the National Education Longitudinal
Study (NELS) and high school completion rates from the Common Core of Data (CCD).
Neither of these has nearly the breadth of the SAT data, so the estimates presented here are
not as precise as those above, but the point estimates are reassuringly similar.

The NELS sampled about 23 eighth grade students from each of 815 public and 237
private schools in 1988, following up with portions of this original sample at two-year
intervals thereafter. Using a confidential version of the NELS data and school addresses
from the CCD and the Private School Survey, I am able to match 700 schools (534 public
and 166 private) in the NELS sample to the MSAs in which they are located.

The first panel of Table 1.6 presents estimates using the composite test scores that
students earned during the original wave of the NELS, when they were in 8th grade. (I
continue to use the secondary choice index in this analysis; it correlates 0.98 with an
elementary index.) Column A presents the coefficient from a regression of school average
scores on an index of student quality, pooling all metropolitan schools in the NELS sample
and including a fixed effect for each MSA.50 As in the SAT data, peer effects and
effectiveness sorting are together substantial, inflating the school-level background index
coefficient by 90 percent relative to the coefficient of a within-school regression of
individual scores on own characteristics. When the peer group measure is interacted with
the choice index (in Column B, and again with additional controls in the remaining
columns), the coefficient is indistinguishable from zero, with a negative point estimate in
every specification.

Panel B repeats this analysis, this time with the score earned by students when they
were in the 12th grade.51 Again, estimates of the choice effect are imprecise but are, with
one statistically insignificant exception, of the opposite sign from that predicted by the
economic model.

The remaining panels present models for measures relating to school continuation
rates, defined as one minus the cumulative dropout rate. In Panel C, the dependent variable
is the fraction of students from the NELS 8th grade sample who were still in school at the
time of the 12th follow-up survey four years later. The background index used is the same as
that used in Panel B; it is a strong predictor of continuation rates but there is no evidence
that it is a stronger predictor in high-choice markets.

The final panel leaves the NELS data, reporting models for high school completion
rates of the cohort entering 9th grade in the fall of 1993. Data on this outcome come from a
district-level compilation of four years of CCD data. There are several limitations to the

50 The background measure is a weighted average of variables characterizing students’ race and their parents’
education, again using weights chosen to best predict student test scores within schools.
51 Peer group and test score averages are still for the 8th grade school, as once students transfer to high schools
the NELS sample is no longer representative of the schools attended.
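Each panel in this subsection re-estimates the interaction specification of model (7): school-average outcomes regressed on the background index and its interaction with the choice index, with MSA fixed effects and standard errors clustered by MSA. The sketch below illustrates that structure in plain Python. It is a simplified illustration rather than the estimation code behind the tables: it omits the sampling weights and the additional MSA-level controls, applies no finite-sample correction to the clustered variance, uses synthetic noise-free data with made-up coefficients (1.5 and 0.3), and all names are hypothetical.

```python
from collections import defaultdict


def fit_choice_interaction(rows):
    """OLS of school score on peer index and peer index x choice, MSA effects.

    rows: list of (msa, choice_index, peer_index, avg_score), one per school.
    Returns ((slope, interaction), (se_slope, se_interaction)).
    """
    # Absorb MSA fixed effects by demeaning every variable within MSA.
    by_msa = defaultdict(list)
    for msa, choice, x, t in rows:
        by_msa[msa].append((x, x * choice, t))
    data = []
    for msa, obs in by_msa.items():
        means = [sum(o[k] for o in obs) / len(obs) for k in range(3)]
        data += [(msa, o[0] - means[0], o[1] - means[1], o[2] - means[2])
                 for o in obs]

    # Two-regressor normal equations, solved with an explicit 2x2 inverse.
    s11 = sum(d[1] * d[1] for d in data)
    s12 = sum(d[1] * d[2] for d in data)
    s22 = sum(d[2] * d[2] for d in data)
    det = s11 * s22 - s12 * s12
    inv = [[s22 / det, -s12 / det], [-s12 / det, s11 / det]]
    xty = [sum(d[1] * d[3] for d in data), sum(d[2] * d[3] for d in data)]
    beta = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]

    # Cluster-robust sandwich: sum the score vectors within each MSA first.
    scores = defaultdict(lambda: [0.0, 0.0])
    for msa, x1, x2, t in data:
        resid = t - beta[0] * x1 - beta[1] * x2
        scores[msa][0] += x1 * resid
        scores[msa][1] += x2 * resid
    meat = [[sum(s[i] * s[j] for s in scores.values()) for j in range(2)]
            for i in range(2)]
    half = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]
    var = [sum(half[i][k] * inv[k][i] for k in range(2)) for i in range(2)]
    return tuple(beta), tuple(max(v, 0.0) ** 0.5 for v in var)


# Synthetic markets (assumptions): MSA -> (choice index, MSA effect).
truth = {"m1": (0.2, 900.0), "m2": (0.8, 950.0)}
peers = {"m1": (10.0, 20.0, 30.0), "m2": (10.0, 20.0, 40.0)}
rows = [(m, c, x, psi + 1.5 * x + 0.3 * x * c)
        for m, (c, psi) in truth.items() for x in peers[m]]
coefs, ses = fit_choice_interaction(rows)
print(coefs, ses)
```

Because the choice index is constant within an MSA, its main effect is absorbed by the MSA fixed effects and only the interaction with the peer index is identified. That is why the coefficient of interest is the second element of `coefs`.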

CCD completion rate variable: It is measured at the district level rather than the school; it
covers only public schools; it is missing for a great many districts who failed to report one of
the component variables; and it may be unreliable if districts cannot distinguish mobility
from dropout. Moreover, the CCD contains very little information about student
background, and I therefore use the SAT data student quality index, aggregated to the
district, to measure student characteristics. I drop MSAs that are not in SAT states or where
available completion rate data cover less than two thirds of public enrollment. This leaves a
sample of 931 districts from 50 MSAs. In spite of the serious limitations in the CCD data,
the pattern of results in Panel D is quite similar to that in Panel C. Again, the student quality
index is a strong predictor of completion rates, but its coefficient is (insignificantly) smaller
in high-choice than in low-choice MSAs.

Given the lack of precision in the NELS and CCD estimates, it is somewhat
surprising how well they line up with those in Table 1.4. As before, the choice effect is
indistinguishable from zero, but point estimates suggest that effectiveness sorting is slightly
less complete in high-choice markets. There is nothing to indicate that the SAT-based results
are an aberration.

Possible biases in estimates of (7)

Several identifiable factors may bias the coefficient on the peer group-Tiebout choice
interaction in specifications like (7). I discuss two here; each can produce an upward bias in
$\varphi_1$.

The first source of bias is statistical. There are several reasons to suspect
measurement error in the peer group variable: There may not be enough observations at any
particular high school to accurately estimate the school-level average; the data may omit
important background variables; or the included variables may be imperfectly measured
(likely a particular problem for family income in the SAT data, which high school students are
not likely to report reliably). Any of these would attenuate the estimated gradient of school
average student outcomes with respect to peer group characteristics.

The reliability of $x_{jm}$ is likely to be higher, however, in markets where schools are
more stratified. One reason is that stratification implies a higher true variance of the peer
group, and therefore a larger signal component of the signal-to-noise ratio. A second reason
is that schools in more stratified markets are likely to be more internally homogenous; as the
sampling variance of the school average depends linearly on the within-school variance of
individual characteristics, more internally homogenous schools imply more reliable
school-level averages. A final reason to suspect a stratification-reliability relationship is that
unobserved peer group characteristics are likely to be more strongly associated with
observed characteristics in markets that are more heavily stratified.

In single-MSA regressions of test scores on student characteristics, the above
arguments imply greater attenuation of the peer group coefficient in MSAs with less
stratified schools. As choice is positively correlated with stratification, this produces a
tendency toward larger estimated coefficients (i.e. less bias toward zero) in high-choice
MSAs. In fact, I do not estimate separate regressions for each MSA, but the general effect is
the same: Unreliability of the peer group measure produces an upward bias in the effect of
choice on the peer group gradient, and therefore in the interaction coefficient $\varphi_1$.

A second possible source of bias in $\varphi_1$ is economic. There is some evidence that the
educational labor market is more liquid in MSAs that have many districts competing for
teachers’ talent than in those with more concentrated governance (Luizer and Thornton,

1986). This may make it easier for a high-$x_{jm}$ school to attract good teachers in a high-choice
market than in one with less choice, where teachers are likely to be assigned to
schools by bureaucratic rules rather than by the market. Any such effect would imply a
positive effect of choice on the reduced-form peer effect (γ in equations (1) and (4)),
which will appear as a positive contribution to $\varphi_1$.52

Either of these effects would imply upward bias in estimates of $\varphi_1$ relative to the
effect of interest. To the extent that they are thought to be important, the results presented
in Table 1.4, 1.5, and 1.6 should be seen as upper bounds on the effect of Tiebout choice on
parental effectiveness sorting.

Calibration of results: Can we reject meaningful effects?

None of the estimates presented in this section supports the hypothesis that effective
schools are more likely to attract the best peer groups in markets with fragmented school
governance than in those where Tiebout choice is more difficult to exercise. Point estimates
of the choice-peer group interaction are almost uniformly negative, suggesting that
effectiveness sorting is less complete in high-choice than in low-choice markets. These
estimates are imprecise, however, and most cannot reject a zero effect. It is worth
considering whether the confidence regions exclude the sorts of effects that we would
expect if school effectiveness were a prime determinant of parental location decisions.
Consider the specification in Column E of Table 1.4. Would a true effect of
+0.20 (the upper bound of a 95% confidence region for $\varphi_1$) be consistent with a Tiebout
choice process driven in substantial part by parental pursuit of effective schools? The
answer appears to be no. Note that the within-MSA gradient of school average SAT scores
with respect to student characteristics is 1.74 (from Column A of the same table). Even at
the upper limit of the confidence interval, a move from unified governance to complete
decentralization accounts for just over ten percent of this gradient.

We can imagine as a thought experiment fully decentralizing school governance in
Miami-Dade County, which is served by a single district.53 Figure 1.7 displays the actual
distribution of peer groups and school average SAT scores in Miami, as well as the
counterfactual distribution that might be observed if the Miami choice index were changed
to one and if the effect of choice were at the upper limit of its confidence interval.54 The
actual and counterfactual distributions of school averages are nearly identical. If the
counterfactual reflects a substantial increase in sorting on school effectiveness, it must be
that effectiveness is responsible for a very small share of the across-school variation in SAT
scores.

Recall, moreover, that this thought experiment assumes a choice coefficient at the
upper limit of the confidence interval. At the point estimate, choice reduces the gradient of
SAT scores with respect to student quality. The models in Table 1.4 reject a sizable (by any
reasonable standard) effect of choice on the test score gradient. The estimated effects are

52 Note that this effect has nothing to do with parents’ use of their power to choose: It arises from teachers
moving to schools with students who are easy to teach, rather than from parents moving to districts with
good teachers.
53 The district’s web site indicates that the county is partitioned into school attendance areas. These can be
changed easily, however, and indeed were under the supervision of federal judges for desegregation purposes
from 1970 through 2001 (Welch and Light, 1987).
54 Note that decentralization of Miami’s schools would probably change the allocation of peers as well as their
distribution across schools. If, as Table 1.2 indicates, choice causes increased stratification, the counterfactual
Miami market would exhibit more dispersion along the horizontal axis in Figure 1.7. The figure ignores any
such effect, and simply considers whether decentralization would lead to increased dispersion of SAT scores
conditional on the observed peer group allocation.
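The "just over ten percent" figure in the calibration can be checked with a line of arithmetic. The worked ratio below assumes that the choice index equals zero under unified governance and one under complete decentralization, so that the change in the index is $\Delta c = 1$ (the symbol $\Delta c$ is introduced here only for illustration), and it uses the two numbers quoted in the text: the upper confidence bound of 0.20 for $\varphi_1$ and the pooled within-MSA gradient of 1.74.

```latex
\[
\frac{\varphi_1^{\mathrm{upper}} \times \Delta c}{\text{pooled gradient}}
  = \frac{0.20 \times (1 - 0)}{1.74}
  \approx 0.115 ,
\]
```

That is, under these assumptions even the most favorable value in the confidence region would add roughly 11.5 percent to the slope of school average scores with respect to the peer group index.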

difficult to reconcile with a sorting process in which school effectiveness is an important part
of both location decisions and educational production.

1.6. Empirical Results: Choice and Average SAT Scores

The results presented in Section 1.5 offer no evidence that the allocation of effective
schools is systematically different in high-choice than in low-choice markets. If Tiebout
choice does not increase the probability that effective schools attract students from
advantaged backgrounds, it is not clear how it can provide incentives that will lead
administrators to exert greater effort. The above results thus suggest that the argument
(Brennan and Buchanan, 1980; Hoxby, 2000a) that average school performance should be
higher in markets with decentralized governance may not hold. The SAT data permit a
direct test of this prediction, however.

Table 1.7 presents regression models for the average level of SAT scores across
MSAs. Column A includes only the choice index as a regressor. It enters with a positive
coefficient, implying that fully decentralized MSAs produce average SAT scores about forty
points higher than do those with only a single district. Recall, however, that there are large
differences between high-choice and low-choice MSAs in both SAT-taking rates and student
characteristics (from Tables 1.1 and 1.3). Columns B, C, and D add controls for the
SAT-taking rate and the average background index of SAT-takers. The positive correlation
between choice and performance seems to result entirely from the omission of students’
background characteristics; when they are included in Column C, the coefficient becomes
negative and significant. The remaining columns add additional MSA-level regressors. Their
inclusion does not substantially alter the estimated effect of choice: It remains negative and
significant.55

The negative effect of Tiebout choice on average SAT scores indicated by Table 1.7
is not very large: A one standard deviation (0.28) increase in the choice index corresponds
with a reduction in mean scores of only about four points, about one-eighth of an MSA-level
standard deviation. Moreover, in some alternative specifications not reported here, the
coefficient estimate is statistically insignificant, though still negative. When MSAs are
weighted equally, for example, rather than by the number of SAT takers or by the 17-year-old
population (not shown, but similar to the SAT-taker weighting in Table 1.7), the choice
effect is about one third as large as that shown here and confidence intervals do not reject
zero. Nevertheless, there is no indication that Tiebout choice is associated with higher SAT
scores once student background is controlled.56 Moreover, the coefficient on the
background index across MSAs (1.58 in Column C, and slightly higher in later columns) is
nearly identical to that found within MSAs (Table 1.4, Column A). This is consistent with
the claim that both coefficients measure primarily the peer effect (γ), which might be the
same across MSAs as within, rather than effectiveness sorting (θ), which we would expect
to see within but not across MSAs.

The results on SAT scores across MSAs thus support those on the distribution of
scores within MSAs: The evidence does not indicate that Tiebout choice provides incentives

55 Note that in Column F, which controls for several MSA demographic characteristics, the coefficient on the
SAT-taking rate finally takes on its expected sign.
56 Hoxby (2000a) finds a positive effect of choice on average NELS scores across MSAs, one that is larger for
high-income than for low-income students. The SAT sample might be thought analogous to her
“not-low-income” group. Hoxby’s positive effect is not seen here, either in the OLS results in Table 1.7 or in
instrumental variables specifications (in Appendix B) similar to hers. See Chapter 2 for further discussion of
her results.
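The sign change between Column A and Column C of Table 1.7 is the familiar omitted-variable pattern: if student background rises with choice and also raises scores, a regression of scores on choice alone credits choice with the background effect. The sketch below reproduces that mechanism on synthetic MSA-level data. Every number in it is made up for illustration (a true choice effect of -10 and a background coefficient of 1.5), none is taken from the tables, and the helper name `ols` is hypothetical.

```python
def ols(y, *regressors):
    """OLS slopes (intercept absorbed by demeaning) for one or two regressors."""
    def demean(v):
        m = sum(v) / len(v)
        return [vi - m for vi in v]

    y = demean(y)
    xs = [demean(r) for r in regressors]
    if len(xs) == 1:
        x = xs[0]
        return [sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)]
    x1, x2 = xs
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return [(s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det]


# Synthetic MSAs (assumptions): background rises with choice, and the
# true effect of choice on average scores is -10.
choice = [i / 10 for i in range(11)]
background = [40 * c + 2 * (-1) ** i for i, c in enumerate(choice)]
score = [1000 + 1.5 * b - 10 * c for b, c in zip(background, choice)]

(short,) = ols(score, choice)  # background omitted
long_choice, long_background = ols(score, choice, background)
print(round(short, 3), round(long_choice, 3), round(long_background, 3))
```

In this construction the short regression returns a positive choice coefficient even though the true effect is negative, and adding the background index recovers both true coefficients.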

to school administrators to improve productivity, as productive administrators appear no
more likely to be rewarded for it in high-choice than in low-choice MSAs.

1.7. Conclusion

This chapter has used the Tiebout choice process (the choice of school
characteristics via housing decisions) as a lens through which to study the strength of
parental preferences for effective schools relative to those for other neighborhood or school
characteristics. Earlier work on Tiebout mobility presumes that parents use their location
decisions to choose effective schools; one lesson of the analysis here is that the potential
importance of peer group externalities to community desirability can create coordination
failures in which ineffective schools are preferred to more effective competitors.

The motivation for the empirical approach is a model of the Tiebout marketplace in
which housing prices ration access to desirable schools. As is common in multicommunity
models, equilibrium is characterized by maximum stratification of families across school
districts, with the wealthiest families residing in the most-preferred communities. Preferred
districts need not have particularly effective schools, however, when peer group enters into
parental valuations, as wealthy families can be “stuck” in ineffective schools by their
unwillingness to abandon the peer group offered. For parental valuations that place
substantial weight on school effectiveness, this becomes less likely as Tiebout choice
increases parents’ exit options.

In so far as student test scores depend on school effectiveness, effectiveness sorting
is observable as an increase in the slope of school average scores with respect to student
characteristics. I find no evidence that the gradient of school-level SAT scores with respect
to student characteristics varies systematically with Tiebout choice, as would be expected if
effectiveness allocations were more stratified in high-choice markets. Even at the upper
extreme of the estimated confidence intervals, the SAT gap between more- and less-desirable
schools is not meaningfully larger in markets with decentralized governance than in those
with less Tiebout choice. Several specification tests and alternative data sets fail to reveal
important biases in the basic models. Consistent with the results on within-market sorting, I
also find no evidence that Tiebout choice increases average SAT scores across markets, as
would be expected if choice increases competitive pressure for administrators to run
effective schools.

I see four possible explanations for the pattern of results. First, it may be that I have
mis-measured the extent of Tiebout choice by focusing on a district-level choice index where
in fact the relevant measure of parents’ exit options is at the school level. Second, parents
may have no concern whatever for the peer group, and may choose schools purely for their
effectiveness. (Recall that there is no necessary connection between market structure and
effectiveness sorting in this case.) Third, parents’ concern for the peer group may be so
large that it dominates effectiveness in their choices, so that again there is no effect of choice
on effectiveness sorting. Finally, it may be that the sorts of policies that I call “school
effectiveness,” those not dependent on the peer group, are relatively unimportant
determinants of student outcomes (or that they do not vary substantially across schools), and
thus that effectiveness sorting and differences in average effectiveness across markets are not
observable in the pattern of average SAT scores.

The first two of these are not particularly plausible. I present strong evidence, in
Table 1.2 and in Appendix A, that the district-level choice index is an important determinant

of student stratification, even when possible confounding factors are controlled. It seems

the better school were that choice separable from the residential location decision.

that parents are sorting on some characteristics of school districts, though not on anything

Moreover, voucher programs that encourage the entry of new competitors may produce

that serves to increase student performance conditional on individual and peer

more options for parents than even the most decentralized of district governance structures,

characteristics.

reducing the potential for coordination failures and increasing the probability that even

It similarly seems unlikely that parents have zero concern for peer group. In the

parents who value the peer group highly will choose effective schools. It thus seems likely

presence of direct or indirect peer effects on student learning, parents would be irrational to

that the character of equilibrium will depend crucially on the particular institutions of any

ignore peer group in their evaluations of schools, and anecdotal evidence suggests that they

choice program. Further research with large-scale voucher programs will be needed to

do not do so. The likelihood that parents have imperfect information only reinforces this

determine whether administrators of effective schools are rewarded by increased demand in

judgment, as the most widely available indicator of school quality, the average test score,

the choice regimes that these policies create.

loads heavily on the peer group, while value added is much more difficult to observe.
The alternative hypotheses that are consistent with the above results, that parental
valuations place a great deal of weight on peer group relative to effectiveness or that
administrative and instructional effectiveness is simply unimportant to the distribution of
educational outcomes, seem more plausible. I interpret the chapter’s results as cautious
support for the first of these, though the second would equally well explain the results and in
any case their implications for the productivity benefits of Tiebout choice are the same.
In the absence of parental sorting on school effectiveness, there is little theoretical
support for the claim that Tiebout choice markets create incentives for school administrators
to exert greater effort to raise student performance. Caution is required, however, in
generalizing from this chapter’s results to choice markets that do not link school assignment
to residential location. Under Tiebout choice, parents may have to give up desired
neighborhood amenities—views, parks, air quality, or characteristics of neighbors—to obtain
a more effective school. They may be unwilling to do this even though they would choose
53

54

Tables and Figures for Chapter 1.

Table 1.1.
Summary statistics for U.S. MSAs

                                      All MSAs          Mean by Choice Quartile
                                                      Least                    Most
                                    Mean     S.D.     Choice    Q3      Q2     Choice
                                    (A)      (B)      (C)      (D)     (E)     (F)
Panel A: Basic Descriptive Statistics
N                                   318               81       76      83      78
Choice index (district level)       0.66     0.28     0.25     0.66    0.81    0.92
ln(Population)                      12.7     1.0      12.1     12.3    12.8    13.5
ln(Mean HH inc.)                    10.5     0.2      10.4     10.4    10.5    10.6
Gini coeff., HH inc.                0.36     0.02     0.37     0.36    0.35    0.35
Fraction Black                      10%      10%      12%      11%     9%      8%
Fraction Hispanic                   7%       13%      9%       7%      7%      5%
Fraction college grads              20%      6%       20%      19%     20%     20%
In foundation plan state            74%               89%      82%     75%     50%
South                               38%               65%      47%     30%     12%
Private enrollment share            8%       5%       7%       8%      8%      9%
Panel B: Districts and schools (public, grades 9-12)
# of districts                      14.7     17.9     3.3      7.7     13.5    34.5
# of schools                        33.5     40.6     16.5     20.1    38.1    59.1
# of students (thousands)           25.8     40.1     13.4     14.1    30.2    45.3
Average district enrollment         3,053    6,103    6,557    2,247   2,015   1,303
Average school enrollment           709      242      754      692     680     710
Choice index (school level)         0.89     0.08     0.82     0.88    0.92    0.96

Sources: Common Core of Data, 1990; 1990 Decennial Census STF-3C; Card and Payne (1998). Choice
quartiles are index values 0-0.5 (Q4); 0.5-0.75 (Q3); 0.75-0.875 (Q2); and 0.875-1 (Q1).

Table 1.2.
Effect of district-level choice index on income and racial stratification

Dependent Variable:                    Across-District Share of       White/Non-White Dissimilarity
                                       Variance, HH Income            Index (School Level)
                                       (A)       (B)       (C)        (D)       (E)       (F)
Choice                                 0.07      0.09      0.10       0.15      0.10      0.10
                                      (0.01)    (0.01)    (0.01)     (0.03)    (0.03)    (0.02)
ln(Population) / 100                   0.77      1.29      0.22       4.39      2.21     -0.64
                                      (0.22)    (0.29)    (0.27)     (0.83)    (1.13)    (0.90)
Pop: Frac. Black                       0.09      0.09     -0.06       0.33      0.30      0.12
                                      (0.022)   (0.022)   (0.026)    (0.09)    (0.09)    (0.08)
Pop: Frac. Hispanic                    0.02      0.01      0.01       0.03      0.04      0.11
                                      (0.01)    (0.01)    (0.01)     (0.06)    (0.06)    (0.05)
ln(mean HH income)                     0.07      0.07      0.03       0.15      0.17     -0.06
                                      (0.018)   (0.018)   (0.016)    (0.07)    (0.06)    (0.05)
Gini coeff., HH income                 0.36      0.33      0.15       1.88      2.03      0.20
                                      (0.11)    (0.11)    (0.10)     (0.42)    (0.42)    (0.33)
Pop: Frac. BA+                        -0.06     -0.07     -0.05      -0.47     -0.40      0.32
                                      (0.033)   (0.033)   (0.036)    (0.13)    (0.13)    (0.12)
Foundation plan state / 100           -0.04     -0.04      0.24      -2.95     -3.00      0.24
                                      (0.40)    (0.39)    (0.35)     (1.60)    (1.58)    (1.15)
School-level choice index                        0.10      0.10                -0.37     -0.38
                                                (0.036)   (0.031)              (0.13)    (0.10)
Census tract-level segregation measures:
Isolation index (white/non-white)                          0.10                          -0.27
                                                          (0.03)                         (0.09)
Dissimilarity index (white/non-white)                     -0.07                           1.06
                                                          (0.03)                         (0.10)
Across share of variance, education                       -0.15                          -0.38
                                                          (0.046)                        (0.15)
Across share of variance, HH inc.                          0.37                           0.40
                                                          (0.05)                         (0.16)
N                                      293       293       293        289       289       289
R2                                     0.63      0.64      0.74       0.63      0.64      0.82

Notes: Observations are MSAs/PMSAs. Regressions are unweighted. Dependent variable has mean (S.D.)
0.041 (0.038) in columns A-C; 0.413 (0.151) in columns D-F. Columns A through C exclude 25 one-district
MSAs. Dissimilarity index is calculated over public and private schools; 29 MSAs in which racial composition is
missing for schools with more than 20% of public enrollment are excluded. All columns include fixed effects
for nine census divisions.
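
The two dependent variables in Table 1.2 are standard stratification measures. The sketch below shows how each could be computed for a single metropolitan area, assuming the textbook definitions: a two-group dissimilarity index over schools, and the between-group share of total variance over districts. The function names and the unweighted, list-based inputs are illustrative assumptions; the dissertation's own computation is not shown in this section.

```python
def dissimilarity_index(white, nonwhite):
    """Two-group dissimilarity index across units (e.g., schools in one MSA).

    white[i] and nonwhite[i] are enrollment counts in unit i.
    Returns 0 for identical distributions and 1 for complete separation.
    """
    total_w = sum(white)
    total_n = sum(nonwhite)
    if total_w == 0 or total_n == 0:
        raise ValueError("each group needs at least one member")
    return 0.5 * sum(abs(w / total_w - n / total_n)
                     for w, n in zip(white, nonwhite))


def across_group_variance_share(groups):
    """Share of total variance lying between groups (e.g., districts).

    groups is a list of lists of individual values (e.g., household incomes).
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("values do not vary")
    between_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    return between_ss / total_ss
```

Both measures lie between 0 and 1, which is consistent with the dependent-variable means reported in the table notes.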


Table 1.3.
Summary statistics for SAT sample

                                  All MSAs              By Choice Quartile (Mean)
                                                     Least                        Most
                                                     Choice     Q3       Q2       Choice
                                 (A)       (B)       (C)       (D)      (E)       (F)
# of observations                329,025             42,286    30,298   117,274   139,167
# of schools                     5,727               755       619      1,648     2,705
# of MSAs                        177                 42        37       47        51

                                 Mean      S.D.
MSA SAT-taking rate              39.5%     10.6%     32.8%     37.8%    38.7%     47.0%
Individual-level
SAT score                        997       201       973       995      997       1004
Black                            12%                 19%       11%      13%       9%
Hispanic                         10%                 17%       12%      13%       6%
Asian                            9%                  8%        4%       14%       7%
Female                           54%                 56%       55%      55%       54%
Father's education               14.3      2.7       13.9      14.3     14.3      14.3
Mother's education               13.8      2.5       13.6      13.9     13.8      13.9
Family income ($1,000s)          47.0      25.9      40.7      45.6     47.3      48.7
Student background index         997       81        974       997      996       1004
School-level
# of SAT observations            57        67        56        49       71        51
Sum of SAT weights               102       90        101       89       101       105
S.D., SAT score                  171       31        168       170      172       171
S.D., student background         62        16        65        65       65        59

Notes: See text for description of SAT sample. Individual-level measures weight observations by inverse
sampling probability. Schools are unweighted for school-level measures. Individual- and school-level
standard deviations in Column B are computed over individuals and schools, not over MSA means. Choice
quartiles are index values 0-0.5 (Q4); 0.5-0.75 (Q3); 0.75-0.875 (Q2); and 0.875-1 (Q1).

Table 1.4.
Effect of Tiebout choice on the school-level SAT score-peer group gradient

                                   (A)      (B)      (C)      (D)      (E)      (F)
Avg. student background index      1.74     1.72     1.49     0.09    -2.35     0.76
                                  (0.04)   (0.17)   (0.15)   (0.27)   (2.34)   (2.45)
Interaction of student background average with:
* Choice index                              0.02    -0.41    -0.34    -0.09    -0.11
                                           (0.20)   (0.13)   (0.12)   (0.15)   (0.17)
* MSA SAT-taking rate                                2.08     1.94     0.99     1.16
                                                    (0.51)   (0.46)   (0.44)   (0.49)
* ln(Population)                                              0.09     0.04     0.03
                                                             (0.02)   (0.02)   (0.03)
* Pop: Frac. Black                                                    -0.33    -2.33
                                                                      (0.36)   (1.09)
* Pop: Frac. Hispanic                                                  0.03    -1.60
                                                                      (0.18)   (0.80)
* ln(mean HH inc.)                                                     0.18    -0.02
                                                                      (0.21)   (0.20)
* Gini, HH inc.                                                        3.02     1.89
                                                                      (1.71)   (1.90)
* Pop: Frac. BA+                                                       1.56     2.31
                                                                      (0.52)   (0.64)
* Foundation plan state                                               -0.02    -0.03
                                                                      (0.06)   (0.05)
* Pop: Frac. White^2                                                           -1.16
                                                                               (0.65)
* ln(Density)                                                                   0.02
                                                                               (0.03)
* Pop: Frac. LTHS                                                               1.14
                                                                               (0.82)
* Census division FEs              n        n        y        y        y        y
R2                                 0.77     0.77     0.78     0.78     0.78     0.78
R2, within MSAs                    0.74     0.74     0.74     0.75     0.75     0.75

Notes: Sample in each column is 5,727 schools in 177 MSAs. Dependent variable is the weighted mean SAT
score at the school. Within MSAs, observations are weighted by the estimated number of SAT-takers at the
school (i.e. by the sum of individual sampling weights); these are adjusted at the MSA level to make total
MSA weights proportional to the 17-yr-old population. All models include 177 MSA fixed effects, and
standard errors are clustered at the MSA level.
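
The simplest interacted specification in Table 1.4 regresses the school mean SAT score on the school average background index and its interaction with the MSA choice index, with MSA fixed effects and school weights. A minimal sketch of that point estimate follows, assuming the model y = a_m + b*x + c*(x*choice_m) + e and a weighted within-MSA transformation. The function name and list-based inputs are hypothetical, the additional interactions of the later columns are omitted, and the MSA-clustered standard errors are not computed.

```python
from collections import defaultdict


def within_gradient(y, x, choice, msa, w):
    """Weighted within-MSA regression of y on x and x*choice.

    Returns (b, c): the gradient at choice = 0 and the change in the
    gradient per unit of the choice index. MSA fixed effects are removed
    by subtracting weighted MSA means from every variable.
    """
    xc = [xi * ci for xi, ci in zip(x, choice)]

    def demean(v):
        num, den = defaultdict(float), defaultdict(float)
        for vi, mi, wi in zip(v, msa, w):
            num[mi] += wi * vi
            den[mi] += wi
        return [vi - num[mi] / den[mi] for vi, mi in zip(v, msa)]

    yd, x1, x2 = demean(y), demean(x), demean(xc)
    # Weighted normal equations for two regressors, solved by Cramer's rule.
    s11 = sum(wi * a * a for wi, a in zip(w, x1))
    s12 = sum(wi * a * b for wi, a, b in zip(w, x1, x2))
    s22 = sum(wi * b * b for wi, b in zip(w, x2))
    s1y = sum(wi * a * t for wi, a, t in zip(w, x1, yd))
    s2y = sum(wi * b * t for wi, b, t in zip(w, x2, yd))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear within MSAs")
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det
```

Because the choice index is constant within an MSA, its main effect is absorbed by the fixed effects; only its interaction with the background index is identified, and at least two MSAs with different choice values are needed.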


Table 1.5.
Effect of Tiebout choice on the school-level SAT score-peer group gradient:
Alternative specifications

(A)
Mean peer quality * choice

Full Sample
(B)
(C)

S.D.(peer quality)

Base

Public
Schools
Only
(E)

Multi-District
Markets Only
(F)

-0.10
(0.17)

-0.46
(0.17)

0.53
(0.08)

Mean family inc. ($1,000s) * choice

-0.16
(0.44)

Peers: Fr. Black

126.5
(21.9)

Peers: Fr. Hispanic

74.1
(14.5)

Peers: Fr. Asian

82.1
(22.7)

Peers: Fr. other race

38.9
(20.3)

N
R2
2
R , within MSAs

(D)

-0.09 -0.15 -0.11
(0.15) (0.15) (0.13)

5,727
0.77
0.74

5,139 5,727
0.78 0.80
0.75 0.77

No
Basic Preferred
Full

Controls Controls Controls Controls
(A)
(B)
(C)
(D)
(E)
Panel A: NELS 8th grade score (205 MSAs; 707 schools; 23.3 students per school)
Avg. student background index
1.90
1.97
2.62
7.76
7.24
(0.08) (0.15)
(0.30)
(12.75) (13.04)
* choice

-0.09
-0.42
-0.57
(0.20)
(0.25)
(0.40)
Panel B: NELS 12th grade score (202 MSAs; 682 schools; 12.1 students per school)
Avg. student background index
1.47
1.62
2.26
10.65

(0.14) (0.19)
(0.42)
(12.13)

-0.56
(0.56)
3.94
(13.82)

* choice

-0.19
-0.37
0.14
-0.04
(0.32)
(0.44)
(0.53)
(0.57)
Panel C: NELS 8th-12th grade continuation rate (202 MSAs; 682 schools; 12.1 students per school)
Avg. student background index / 100
2.53
3.19
3.03
26.46
55.22
(0.49) (2.55)
(1.73)
(44.64) (49.82)
* choice

5,690
0.78
0.75

4,453
0.80
0.75

5,476
0.78
0.75

Notes : Dependent variable in all columns is school mean SAT score. All models include 177 MSA fixed
effects and main effects of the peer quality index (or mean family income, in Column D), as well as
interactions with the "MSA Characteristics" used in Table 1.4, Column E. Observations are schools,
weighted within MSAs by the sum of individual weights and across MSAs by the 17-year-old population; see
text. Standard errors are clustered at the MSA level. Sample size varies due to availability of regressors:
S.D.(peer quality) is set to missing when there are 5 or fewer observations; mean family income is calculated
over students who report non-missing values. Column E excludes private schools, while Column F excludes
18 MSAs with only a single district.


Table 1.6.
Effect of Tiebout choice on the school-level SAT score-peer group gradient:
Evidence from the NELS and the CCD

-0.84
-1.28

(3.04)
(1.99)
Panel D: CCD 9th-12th grade completion rate (50 MSAs; 931 school districts)
Avg. student background index / 1,000 1.99
2.79
5.34
(0.21) (2.45)
(2.00)
* choice

-0.90
(2.56)

-5.43
(2.47)

-0.19
(2.27)

-0.69
(2.34)

-28.06
(10.59)

-33.35
(14.37)

-7.08
(4.29)

-6.34
(4.25)

Notes: Specifications are similar to those in Table 1.4, columns A, B, C, E, and F, although the MSA
SAT-taking rate is excluded from all models. All models control for MSA fixed effects and all standard errors are
clustered at the MSA level. Sample for Panel A is schools in the original NELS 8th grade sample; Panels B
and C restrict sample to those schools with students in the 1988-1992 NELS panel. Student Background
Index in Panels A-C is fitted value from a within-school regression of composite test scores (8th grade in A;
12th in B and C) on student race, gender, and parental education measures, averaged to the school level and
dropping the school fixed effects. Sample in Panel D is public school districts in SAT-sample MSAs with
non-missing completion data (from the Common Core of Data) for at least two thirds of metropolitan enrollment.
Student quality in this panel is the index constructed from the SAT data, averaged over schools in the district.


Figure 1.1.
Schematic: Illustrative allocations of effective schools in Tiebout equilibrium, by size of peer
effect and number of districts

Table 1.7.
Effect of Tiebout choice on average SAT scores across MSAs
(A)
40.7
(9.2)

Choice index
MSA SAT-taking rate
MSA SAT-taking rate

(B)

36.5
(10.1)

(C)
-16.6
(5.0)

28.7
(27.4)

(D)
-16.1
(5.2)

(E)
-26.3
(5.1)

(F)
-16.1
(5.0)

(G)
-14.1
(5.1)

(H)
-13.7
(5.8)

-3.5
3.8
-88.4
(13.1) (19.1) (17.8)

39.1
(68.5)

25.2
(72.9)

2

-157.2 -141.9
(81.6) (86.3)

Avg. bkgd. index, SAT-takers

1.58 1.58
1.79
1.75
(0.06) (0.07) (0.06) (0.12)

1.78
(0.12)

1.80
(0.13)

0.6

(1.1)

0.1
(1.1)

-0.2
(1.3)

Pop: Frac. Black

39.9
(22.2)

41.5
(22.0)

50.3
(41.8)

Pop: Frac. Hispanic

51.0
(14.1)

58.1
(14.5)

63.2
(29.7)

ln(Population)

5.9
(0.9)

ln(mean HH inc.)

-4.3
(8.6)

Gini, HH inc.

-2.8
(8.6)

-5.0
(9.8)

-180.9 -170.2 -178.5
(61.8) (61.6) (70.7)

Pop: Frac. BA+
Foundation plan state
Pop: Frac. White

164.6
(27.0)

162.6
(26.8)

163.8
(33.1)

-3.2
(2.4)

-2.9
(2.3)

-2.8
(2.4)

2

4.7
(24.1)

ln(Density)

1.0
(1.6)

Pop: Frac. LTHS

2.4
(33.0)
y

Census division FEs

n

R2

0.10

n
0.11

n

n

0.80

0.80

y
0.87

y

y

0.93

0.93

0.93

[Figure 1.1 plot omitted. Panel A: Infinitesimal districts, with no concern for peer group (δ = 0). Panel B: Infinitesimal districts, with moderate concern for peer group (δ = 1.5). Panel C: Ten districts, with no concern for peer group (δ = 0). Panel D: Ten districts, with moderate concern for peer group (δ = 1.5). Panel E: Three districts, with no concern for peer group (δ = 0). Panel F: Three districts, with moderate concern for peer group (δ = 1.5). Horizontal axis: Family background (x_ij), from 0 to 1. Vertical axis: District Effectiveness / Desirability, from 0 to 2.5. Series: Effectiveness (µ_j); District Desirability (x_j δ + µ_j).]

Notes: Each panel illustrates one possible equilibrium in a market characterized by the listed market structure and parental
valuations. In each panel, income is uniformly distributed and effectiveness parameters are equally spaced on the [0, 1]
interval. See text for details.

Notes : Dependent variable is the weighted mean SAT score at the MSA level; there are 177 MSAs in the
sample. MSAs are weighted by the sum of SAT-taker weights.


Figure 1.2.
Simulations: Average effectiveness of equilibrium schools in 3- and 10-district markets, by
income and importance of peer group
[Plot omitted. Panel A: No concern for peer group (δ=0). Panel B: Small concern for peer group (δ=0.5). Panel C: Moderate concern for peer group (δ=1.5). Panel D: Large concern for peer group (δ=3). Horizontal axis: Income Percentile, from 0 to 1. Vertical axis: Avg. Effectiveness of Equilibrium School, from -2 to 2. Series: 3 districts; 10 districts.]

Notes: Each horizontal segment in each figure represents the average of 5,000 draws, where income has a standard normal
distribution and effectiveness parameters for each income bin are drawn from the same distribution, then permuted to find an
equilibrium assignment. See text for details.

Figure 1.3.
Simulations: Slope of effectiveness with respect to average income in Tiebout
equilibrium, by market structure and importance of peer group
[Plot omitted. Horizontal axis: # of districts (J), from 2 to 10. Vertical axis: θ(δ, J): Average slope of effectiveness with respect to peer group, from 0 to 1. Series: No concern for peer group (δ=0); Small concern for peer group (δ=0.5); Moderate concern for peer group (δ=1.5); Large concern for peer group (δ=3).]

Notes: Each point is the coefficient of a separate "within" regression of school effectiveness (µ) on average
income, estimated on 5,000 simulated markets with a fixed effect for each market. See text for details.

Figure 1.4.
Distribution of district-level choice indices across 318 U.S. metropolitan areas
[Plot omitted. Horizontal axis: Choice Index, District Level (0=local monopoly; 1=infinitesimal districts). Vertical axis: Number of MSAs. Annotated regions: Concentrated; Highly Concentrated.]

Figure 1.5.
Student characteristics and average SAT scores, school level
[Plot omitted. Horizontal axis: Student Background Index. Vertical axis: Average SAT Score.]

Notes: Each point represents a single school; a randomly selected 25% subsample of schools is shown here. Circle areas
are proportional to the sum of SAT-taker weights at the school. The dark line represents a weighted regression on the full
sample with fixed effects for 177 MSAs; the line has slope 1.74.

Figure 1.7.
"Upper limit" effect of fully decentralizing Miami's school governance on the
across-school distribution of SAT scores
[Plot omitted. Vertical axis: Average SAT Score. Series: Observed scores; Fitted trend line.]

Figure 1.6.
Nonparametric estimates of the school-level SAT score-peer group relationship, by
choice quartile
[Plot omitted. Horizontal axis: Average Background Index. Vertical axis: Kernel Mean SAT Score.]

Notes: Figure displays kernel estimates (using an Epanechnikov kernel and a bandwidth of 5 points) of the school-level
conditional mean SAT score as a function of the school average background index in each of 4 quartiles of the
district-level Tiebout choice index. Schools are weighted by the number of SAT-takers, with weights adjusted so that
MSA-level total weights are proportional to 17-year-old populations. Estimates are not displayed for background index
values below the first percentile or above the 99th percentile of the school-level distribution.
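
The kernel estimates in Figure 1.6 can be illustrated with a weighted local-constant (Nadaraya-Watson) estimator using an Epanechnikov kernel and a bandwidth of 5 points. This is a minimal sketch under the assumption that the local-constant form is the one used; the notes do not say whether a local-constant or local-linear estimator was applied. The function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on (-1, 1), zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_mean(x0, x, y, w, bandwidth=5.0):
    """Weighted Nadaraya-Watson estimate of E[y | x = x0].

    x: school average background index; y: school mean SAT score;
    w: school weights (e.g., number of SAT-takers).
    Returns None when no school falls within one bandwidth of x0.
    """
    num = 0.0
    den = 0.0
    for xi, yi, wi in zip(x, y, w):
        k = wi * epanechnikov((xi - x0) / bandwidth)
        num += k * yi
        den += k
    return num / den if den > 0 else None
```

Evaluating `kernel_mean` on a grid of background-index values, separately for the schools in each choice quartile, would trace out one curve per quartile.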


[Figure 1.6 series: Least-Choice Quartile; 3rd Quartile; 2nd Quartile; Greatest-Choice Quartile.]
[Figure 1.7, continued. Horizontal axis: Average student background. Series: Counterfactual trend line; Upper limit effect of move to maximum choice.]

Notes: Hollow circles are observed average SATs at schools in the Miami PMSA; circle areas are proportional to
the square root of the number of SAT-takers at the school. "Fitted trend line" represents fitted values from the
model in Table 1.4, Column E. "Counterfactual trend line" represents the fitted values after complete
decentralization of Miami school governance (i.e. after the choice index goes from 0 to 1), if the
choice-background index interaction effect is assumed to be at the upper limit of the estimated 95% confidence region
from that model. Shaded circles represent counterfactual SAT averages for the schools that observed Miami peer
groups might attend under these assumptions.
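
The "upper limit" in Figure 1.7 is the top of a 95% confidence interval for the choice-background interaction. A minimal sketch of that arithmetic follows, assuming a normal critical value of 1.96 and reading the Column E interaction in Table 1.4 as -0.09 with standard error 0.15. Both the critical value and that reading of the flattened table are assumptions, and the function name is hypothetical.

```python
def upper_limit_95(estimate, std_error, critical_value=1.96):
    """Upper end of a two-sided 95% confidence interval (normal approximation)."""
    return estimate + critical_value * std_error


# Assumed inputs: interaction estimate and standard error as read from
# Table 1.4, Column E. Moving the choice index from 0 to 1 changes the
# SAT-background gradient by the interaction coefficient itself.
max_gradient_change = upper_limit_95(-0.09, 0.15)
```

Under these assumptions the gradient would rise by at most about 0.2 after full decentralization.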


Chapter 2.

Does Competition Among Public Schools Really
Benefit Students? A Reappraisal of Hoxby (2000a)

2.1. Introduction

Hoxby (2000a) argues that in metropolitan areas where governance of schools is
divided among many small school districts, each with a local monopoly, the need to attract
residents may constrain school administrators from their self-interested tendencies to
inefficient production. Unlike some previous empirical tests of forms of Brennan and
Buchanan’s (1980) Leviathan Hypothesis, Hoxby finds significant positive effects of
jurisdictional fragmentation on student outcomes, which she interprets as evidence in
support of the claim that schools respond to “Tiebout”-style competition (Tiebout, 1956).

Hoxby’s results appear to conflict with the conclusion in the previous chapter that
choice among jurisdictions is unlikely to create incentives for schools to become more
effective. The most direct conflicts are with Table 1.7, which indicates a significant negative
effect of “Tiebout choice” on average SAT scores across metropolitan areas, and with Table
B2 (in the Appendix), which presents similar but mostly insignificant estimates from
instrumental variables specifications similar to Hoxby’s. However, there are potentially
important differences between the two analyses: The SAT regressions are conducted at the
metropolitan area level, in contrast to Hoxby’s individual-level regression; include both
public and private students where Hoxby’s sample includes only public schools; and use
somewhat different control variables and weighting strategies than does Hoxby’s analysis.

This chapter presents a reanalysis of Hoxby’s data, conducted with an eye toward
uncovering the sources of the divergent conclusions. I begin by building a sample and
specification that mirrors as closely as possible that described in her published paper. Even
with the restricted-access National Education Longitudinal Study (NELS) data that Hoxby
uses, however, I am unable to replicate her exact sample or point estimates. Using one of
Hoxby’s two instruments (I have been unable to obtain or replicate her “larger streams”
variable for use in the current analysis), I estimate a small, insignificant negative effect of
choice on public school students’ test scores.

I go on to consider the robustness of the NELS-based analysis to four potentially
important modifications of the basic replication specification. I find several causes for
concern about the validity of Hoxby’s conclusions, as estimates of models similar to hers
appear to be quite sensitive to the exact sample and specification and to have substantially
greater sampling variability than her reported standard errors suggest.

First, I propose an alternative instrument intended to exploit the same source of
exogenous variation used by Hoxby’s “streams” instruments. My proposed instrument, a
measure of the degree of choice in 1942, is substantially more powerful than the streams
variables, while arguably equally valid. Like less precise estimates using Hoxby’s “smaller
streams” instrument, the 1942 choice instrument indicates essentially zero effect of choice
on student test scores.

Second, I note potentially important coding errors in the data set used to link NELS
schools to the metropolitan areas in which they are located. When these coding errors are

repaired, using information on the demographic characteristics of schools’ zip codes as an
independent source of information on the schools’ locations, the estimated choice effect
becomes substantially smaller (more negative) for all specifications considered.

Third, I address the implications of Hoxby’s restriction of her sample to students
enrolled in public schools. Hoxby notes (Table 6) a significant negative effect of public
school competition on private enrollment rates. Hsieh and Urquiola (2002) point out that if
the marginal private school student is positively selected, the effect of choice on average
public-sector student performance is an upward-biased estimate of choice’s effect on school
productivity. I test for this by including a control for the MSA private enrollment rate in
Hoxby’s base model, and also by estimating her specification on a sample that includes both
public and private schools. The first test offers supportive evidence of the hypothesized
bias, as the point estimate of the choice effect is smaller in models that control for the
private enrollment rate. The second test is less conclusive, shrinking the estimated effect
when the streams instrument is used but producing slightly larger estimates in other
specifications. One explanation may be that the NELS sample, with fewer than two private
schools per metropolitan area, is simply too small to estimate metropolitan private
enrollment shares reliably.

Finally, I study the sampling error of the coefficients in individual student regression
models similar to those that Hoxby estimates. Regression errors of students within the same
metropolitan area, district, or school may be correlated, and classical assumptions therefore
probably understate the variance of coefficient estimates. Hoxby proposes an error
components model in which there are metropolitan- and district-level error components, but
no component coming from the school itself. I implement a variance estimator similar to
hers that allows for school effects, and also consider less parametric estimators that are
robust to more general forms of residual autocorrelation. All of my autocorrelation-robust
estimators produce substantially larger standard errors than are implied by the classical
assumptions. They indicate that even Hoxby’s point estimate of the choice effect may be
indistinguishable from zero when its sampling error is estimated appropriately.

I conclude that Hoxby’s positive estimated effect of interdistrict competition on
school productivity is not robust, and that a fair read of the NELS evidence suggests that
any such effect is likely small and indistinguishable from zero. I do not find evidence of
endogeneity of the choice index to school quality, suggesting that the more precise negative
(but insignificant) OLS effect of school choice on student outcomes should be preferred to
less precise IV estimates. As I am unable to duplicate Hoxby’s precise sample, however, I
cannot be sure that these results would hold up in that sample. Similarly, as I consider here
only one of Hoxby’s specifications, I cannot speak to the effect of the current adjustments
on the other specifications in her paper. An implementation of Hoxby’s specification in the
SAT data supports my conclusions from the NELS, and indicates that the significance of the
effects indicated in Tables 1.7 and B2 may also be sensitive to the precise specification used.

2.2. Data and Methods

Hoxby studies the cross-sectional relationship between student outcomes and the
degree of competition among public education providers. She considers two measures of
intergovernmental competition within a metropolitan area: essentially, the number of
schools and the number of districts per student, adjusted for the uniformity of school and

district sizes. Her primary discussion and her main results relate to a choice index defined
over districts, however, and I study that index exclusively.

Hoxby argues that the current district choice index is potentially endogenous to
school quality, if consolidation of school districts has been less prevalent in areas with poor
schools.1 She proposes that measures of the topographical character of the area, which may
have influenced the initial design of district boundaries when the area was first partitioned,
are valid instruments for current choice. She implements this with two measures of the
number of rivers and streams flowing through each metropolitan area. One is derived from
a publicly available electronic data source, the Geographic Names Information System
(GNIS), and the other from a hand count of larger rivers on printed maps. Only the first of
these variables is available for the current analysis.2 Inspired by Hoxby’s basic reliance on
initial conditions as sources of exogenous variation, I construct a choice index from the
number of districts that existed in 1942 in the area covered by a current metropolitan area.
This predates the post-war wave of consolidation that inspires Hoxby’s argument for
endogeneity of current choice, and plausibly leverages the initial conditions that are the
source of topographic variables’ power.

1 Figure B1 (in the Appendix) provides a time series of district consolidation that casts doubt on the claim that
this is an important source of endogeneity to current school quality.
2 This is evidently the more powerful of Hoxby’s instruments (see her Table 2, reported in Table 2.1 below).

Hoxby’s data on student outcomes are drawn from two sources, the NELS and the
National Longitudinal Survey of Youth (NLSY). She analyzes several outcome measures
from each data set: 8th, 10th, and 12th grade test scores from the NELS, and educational
attainment and long-run income from the NLSY. I focus here on 8th grade reading scores
from the NELS. This decision is intended to facilitate the replication of Hoxby’s sample by
minimizing the number of decisions required of the analyst. The NELS data offer
substantially better geocoding than the NLSY and a stronger link of students to their
schools. Within the NELS, the 8th grade scores permit the most straightforward analysis:
The NELS is a panel beginning with an initial sample of 8th graders in 1988, with several
sample “freshenings” thereafter, and later years of the data offer a multiplying array of
weights and options for matching students to metropolitan areas.

I attempt to define control variables similar to those used by Hoxby. Like her, I
draw district-level demographic characteristics from the School District Data Book (SDDB),
a tabulation of data from the 1990 Census along school district boundaries. However, where
Hoxby derives metropolitan area demographic characteristics from the City and County
Data Book (CCDB), which reports 1980 census demographic characteristics, I use instead
county-level tabulations of 1990 Census data from the Summary Tape File 3A.3

3 A more exact replication would use the 1980 characteristics. I rely on 1990 data for two reasons. First, this
seems a more appropriate measure of MSA characteristics relevant to 8th grade students in 1988. Second, I am
unable to determine how Hoxby calculates one of her control variables, the “ethnic homogeneity index,” from
the CCDB, which does not seem to tabulate ancestry.

The metropolitan area definitions used at all points in my analysis are the Office of
Management and Budget’s Metropolitan Statistical Area (MSA) definitions of June 30, 1990,
used to characterize metropolitan areas in 1990 census data. Each enumerated sub-area
(PMSA) within the largest urban agglomerations is treated as a distinct metropolitan area.

I use data from the 1990 Common Core of Data (CCD), an annual census of public
schools and school districts, to construct Hoxby’s Herfindahl-based index of choice among
districts. Heeding Urquiola’s (1999) warning about the distinction between elementary and
secondary districts, which parents cannot be said to choose between, I construct the choice
index using enrollment in grades 9-12 only.
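
A Herfindahl-based choice index of this kind is conventionally one minus the sum of squared district enrollment shares within a metropolitan area. The sketch below assumes that form; it matches the 0 (local monopoly) to 1 (infinitesimal districts) scale used for the index elsewhere in the dissertation, but the exact formula is not written out in this section. The function name is hypothetical.

```python
def choice_index(enrollments):
    """One minus the Herfindahl index of district enrollment shares.

    enrollments: grades 9-12 enrollment for each district in one MSA.
    Returns 0.0 for a single district and approaches 1.0 as enrollment
    is spread evenly over many districts.
    """
    total = sum(enrollments)
    if total <= 0:
        raise ValueError("total enrollment must be positive")
    return 1.0 - sum((e / total) ** 2 for e in enrollments)
```

For example, two equal-sized districts give an index of 0.5, and ten equal-sized districts give 0.9.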

The NELS data are matched to metropolitan areas in two ways. First, following
what appears to be Hoxby’s approach, the district codes on the restricted-access NELS file
are used to match NELS public schools to the 1990 CCD, which contains MSA codes for
metropolitan districts. This yields a base replication sample of 11,480 students with valid 8th
grade reading scores and demographic characteristics, only slightly more than Hoxby’s
reported sample of 10,790, which I am unable to replicate despite repeated efforts.4 A
second MSA match exploits variables on the NELS school file that provide detailed
demographic characteristics of the school’s zip code. Two of these, exact counts of
housing units and population, uniquely identify zip codes in the STF 3B tabulation of the
1990 census, the source of the NELS data. As most zip codes lie either entirely within an
MSA or entirely outside any MSA, this uniquely assigns the vast majority of schools in the
NELS. The zip code match indicates that the CCD incorrectly codes the MSA for 24 NELS
schools. Section 2.4 explores the implications of this for the estimated model.

It is worth noting that this analysis has relied upon the description of Hoxby’s
methods contained in her published paper, and that there may be differences between
Hoxby’s original data and my replication sample beyond those described above.5

2.2.1. Econometric framework

I develop here a much-simplified version of Hoxby’s notation that suffices to
describe the issues of present concern. Let i index students; s schools, d school districts, and
m metropolitan areas. Hoxby’s basic model can be expressed as:

$$A_{isdm} = X_{isdm}\beta + e_{isdm}, \tag{1}$$

where $X_{isdm}$ is a vector that includes the metropolitan area choice index $C_m$ and other
control variables that may vary at any of the four levels considered here; $A_{isdm}$ is a student
outcome; and $e_{isdm}$ is an error term that may be correlated with $C_m$ but is not correlated with
the remainder of $X_{isdm}$ or with a vector of instruments $Z_m$.

Hoxby estimates β by traditional least squares regression and by instrumental
variables, using student-level observations in either case. Unless the determinants of student
performance are completely specified in X, errors in (1) would not, in general, be expected
to be independent across students in the same school, district, or even metropolitan area.
Thus, we might write

$$e_{isdm} = \mu_m + \theta_{dm} + \psi_{sdm} + \varepsilon_{isdm}, \tag{2}$$

4 My sample includes students from 197 MSAs, substantially fewer than the 211 that Hoxby reports. There seem to be a number of typographical and coding errors in Hoxby’s MSA counts, however. Her 12th grade NELS sample, for example, is reported as representing 316 MSAs in her Table 3 but only 209 in Table 4. One can obtain Hoxby’s 211 figure (for the 8th grade sample) by matching the full set of NELS schools to the CCD and counting all unique MSA codes, including those for a failed NELS-CCD match and for a non-metropolitan school in the CCD. When missing value and invalid or duplicative MSA codes (the Denver PMSA, e.g., is variously coded as 2080 and 342080) are eliminated, this drops to 203. It drops further to the above 197 figure when only NELS schools that provide valid observations for the 8th grade sample are included. The count can be increased somewhat using different MSA definitions or, as I discuss below, by repairing some invalid codes on the CCD. I have not found a sample definition, however, that produces more than 205 MSAs.

5 Two differences seem especially likely. First, there are apparently several versions of the SDDB data in circulation. The data used here were generously provided by Cecilia Rouse, who obtained them from the National Center for Education Statistics’ original contractor, and seem to be more complete than are other extant versions. I am unsure what version Hoxby used. A second potential difference derives from the construction of the choice index: I am unsure which measure of enrollment Hoxby used for this purpose.

with each component identically and independently distributed across markets, districts, schools, or individuals and independence of components across aggregation levels. In this generalized random effects model, the most efficient estimator is maximum likelihood, although feasible generalized least squares is asymptotically equivalent. OLS (or IV) that does not take account of the error structure is nevertheless consistent, although traditional estimators of the sampling error of the resulting coefficients are biased. The true sampling variance of OLS is given by Moulton (1986):

$$\mathrm{var}(\hat\beta_{ols}) = (X'X)^{-1} X'\Omega X (X'X)^{-1}, \text{ where} \tag{3}$$

$$\Omega \equiv \mathrm{var}(e) = \sigma_\mu^2 Q_m + \sigma_\theta^2 Q_{dm} + \sigma_\psi^2 Q_{sdm} + \sigma_\varepsilon^2 I_N. \tag{4}$$

$I_N$ is an N-by-N identity matrix, while $Q_m$, $Q_{dm}$, and $Q_{sdm}$ are block-diagonal matrices consisting of blocks of ones within each metropolitan area, district, or school, respectively and zero elsewhere. (That is, $Q_m \equiv WW'$, where $W$ is an N-by-M matrix of indicators for M metropolitan areas, and $Q_{dm}$ and $Q_{sdm}$ are defined similarly.) The extension is straightforward to

$$\mathrm{var}(\hat\beta_{iv}) = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1} Z'\Omega Z (Z'Z)^{-1}Z'X \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}. \tag{5}$$

Moulton (1986) proposes a feasible estimator of (3) that simply replaces the variance component terms in (4) with consistent estimates $(\hat\sigma_\mu^2, \hat\sigma_\theta^2, \hat\sigma_\psi^2, \hat\sigma_\varepsilon^2)$. Hoxby writes that she uses Moulton’s formula allowing for error components coming from the MSA and district, implicitly imposing $\sigma_\psi^2 \equiv 0$.6

In most of the results presented here, I do not account for the non-classical error structure but instead report conventionally calculated standard errors. Surprisingly, these are quite similar to those that Hoxby reports from her random effects model. In Section 2.6, I explore the standard error calculation, using first an implementation of the Moulton estimator and second the less parametric “cluster” estimator that does not impose the structure of (2) and (4) but allows for arbitrary correlation among observations within a single MA. All of my autocorrelation-robust standard errors are substantially larger than are the conventional estimates, and indeed they suggest that even Hoxby’s relatively large point estimate of the choice effect may be indistinguishable from zero when standard errors are appropriately calculated. The cluster estimator, in particular, is quite well behaved, very nearly duplicating the more parametric, more involved Moulton-style estimators.

2.3. Replication

Table 2.1 reports Hoxby’s “first stage” model (from her Table 2) and analogous models derived from the replication sample.7 Although the instrument sets and samples are

6 There are several available estimators of the error component variances in (4), and neither Moulton nor Hoxby specifies which is to be used. These parameters may be estimated from the contrast between individual and group-mean residual variances; from the residual variance of between and within estimators; from the covariances among observations within groups; or from an optimal minimum distance estimator using the entire empirical covariance matrix $ee'$. In finite samples, these will produce slightly different estimates of $\mathrm{var}(\hat\beta_{ols})$. My implementation uses the first of these, as described in Greene’s (2000) discussion of random effects in unbalanced panels.
7 The models in Table 2.1 are estimated on the universe of MSAs, and are therefore slightly different than the
actual first stages to the IV regressions shown later, which are estimated on individual student observations
from a subset of MSAs. As in the student-level sample, my MSA-level sample size differs from Hoxby’s. One
possible explanation is that Hoxby reports the first stage from her NLSY sample, which is matched to different
metropolitan definitions than are the NELS data. There are 314 MSAs according to the 1983 county-based
definitions; when additional codes for failed matches and non-metropolitan are included one might obtain
Hoxby’s reported count of 316.


slightly different, the basic results are similar.8 Note that in Hoxby’s model, reported in Column 1, the “smaller streams” instrument accounts for a much larger share of the variance of the choice index than does the “larger streams” instrument, which is not available for the current analysis. Moreover, in the replication model excluding the latter variable, in Column B, the former variable’s coefficient is similar to that reported by Hoxby. Finally, note that the 1942 choice index is a substantially more powerful predictor of 1990 choice than are either of the streams variables.

Table 2.2 reports basic replication estimates of OLS and IV models for the NELS 8th grade reading score. Columns A and B report Hoxby’s reported coefficient and standard error on the district choice index, estimated by OLS and IV respectively. Column C reports her coefficients from the IV model for 12th grade reading scores, the baseline model in her paper and the only one for which control coefficients are reported. Columns D through G report OLS and three different IV estimates of the 8th grade reading model using the replication sample. The control variable coefficients in the replication sample are broadly similar to those reported by Hoxby from her model for 12th grade scores.

Columns D and E both report negative point estimates of the choice effect, each notably smaller than is indicated for the corresponding model from Hoxby’s paper in Columns A and B. The divergence of OLS estimates suggests that this is largely due to differences in the sample and in control variables rather than to the absence of the “larger streams” instrument in column E. However, the difference between OLS and IV estimates is much smaller in the replication sample than in Hoxby’s results, and indeed a Hausman test fails to reject equality of the two replication models. As Hoxby’s IV model clearly rejects the OLS point estimates, this suggests that her results may derive primarily from the larger streams instrument. This variable is not available for the current analysis, and the reader should keep in mind the possibility that conclusions from the replication analysis may not generalize to the model actually estimated by Hoxby.

The estimates in columns F and G use the 1942 district structure as an instrument for the 1990 choice index, first alone and then in combination with the smaller streams variable. These indicate negative, though statistically insignificant, effects. Hausman tests for these models fail to reject equality of the choice coefficient with that indicated by OLS, although tests of the full coefficient vector do reject. Recall, though, that the replication standard errors are calculated under classical assumptions, assuming iid errors, and likely overstate the precision of the estimates.9

2.4. Sensitivity to Geographic Match

There are several inconsistencies and apparent coding errors in the CCD metropolitan area variable. The precision and accuracy of coefficient estimates can be improved by removing the measurement error that these coding errors produce. One error, duplicate codes for some MSAs, is mentioned above (see footnote 4), and is corrected in the base replication sample analyzed in Section 2.3. Others require more caution. In this section I report estimates of the choice effect from samples that repair apparently erroneous
8 One major point of divergence is the population coefficient. Hoxby reports that her population measure is scaled in thousands, so I multiply her coefficient by 1,000 to obtain the effect-per-ten million reported in Table 2.1. It seems clear that Hoxby’s coefficient is actually scaled similarly to mine.

9 Given this, it is surprising that the replication standard errors in columns D and E are so similar in magnitude to those reported by Hoxby for corresponding models (columns A and B), which she describes as “us[ing] formulas (Moulton, 1986) for data grouped by districts and metropolitan areas.” I revisit the standard error question in Section 2.6, where I obtain replication standard errors using Moulton’s formulas that are substantially larger than those reported in Table 2.2.
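The gap between conventional and cluster-robust standard errors that footnote 9 and Section 2.2.1 discuss can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the function name `ols_with_cluster_se`, the simulated "choice" regressor, the sample sizes, and the variance components are hypothetical and are not drawn from the NELS, the CCD, or Hoxby’s paper. The sketch uses the simple cluster "sandwich" form of equation (3), with the middle term estimated from within-group residual cross products and no finite-sample correction; it is not the Moulton variance-components implementation.

```python
import numpy as np

def ols_with_cluster_se(X, y, groups):
    """OLS coefficients with conventional and cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, k = X.shape
    # Conventional variance: valid only if errors are iid.
    sigma2 = resid @ resid / (n - k)
    se_conv = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Cluster-robust "meat": sum over groups of X_g' e_g e_g' X_g.
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    se_clu = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_conv, se_clu

# Simulated data (hypothetical): 200 metropolitan areas, 50 students each.
rng = np.random.default_rng(0)
n_msa, n_per = 200, 50
groups = np.repeat(np.arange(n_msa), n_per)
choice = rng.uniform(0, 1, n_msa)[groups]   # regressor that varies only by area
mu = rng.normal(0, 1, n_msa)[groups]        # area-level error component
eps = rng.normal(0, 1, n_msa * n_per)       # student-level error component
y = 0.0 * choice + mu + eps                 # true coefficient on choice is zero

X = np.column_stack([np.ones_like(choice), choice])
beta, se_conv, se_clu = ols_with_cluster_se(X, y, groups)
print("coefficient on choice:", beta[1])
print("conventional s.e.:", se_conv[1])
print("cluster-robust s.e.:", se_clu[1])
```

Because the regressor varies only at the group level and the errors share a group component, the conventional formula understates the sampling variability of its coefficient in this setup, which is the pattern the text attributes to student-level regressions on a metropolitan-level choice index.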

MSA codes in the CCD. I use two sources of independent information on schools’ locations: The county codes contained in the CCD file, which are sometimes inconsistent with the reported MSA codes but which I take to be generally more accurate, and demographic characteristics of schools’ zip codes from the NELS school survey, which can be linked to MSAs, through the zip code tabulation of 1990 Census data (the STF-3B) from which they are drawn, with ambiguity in only a small fraction of cases.

The first row of Table 2.3 repeats the choice index coefficients from Table 2.2, while the remaining rows report the estimated coefficients as corrections to the CCD MSA codes are gradually implemented, beginning with the clearest errors and proceeding to less obvious cases. The results indicate that the estimated choice effect is quite sensitive to the exact sample used, and that it is smaller in the repaired sample than in that used in Table 2.2.

The most obvious coding error in the CCD consists of obsolete MSA codes. Thus, for example, the Kansas City, Kansas school district is coded as being in MSA 3755. Although the 1990 CCD purports to report 1990 MSA codes, there is no MSA numbered 3755 in 1990. However, there is a 1983 PMSA with this number, the Kansas City, KS PMSA, which in 1990 is demoted (with the Kansas City, MO PMSA) into the Kansas City, KS-MO MSA, number 3760. Repairing errors of this sort adds 11 NELS schools and 216 students to the replication sample. Row 2 of Table 2.3 indicates, however, that this has negligible effects on the estimated coefficients.

There are several additional districts in the CCD for which the MSA codes are valid but inconsistent with the reported county location. One example is the Baker County School District in Florida, reported as in the Jacksonville MSA despite Baker County’s non-metropolitan status. Correcting the MSA code in these cases affects 8 NELS schools.10 Row 3 of Table 2.3 indicates that this produces larger (more positive) estimates of the choice effect in the IV specifications.

Row 4 of Table 2.3 adds to the sample four schools that are non-metropolitan according to the CCD but whose NELS zip code information places them within MSA counties. Row 5 adds an additional school in New England, where counties are not sufficient to establish metropolitan status because MSAs are based on towns rather than counties, for which the zip code location and the CCD school address agree that it is indeed within a metropolitan area.11 These two final sample alterations account for less than one percent of the sample, but nevertheless have large downward effects on the estimated choice coefficient. Most notable is the final alteration, which affects less than 0.2% of the sample but reduces the streams estimate of the choice effect by 0.27.

2.5. Are Estimates From the Public Sector Biased?

Hoxby’s analysis is limited to NELS students attending public schools, as are the replications presented thus far. This can create sample selection bias, which would, under reasonable assumptions, be expected to bias the choice effect upward relative to the effect of interest in Hoxby’s paper, the response of public school administrators to competitive pressures. This point has been made convincingly by Hsieh and Urquiola (2002) in the

10 A potential explanation for these inconsistencies is that some districts may span counties, serving areas both inside and outside a metropolitan area. For this reason, I checked the CCD county codes against the NELS zip code location (which should describe the location of the school itself rather than that of the district headquarters) before overruling the CCD MSA assignment, although in practice these never disagreed.
11 Unfortunately, the confidentiality of the geocode NELS data precludes a description of the specific changes.

context of a Chilean school choice program; I merely summarize a simplified version of their argument in the current notation.

Suppose that $X_{isdm}$ in equation (1) contains all school- and district-level variables that differ systematically between the public and private sectors, so that

$$E[\theta_{dm} + \psi_{sdm} \mid s \text{ is a public school}] = E[\theta_{dm} + \psi_{sdm} \mid s \text{ is a private school}] = 0. \tag{6}$$

It still may be the case that students who self-select into private schools differ systematically from those who choose the public sector. Let $f_m \equiv E[\varepsilon_{isdm} \mid i \text{ attends public school}; m]$ be the average $\varepsilon_{isdm}$ of public school students in MSA m. Let $\hat\gamma_{public}$ be an unbiased estimator of the choice effect on average public school scores, and let $\gamma$ denote the true effect of choice on public school productivity. It is clear that $E[\hat\gamma_{public}] = \gamma + \partial f_m / \partial C_m$, so that $\hat\gamma_{public}$ will be an upward biased estimate of $\gamma$ if choice draws high-$\varepsilon_{isdm}$ students into the public sector or low-$\varepsilon_{isdm}$ students in the opposite direction, while $\hat\gamma_{public}$ will be downward-biased if the selectivity effect goes the opposite direction.

In her Tables 5 and 6, Hoxby demonstrates a significant negative effect of choice on private enrollment rates. She interprets this as evidence that “choice among public schools is a substitute for choice of private schools,” and suggests that a higher level of the district choice index reduces the tendency for “families with a strong taste for education [to] leave the public sector by shifting their children into private schools …,” (p. 1233). If taste for education is positively correlated with $\varepsilon_{isdm}$, this suggests that $\partial f_m / \partial C_m$ is positive and therefore that $\hat\gamma_{public}$ is an upward-biased estimate of the effect of competition among public schools on public school productivity.

There are two obvious ways to correct Hoxby’s specification for this bias. First, in the spirit of so-called “Heckman corrections” (Heckman, 1979; Card and Payne, 2002), one can control directly for functions of the MSA private enrollment rate in models for public school students. Second, one can estimate the model on a sample that includes private school students. If the sample selection bias is in the hypothesized direction, either strategy should produce a smaller (more negative) estimate of the effect of interdistrict competition.

Private enrollment rates are readily measured from the 1990 STF files that provide MSA demographic characteristics. Table 2.4 presents estimates that control for the metropolitan private enrollment share, on the base replication sample in Panel A and on the repaired sample (from Row 5 of Table 2.3) in Panel B. In each case, the point estimate of the private enrollment share variable is quite large and negative, indicating positive selection into private schools, though this effect is never significant. More importantly, in all eight cases (two samples by four model specifications) the estimated choice effect is substantially smaller, though again insignificantly so, when the private enrollment share variable is controlled.

The second correction, implemented in Table 2.5, is made possible by the inclusion of private schools in the NELS sample. This introduces two complications, however. First, because private schools are not included in the CCD, they must be assigned to MSAs on the basis of their zip code.12 Second, the SDDB district demographic variables are unavailable for private schools. Hoxby argues that the coefficient of interest should not be sensitive to the exclusion of these variables. The first two rows of Table 2.5, which present specifications on the repaired public school sample both with and without district-level controls, indicate that this is not entirely true, as three of the four choice effect estimates are

12 For consistency, I use only the zip-code-matched repaired sample of public schools for this analysis.
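The selection argument of Section 2.5 can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than as a representation of the NELS data. All names and parameter values are hypothetical: students have an unobserved component drawn from a standard normal distribution, families exit to private schools when a noisy "taste" correlated with that component exceeds a threshold, and the threshold rises with the choice index (so more choice means less private enrollment, as in Hoxby’s Tables 5 and 6). The true effect of choice on public school productivity is set to zero, so any slope in public school mean scores reflects only the derivative of $f_m$ with respect to the choice index.

```python
import numpy as np

rng = np.random.default_rng(1)
n_msa, n_students = 300, 200
gamma_true = 0.0                       # assumed: no true productivity effect of choice
choice = rng.uniform(0, 1, n_msa)      # hypothetical metropolitan choice index

public_means = np.empty(n_msa)
private_share = np.empty(n_msa)
for m in range(n_msa):
    eps = rng.normal(0, 1, n_students)            # student-level unobservable
    taste = eps + rng.normal(0, 1, n_students)    # taste for education, correlated with eps
    # Assumed selection rule: the exit threshold rises with the choice index.
    goes_private = taste > 0.5 + 1.0 * choice[m]
    score = gamma_true * choice[m] + eps
    public_means[m] = score[~goes_private].mean()  # f_m plus the (zero) true effect
    private_share[m] = goes_private.mean()

# Area-level regressions on the choice index (slope is the first coefficient).
slope_public = np.polyfit(choice, public_means, 1)[0]
slope_private = np.polyfit(choice, private_share, 1)[0]
print("slope of public mean score on choice:", slope_public)
print("slope of private share on choice:", slope_private)
```

Under these assumptions the private share falls with choice while the public school mean rises, even though choice has no effect on productivity by construction. The sign of the spurious slope depends entirely on the assumed selection rule; reversing the direction of selection would reverse it.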
