Essays in the Economics of Education
by
Jesse Morris Rothstein
A.B. (Harvard University) 1995
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
Economics
in the
GRADUATE DIVISION
of the
UNIVERSITY OF CALIFORNIA, BERKELEY
Committee in charge:
Professor David Card, Chair
Professor John M. Quigley
Professor Steven Raphael
Spring 2003

UMI Number: 3183857
Copyright 2003 by
Rothstein, Jesse Morris
All rights reserved.
UMI Microform 3183857
Copyright 2005 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346

Essays in the Economics of Education
Copyright 2003
by
Jesse Morris Rothstein

Abstract
Essays in the Economics of Education
by
Jesse Morris Rothstein
Doctor of Philosophy in Economics
University of California, Berkeley
Professor David Card, Chair

Three essays consider implications of the strong association between student
background characteristics and academic performance.
Chapter One considers the incentives that school choice policies might create for the
efficient management of schools. These incentives would be diluted if parents prefer
schools with desirable peer groups to those with inferior peers but better policies and
instruction. I model a “Tiebout choice” housing market in which schools differ in both peer
group and effectiveness. If parental preferences depend primarily on school effectiveness,
we should expect both that wealthy parents purchase houses near effective schools and that
decentralization of educational governance facilitates this residential sorting. On the other
hand, if the peer group dominates effectiveness in parental preferences, wealthy families will
still cluster together in equilibrium but not necessarily at effective schools. I use a large
sample of SAT-takers to examine the distribution of student outcomes across schools within
metropolitan areas that differ in the structure of educational governance, and find little
evidence that parents choose schools for characteristics other than peer groups.
This result suggests that competition may not induce improvements in educational
productivity, and indeed I do not obtain Hoxby’s (2000a) claimed relationship between
school decentralization and student performance. I address this discrepancy in Chapter
Two. Using Hoxby’s own data and specification, as described in her published paper, I am
unable to replicate her positive estimate, and I find several reasons for concern about the
validity of her conclusions.
Chapter Three considers the role of admissions tests in predictions of student
collegiate performance. Traditional predictive validity studies suffer from two important
shortcomings. First, they do not adequately account for issues of sample selection. Second,
they ignore a wide class of student background variables that covary with both test scores
and collegiate success. I propose an omitted variables estimator that is consistent under
restrictive but sometimes plausible sample selection assumptions. Using this estimator and
data from the University of California, I find that school-level demographic characteristics
account for a large portion of the SAT’s apparent predictive power. This result casts doubt
on the meritocratic foundations of exam-based admissions rules.

To Joanie, for everything.

Contents

List of Figures
List of Tables
Preface
Acknowledgements

1. Good Principals or Good Peers? Parental Valuation of School Characteristics,
   Tiebout Equilibrium, and the Incentive Effects of Competition among Jurisdictions
   1.1. Introduction
   1.2. Tiebout Sorting and the Role of Peer Groups: Intuition
   1.3. A Model of Tiebout Sorting on Exogenous Community Attributes
        1.3.1. Graphical illustration of market equilibrium
        1.3.2. Simulation of expanding choice
        1.3.3. Allocative implications and endogenous school effectiveness
   1.4. Data
        1.4.1. Measuring market concentration
        1.4.2. Does district structure matter to school-level choice?
        1.4.3. SAT data
   1.5. Empirical Results: Choice and Effectiveness Sorting
        1.5.1. Nonparametric estimates
        1.5.2. Regression estimates of linear models
   1.6. Empirical Results: Choice and Average SAT Scores
   1.7. Conclusion
   Tables and Figures for Chapter 1

2. Does Competition Among Public Schools Really Benefit Students? A Reappraisal of
   Hoxby (2000)
   2.1. Introduction
   2.2. Data and Methods
        2.2.1. Econometric framework
   2.3. Replication
   2.4. Sensitivity to Geographic Match
   2.5. Are Estimates From the Public Sector Biased?
   2.6. Improved Estimation of Appropriate Standard Errors
   2.7. Conclusion
   Tables and Figures for Chapter 2

3. College Performance Predictions and the SAT
   3.1. Introduction
   3.2. The Validity Model
        3.2.1. Restriction of range corrections
        3.2.2. The logical inconsistency of range corrections
   3.3. Data
        3.3.1. UC admissions processes and eligible subsample construction
   3.4. Validity Estimates: Sparse Model
   3.5. Possible Endogeneity of Matriculation, Campus, and Major
   3.6. Decomposing the SAT’s Predictive Power
   3.7. Discussion
   Tables and Figures for Chapter 3

References

Appendices
   Appendix A. Choice and School-Level Stratification
   Appendix B. Potential Endogeneity of Market Structure
   Appendix C. Selection into SAT-Taking
   Appendix D. Proofs of Results in Chapter 1, Section 3
   Tables and Figures for Appendices

List of Figures

1.1  Schematic: Illustrative allocations of effective schools in Tiebout equilibrium, by
     size of peer effect and number of districts
1.2  Simulations: Average effectiveness of equilibrium schools in 3- and 10-district
     markets, by income and importance of peer group
1.3  Simulations: Slope of effectiveness with respect to average income in Tiebout
     equilibrium, by market structure and importance of peer group
1.4  Distribution of district-level choice indices across 318 U.S. metropolitan areas
1.5  Student characteristics and average SAT scores, school level
1.6  Nonparametric estimates of the school-level SAT score-peer group relationship, by
     choice quartile
1.7  “Upper limit” effect of fully decentralizing Miami’s school governance on the
     across-school distribution of SAT scores
3.1  Conditional expectation of SAT given HSGPA, three samples
B1   Number of school districts over time
C1   SAT-taking rates and average SAT scores across MSAs
D1   Illustration of single-crossing: Indifference curves in q-h space

List of Tables

1.1  Summary statistics for U.S. MSAs
1.2  Effect of district-level choice index on income and racial stratification
1.3  Summary statistics for SAT sample
1.4  Effect of Tiebout choice on the school-level SAT score-peer group gradient
1.5  Effect of Tiebout choice on the school-level SAT score-peer group gradient:
     Alternative specifications
1.6  Effect of Tiebout choice on the school-level SAT score-peer group gradient:
     Evidence from the NELS and the CCD
1.7  Effect of Tiebout choice on average SAT scores across MSAs
2.1  First-stage models for the district-level choice index
2.2  Basic models for NELS 8th grade reading score, Hoxby (2000b) and replication
2.3  Effect of varying the sample definition on the estimated choice effect
2.4  Models that control for the MSA private enrollment share
2.5  Estimated choice effect when sample includes private schools
2.6  Alternative estimators of the choice effect sampling error, base replication sample
2.7  Estimates of Hoxby’s specification on SAT data
3.1  Summary statistics for UC matriculant and SAT-taker samples
3.2  Basic validity models, traditional and proposed models
3.3  Specification checks
3.4  Individual and school characteristics as determinants of SAT scores and GPAs
3.5  Accounting for individual and school characteristics in FGPA prediction
A1   Evidence on choice-stratification relationship: Additional measures
A2   Alternative measures of Tiebout choice: Effects on segregation and stratification
A3   Effect of district-level choice on tract-level income and racial stratification
B1   First-stage models for MSA choice index
B2   2SLS estimates of effect of Tiebout choice
C1   Sensitivity of individual and school average SAT variation to assumed selection
     parameter
C2   Stability of school mean SAT score and peer group background characteristics over
     time
C3   Effect of Tiebout choice on the school-level SAT score-peer group gradient:
     Estimates from class rank-reweighted sample

Preface

It is a well-established fact that students’ socioeconomic background has substantial
predictive power for their educational outcomes. Children whose parents are highly
educated, whose households are stable, and whose families have high incomes substantially
outperform their less advantaged peers on every measure of educational output.

With nearly as long a pedigree is the idea that these family background effects may
operate above the individual level. The school-level association between average student
background and average performance is typically much stronger than is the same association
at the individual level. The interpretation of school-level correlations is nevertheless
controversial: They may arise because academic outcome measures are noisy, implying that
group means are more reliable than are individual scores; because students with
unobservably attentive parents disproportionately attend schools that enroll observably
advantaged students; because the system of education funding assigns greater resources to
schools in wealthy neighborhoods; or because there really are peer effects in educational
production.

For many purposes, however, one need not know why it is that schools with
advantaged students outscore those with disadvantaged students; the fact that they do is
itself of substantial importance. This dissertation focuses on two such topics: The
competitive impacts of school choice programs, and the design of college admissions rules.
In each case, when I incorporate into the standard analysis the key fact that student
composition may function as a signal of student performance (and vice versa), I obtain new
insights into the underlying processes and new ways of thinking about the available policy
options.

The first two chapters consider parents’ choice of schools for their children. The
claim that parental choice can create incentives for schools to become more productive is a
tenet of the neoclassical analysis of education. It relies crucially on the assumption that
parents will choose effective, productive schools. This is far from obvious: if peer effects
are important, parents may be perfectly rational in preferring wealthy, ineffective schools to
competitors that are less advantaged but more effective, and even if there are no peer effects,
the strong association between school average test scores and student composition may
make it difficult for parents to assess a school’s effectiveness. But if parents, in practice
even if not by intent, choose schools primarily on the basis of their student composition
rather than for their effectiveness, the incentives created for school administrators will be
diluted.

Chapter One develops this idea and implements tests of the hypothesis that school
effectiveness is an important determinant of residential choices among local-monopoly
school districts. I model a “Tiebout”-style housing market in which house prices ration
access to desirable schools, which may be desirable either because they are particularly
effective or because they enroll a desirable set of students. I develop observable implications
of these two hypotheses for the degree of stratification of student test scores across schools,
and I look for evidence of these implications in data on the joint distribution of student
characteristics and SAT scores. I find strong evidence that schools are an important
component of the residential choice and that housing markets create sorting by family
income across schools. Tests of the hypothesis that this sorting is driven by parental pursuit
of effective schools, however, come up empty. This suggests that residential choice
processes (and possibly, although the analogy is not particularly strong, non-residential
choice programs like vouchers) are unlikely to create incentives for schools to become
more effective.

This result conflicts with a well-known recent result from Hoxby (2000a), who
argues that metropolitan areas with less centralized educational governance, and therefore
more competition among local school districts, produce better student outcomes at lower
cost. In Chapter Two, I attempt to get to the bottom of the discrepancy. I reanalyze a
portion of Hoxby’s data, and find reason to suspect the validity of her conclusions. I am
unable to reproduce her results, which appear to be quite sensitive to the exact sample and
specification used. I find suggestive evidence, however, that her estimates, from a sample of
public school students, are upward biased by selection into private schools. Moreover, an
investigation of the sampling variability of Hoxby’s estimates leads to the conclusion that her
standard errors are understated, and that even her own point estimates of the competitive
effect are not significantly different from zero.

Chapter Three turns to a wholly different, but not unrelated, topic, the role of
admissions exam scores in the identification of well-prepared students in the college
admissions process. The case for using such exams is often made with “validity” studies,
which estimate the correlation between test scores and eventual collegiate grades, both with
and without controls for high school grade point average. I argue that there are two
fundamental problems with these studies as they are often carried out. First, they do not
adequately account for the biases created by estimation from a selected sample of students
whose collegiate grades are observable because they were granted admission. I propose and
implement an omitted variables estimator that is unbiased under restrictive, but sometimes
plausible, assumptions about the selection process.

A second shortcoming of the validity literature is more fundamental. In a world in
which student background characteristics are known to be correlated with academic success
(i.e. with both SAT scores and collegiate grades), it is quite difficult to interpret validity
estimates that fail to take account of these background characteristics. A study can identify a
test as predictively valid without being informative about whether the test provides an
independent measure of academic preparedness or simply proxies for the excluded
background characteristics.

In University of California data, I find evidence that observable background
characteristics (particularly those describing the composition of the school, rather than the
individual’s own background) are strong predictors of both SAT scores and collegiate
performance, and that much of the SAT’s apparent predictive power derives from its
association with these background characteristics. This suggests that the SAT may not be a
crucial part of the performance-maximizing admissions rule, as the background variables
themselves provide nearly all the information contained in SAT scores. It also suggests that
existing predictive validity evidence does not establish the frequent claim that the SAT is a
meritocratic admissions tool, unless demographic characteristics are seen as measures of
student merit.

Acknowledgements

I am very much indebted to David Card, for limitless advice and support throughout
my graduate school career. The research here has benefited in innumerable ways from his
many suggestions, as have I. It is hard to imagine a better advisor.

I am grateful to the members of my various committees (Alan Auerbach, John
Quigley, Steve Raphael, Emmanuel Saez, and Eugene Smolensky) for reading drafts that
were far too long and too unpolished, and for nevertheless finding many errors and
omissions.

I have benefited from discussions with David Autor, Jared Bernstein, Ken Chay,
Tom Davidoff, John DiNardo, Nada Eissa, Jonah Gelbach, Alan Krueger, David Lee,
Darren Lubotsky, Rob McMillan, Jack Porter, and Diane Whitmore, and from participants at
several seminars where I have presented versions of the work contained here. I also thank
my various officemates over the last five years, particularly Liz Cascio, Justin McCrary, Till
von Wachter, and Eric Verhoogen, for many helpful conversations. All of the research
contained here has been much improved by my interactions with those mentioned above,
and with others who I have surely neglected here.

One must live while conducting research. I thank my family and friends for putting
up with me these last five years and for helping me to stay sane throughout. I hope that I
have not been too unbearable.

Much of my graduate career was supported under a National Science Foundation
Graduate Research Fellowship. In addition, the research in Chapters 1 and 2 was partially
supported by the Fisher Center for Real Estate and Urban Economics at U.C. Berkeley and
that in Chapter 3 by the Center for Studies in Higher Education. David Card and Alan
Krueger provided the SAT data used throughout. Cecilia Rouse provided the hard-to-obtain
School District Data Book used in Chapters 1 and 2. Saul Geiser and Roger Studley of the
University of California Office of the President provided the student records that permitted
the research in Chapter 3. The usual disclaimer applies: Any opinions, findings,
conclusions or recommendations expressed are my own and do not necessarily reflect the
views of the National Science Foundation, the Fisher Center, the Center for Studies in
Higher Education, the College Board, the UC Office of the President, or any of my
advisors.

Last, but not least, there is a sense in which Larry Mishel deserves substantial credit
for my Ph.D., as without his determined efforts at persuasion, I would never have pursued it
in the first place.
The potential effects of school choice programs depend critically on what
Chapter 1.
characteristics parents value in schools. Hanushek, for example, notes that parents might
not choose effective schools over others that are less effective but offer “pleasant
Good Principals or Good Peers? Parental
Valuation of School Characteristics, Tiebout
Equilibrium, and the Incentive Effects of
Competition among Jurisdictions
surroundings, athletic facilities, [and] cultural advantages,” (1981, p. 34). To the extent that
parents choose productive schools, market discipline can induce greater productivity from
school administrators and teachers. If parents primarily value other features, however,
market discipline may be less successful. Hanushek cautions: “If the efficiency of our school
systems is due to poor incentives for teachers and administrators coupled with poor decisionmaking by consumers, it would be unwise to expect much from programs that seek to
1.1.
Introduction
strengthen ‘market forces’ in the selection of schools,” (1981, p. 34-35; emphasis added).
Many analysts have identified principal-agent problems as a major source of
underperformance in public education. Public school administrators need not compete for
customers and are therefore free of the market discipline that aligns producer incentives with
consumer demand in private markets. Chubb and Moe, for example, argue that the interests
of parents and students “tend to be far outweighed by teachers’ unions, professional
organizations, and other entrenched interests that, in practice, have traditionally dominated
Moreover, if students’ outcomes depend importantly on the characteristics of their
classmates (i.e. if so-called “peer effects” are important components of educational
production), even rational, fully informed, test-score-maximizing parents may prefer schools
with poor management but desirable peer groups to better managed competitors that enroll
less desirable students, and administrators may be more reliably rewarded for enrolling the
right peer group than for offering effective instruction.
the politics of education,” (1990, p. 31).1 One proposed solution—advocated by Friedman
(1962) and others—is to allow dissatisfied parents to choose another school, and to link
school administrators’ compensation to parents’ revealed demand. This would strengthen
parents relative to other actors, and might “encourage competition among schools, forcing
them into higher productivity,” (Hoxby, 1994, p. 1).
1
Chubb and Moe also identify the school characteristics that parents would presumably choose, given more
influence: “strong leadership, clear and ambitious goals, strong academic programs, teacher professionalism,
shared influence, and staff harmony,” (p. 187). See also Hanushek (1986) and Hanushek and Raymond
(2001).
The mechanisms typically proposed to increase parental choice—vouchers, charter
schools, etc.—are not at present sufficiently widespread to permit decisive empirical tests
either of parental revealed preferences or of their ultimate effects on school productivity.2
Economists have long argued, however, that housing markets represent a long established,
potentially informative form of school choice (Tiebout, 1956; Brennan and Buchanan, 1980;
2
1
Hsieh and Urquiola (2002) study a large-scale voucher program in Chile, but argue that effects on school
productivity cannot be distinguished from the allocative efficiency effects of student stratification.
2
Oates, 1985; Hoxby, 2000a). Parents exert some control over their children’s school
A second issue is that there is little or no threat of market entry when competition is
assignment via their residential location decisions, and can exit undesirable schools by
among geographically-based school districts. In the absence of entry, administrators of
moving to a neighborhood served by a different school district. As U.S. metropolitan areas
undesirable districts are not likely to face substantial declines in enrollment. Indeed, a
vary dramatically in the amount of control over children’s school assignment that the
reasonable first approximation is that total (public) school and district enrollments are
residential decision affords to parents, one can hope to infer the effect of so-called Tiebout
invariant to schools’ relative desirability.5 Instead, Tiebout choice works by rewarding the
choice by comparing student outcomes across metropolitan housing markets (Borland and
administrator of a preferred school with a better student body and with wealthier and more
Howsen, 1992; Hoxby, 2000a).3
motivated parents. There are obvious benefits for educational personnel in attracting an
In this chapter, I use data on school assignments and outcomes of students across
schools within different metropolitan housing markets to assess parents’ revealed
advantaged population, and I assume throughout this chapter that the promise of such
rewards can create meaningful incentives for school administrators.
preferences. To preview the results, I find little evidence that parents use Tiebout choice to
My analysis of parental choices focuses on the possibility that parents may choose
select effective schools over those with desirable peers, or that schools are on average more
schools partly on the basis of the peer group offered. Although existing research does not
effective in markets that offer more choice.
conclusively establish the causal contribution of peer group characteristics to student
In modeling the effects of parental preferences on equilibrium outcomes under
outcomes (see, e.g., Coleman et al., 1966; Hanushek, Kain, and Rivkin, 2001; Katz, Kling,
Tiebout choice, it is important to account for two key issues that do not arise under choice
and Liebman, 2001), anecdotal evidence suggests that parents may place substantial weight
programs like vouchers. The first is that residential choice rations access to highly-
on the peer group in their assessments of schools and neighborhoods. Realtor.com, a web
demanded schools by willingness-to-pay for local housing.4 As a result, both schools and
site for house hunters, offers reports on several neighborhood characteristics that parents
districts in high-choice markets (those with many competing school districts) are more
apparently value. These include a few variables that may be interpreted as measures of
stratified than in low-choice markets. Increased stratification can have allocative efficiency
school resources or effectiveness (e.g. class size and the number of computers); detailed
consequences that confound estimates of the effect of choice on productive efficiency.
socioeconomic data (e.g. educational attainment and income); and the average SAT score at
the local high school. Given similar average scores, test-score maximizers should prefer
3
4
Hoxby argues that this sort of analysis can “demonstrate general properties of school choice that are helpful
for thinking about reforms,” (2000a, p. 1209). Belfield and Levin (2001) review other, similar studies.
Small-scale voucher programs may not have to ration desired schools, or may be able to use lotteries for this
purpose. One imagines that broader programs will use some form of price system, perhaps by allowing
parents to “top up” their vouchers (Epple and Romano, 1998).
3
5
Poor school management can, of course, lead parents to choose private schools, lowering public enrollment.
Similarly, areas with bad schools may disproportionately attract childless families. These are likely secondorder effects. The private option, in any case, is not the mechanism by which residential choice works but an
alternative to it: Inter-jurisdictional competition has been found to lower private enrollment rates (Urquiola,
1999; Hoxby, 2000a).
4
demographically unfavorable schools, as these must add more value to attain the same
outcomes as their competitors with more advantaged students.6 While it is possible that
parents use the demographic data in this way, it seems more likely that home buyers prefer
wealthier neighborhoods, even conditional on average student performance (Downes and
Zabel, 1997).7
With several school characteristics over which parents may choose, understanding
which schools are chosen and which administrators are rewarded requires a model of
residential choice. I build on the framework of so-called multicommunity models in the
local public finance literature (Ross and Yinger, 1999), but I introduce a component of
school desirability that is exogenous to parental decisions, “effectiveness,” which is thought
of as the portion of schools’ effects on student performance that does not depend on the
characteristics of enrolled students. Parental preferences among districts depend on both
peer group and effectiveness, and I consider the implications of varying the relative weights
of these characteristics for the rewards that accrue in equilibrium to administrators of
effective schools.
Hoxby (1999b) also models Tiebout choice of schools, but she assumes a discrete
distribution of student types and allows parents to choose only among schools offering
identical peer groups. I allow a continuous distribution of student characteristics, which
forces parents to trade off peer group against effectiveness in their school choices. This
seems a more accurate characterization of Tiebout markets, as the median U.S. metropolitan
area has fewer than a dozen school districts from which to choose. It leads to a substantially
different understanding of the market dynamics, as Hoxby’s assumption of competing schools
with identical peer groups eliminates the “stickiness” that concern for peer group can create
and that is the primary focus here.
As in other multicommunity models, equilibrium in my model exhibits complete
stratification: High-income families live in districts that are preferred to (and have higher
housing prices than) those where low-income families live. That this must hold regardless of
what parents value points to a fundamental identification problem in housing price-based
estimates of parental valuations:8 Peer group and, by extension, average student
performance are endogenous to unobserved determinants of housing prices. One
estimation strategy that accommodates this endogeneity is that taken by Bayer, McMillan,
and Reuben (2002), who estimate a structural model for housing prices and community
composition in San Francisco.
6 This does not rely on assumptions about the peer effect: The effect of individual characteristics on own test
scores, distinct from any spillover effects, is not attributable to the school, and test-score-maximizing parents
should penalize the average test scores of schools with advantaged students to remove this effect (Kain,
Staiger, and Samms, 2002).
7 Postsecondary education offers additional evidence of strong preferences over the peer group: Colleges
frequently trumpet the SAT scores of their incoming students—the peer group—while data on graduates’
achievements relative to others with similar initial qualifications, which would arguably be more informative
about the college’s contribution, are essentially non-existent. Along these lines, Tracy and Waldfogel (1997)
find that popular press rankings of business schools reflect the quality of incoming students more than the
schools’ contributions to students’ eventual salaries (but see also Dale and Krueger, 1999, who obtain
somewhat conflicting results at the undergraduate level).
I adopt a different strategy: I compare housing markets that differ in the strength of
the residential location-school assignment link, and I develop simple reduced-form
implications of parental valuations for the across-school distribution of student
characteristics and educational outcomes as a function of the strength of this link. This
across-market approach has the advantage that it does not rely on strong exclusion
restrictions or distributional assumptions. My primary assumptions are that the causal effect
of individual and peer characteristics on student outcomes does not vary systematically with
the structure of educational governance; that the peer effect can be summarized with a small
number of moments of the within-school distribution of student characteristics; and that
school effectiveness acts to shift the average student outcome independent of the set of
students enrolled.
Like Bayer, McMillan, and Reuben (2002), I identify parental valuations by the
location of clusters of high-income families: If parental preferences over communities depend
exclusively on the effectiveness of the local schools, the most desirable—and therefore
wealthiest—communities are necessarily those with the most effective schools. If peer
group matters at all to parents, however, there can be “unsorted” equilibria in which
communities with ineffective schools have the wealthiest residents and are the most
preferred. These equilibria result from coordination failures: The wealthy families in
ineffective districts would collectively have the highest bids for houses assigned to more
effective schools, but no individual family is willing to move alone to a district with
undesirable peers.
The more importance that parents attach to school effectiveness, the more likely we
are to observe equilibria in which wealthy students attend more effective schools than do
lower-income students. Moreover, if parental concern for peer group is not too large, the
model predicts that this equilibrium effectiveness sorting will tend to be more complete in
high-choice markets, those with many small school districts, than in markets with more
centralized governance. This is because higher-choice markets divide the income
distribution into smaller bins, which reduces the cost (in peer quality) that families pay for
moving to the next lower peer group district and thus reduces the probability that wealthy
families will be trapped in districts with ineffective schools.
Effectiveness sorting should be observable as a magnification of the causal peer
effect, as it creates a positive correlation between the peer group and an omitted variable—
school effectiveness—in regression models for student outcomes.9 This provides my
identification: I look for evidence that the apparent peer effect, the reduced-form gradient
of school average test scores with respect to student characteristics, is larger in high-choice
than in low-choice markets. If parents select schools for effectiveness, wealthy parents
should be better able to obtain effective schools in markets where decentralized governance
facilitates the choice of schools through residential location, and student performance should
be more tightly associated with peer characteristics in these markets. If parents instead select
schools primarily for the peer group, there is no expectation that wealthy students will attend
effective schools in equilibrium, regardless of market structure, and the peer group-student
performance relationship should not vary systematically with Tiebout choice.
8 Shepard (1999) reviews hedonic studies of housing markets.
9 Willms and Echols (1992, 1993) are the first authors of whom I am aware to note the importance of the
distinction between preferences for peer group and for effective schools. They use hierarchical linear
modeling techniques (Raudenbush and Willms, 1995; Raudenbush and Bryk, 2002), and estimate school
effectiveness as the residual from a regression of total school effects on peer group. This is appropriate if
there is no effectiveness sorting; otherwise, it may understate the importance of effectiveness in output and in
parental choices.
I use a unique data set consisting of observations on more than 300,000
metropolitan SAT takers from the 1994 cohort, matched to the high schools that students
attended. The size of this sample permits accurate estimation of both peer quality and
average performance for the great majority of high schools in each of 177 metropolitan
housing markets. I find no evidence that the association between peer group and student
performance is stronger in high-choice than in low-choice markets. This result is robust to
nonlinearity in the causal effects of the peer group as well as to several specifications of the
educational production function. Moreover, although there is no other suitable data set with
nearly the coverage of the SAT sample, the basic conclusions are supported by models
estimated both on administrative data measuring high school completion rates and on the
National Education Longitudinal Study (NELS) sample.
This result calls the incentive effects of Tiebout choice into question, as it indicates
that administrators of effective schools are no more likely to be rewarded with high demand
for local housing in high-choice than in low-choice markets. To explore this further, I
estimate models for the effect of Tiebout choice on mean scores across metropolitan areas.
Consistent with the earlier results, I find no evidence that high-choice markets produce
higher average SAT scores. Together with the within-market estimates, this calls into
question Hoxby’s (1999a, 2000a) conclusion that Tiebout choice induces higher productivity
from school administrators.10
There are three plausible explanations for the pattern of findings presented here.
First, it may be that school and district policies are not responsible for a large share of the
extant across-school variation in student performance. We would not then expect to
observe effectiveness sorting, regardless of its extent, in the distribution of student SAT
scores. Second, the number of school districts may not capture variation in parents’ ability
to exercise Tiebout choice. Results presented in Section 1.4.2 offer suggestive evidence
against this interpretation, but do not rule it out. A final explanation is that effectiveness
does matter for student performance, but that it does not matter greatly to parental
residential choices.11 This could be because effectiveness is swamped by the peer group in
parental preferences or because it is difficult to observe directly. In either case,
administrators who pursue unproductive policies are unlikely to be disciplined by parental
exit and Tiebout choice can create only weak incentives for productive school management.
1.2. Tiebout Sorting and the Role of Peer Groups: Intuition
In this section I describe the Tiebout choice process and its observable implications
in the context of a very simple educational technology with peer effects. Let
$$t_{ij} = x_{ij}\beta + \bar{x}_j\gamma + \mu_j + \varepsilon_{ij} \qquad (1)$$
be a reduced-form representation of the production function, where $t_{ij}$ is the test score (or
other outcome measure) of student $i$ when he or she attends school $j$; $x_{ij}$ is an index of the
student’s background characteristics; $\bar{x}_j$ is the average background index among students at
school $j$; and $\mu_j$—which need not be orthogonal to $\bar{x}_j$—measures the “effectiveness” of
school $j$, its policies and practices that contribute to student performance.12
10 Hoxby (2000a) argues that market structure is endogenous to school quality. Instrumenting for it and using
relatively sparse data from the NELS and the National Longitudinal Survey of Youth, she finds a positive
effect of choice on mean scores across markets. I discuss the endogeneity issue in Appendix B, and consider
several instrumentation strategies. As none indicate substantial bias in OLS results, the main discussion here
treats market structure as exogenous. Chapter 2 investigates Hoxby’s results in greater detail.
11 In fact, the main empirical approach cannot well distinguish between the case where parents value
effectiveness to the exclusion of all else and that where they ignore effectiveness entirely, as in either case
effectiveness sorting may not depend on the market structure. The former hypothesis seems implausible on
prior grounds, however.
12 In the empirical application in Section 1.5, I allow for more general technologies in which the effects of
individual or peer characteristics are arbitrarily nonlinear or higher moments of the peer group distribution
enter the production function. The key assumption is that all families agree on the relative importance of
peer group and school effectiveness. This rules out some forms of interactions between $x_{ij}$ and $(\bar{x}_j, \mu_j)$
in (1). The assumption of similar preference structures is common in studies of consumer demand, and in
particular underlies both the multicommunity and hedonic literatures. If it is violated, of course, the
motivating question of whether parents prefer good principals or good peers is not well posed.
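The remark that effectiveness in equation (1) need not be orthogonal to the average background index can be made concrete with a small simulation. The Python sketch below is illustrative only and is not part of the original analysis: the parameter values (beta = 1, gamma = 0.5), the numbers of schools and students, and the function name `simulate_gradient` with its `sorting` argument are all assumptions introduced here. It draws students under the technology in (1) and computes the across-school slope of mean scores on the mean background index, once with effectiveness unrelated to the peer group and once with effectiveness positively related to it.

```python
import numpy as np

def simulate_gradient(sorting, n_schools=2000, n_students=50,
                      beta=1.0, gamma=0.5, seed=0):
    """Across-school slope of mean scores on mean background under model (1).

    `sorting` is a hypothetical knob (not from the source): the coefficient
    linking school effectiveness mu_j to the school's mean background index.
    """
    rng = np.random.default_rng(seed)
    # Background index: a school-level component plus within-school variation.
    school_part = rng.normal(size=(n_schools, 1))
    x = school_part + rng.normal(size=(n_schools, n_students))
    x_bar = x.mean(axis=1)
    # Effectiveness: related to x_bar whenever sorting != 0.
    mu = sorting * x_bar + rng.normal(scale=0.5, size=n_schools)
    eps = rng.normal(size=(n_schools, n_students))
    # Equation (1): t_ij = x_ij*beta + x_bar_j*gamma + mu_j + eps_ij
    t = x * beta + (x_bar * gamma + mu)[:, None] + eps
    t_bar = t.mean(axis=1)
    # OLS slope of school-mean scores on school-mean background: cov / var.
    return np.cov(x_bar, t_bar)[0, 1] / np.var(x_bar, ddof=1)

slope_no_sorting = simulate_gradient(sorting=0.0)
slope_sorting = simulate_gradient(sorting=0.4)
print(round(slope_no_sorting, 2), round(slope_sorting, 2))
```

Under these assumed values the first slope should lie close to beta + gamma = 1.5. The second is larger by the assumed sorting coefficient, which shows how a correlation between effectiveness and the peer group inflates the apparent peer effect.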
In view of the vast literature documenting the important role of family background
characteristics—e.g. ethnicity, parental income and education—in student achievement
(Coleman et al., 1966; Phillips et al., 1998; Bowen and Bok, 1998), I assume that $x_{ij}$ is
positively correlated with willingness-to-pay for educational quality. In the empirical analysis
below, I also estimate specifications that allow willingness-to-pay to depend on family
income while other characteristics have direct effects on student achievement.
Since model (1) excludes school resources, the term $\bar{x}_j\gamma$ potentially captures both
conventional peer group effects and other indirect effects associated with the family
background characteristics of students at school $j$. For example, wealthy parents may be
more likely to volunteer in their children’s schools, or to vote for increased tax rates to
support education. They may also be more effective at exerting “voice” to manage agent
behavior, even without the exit option that school choice policies provide (Hirschman,
1970). Finally, student composition may operate as an employment amenity for teachers and
administrators, reducing the salaries that the school must pay and increasing the quality of
teachers that can be hired for any fixed salary (Antos and Rosen, 1975).13
The effectiveness parameter in (1), $\mu_j$, encompasses the effects of any differences
across schools that do not depend on the characteristics of students that they enroll. It may
include, for example, the ability and effort levels of local administrators, their choice of
curricula, or their effectiveness in resisting the demands of bureaucrats and teachers’
unions.14 It is worth noting that the relative magnitude of $\mu_j$ may be quite modest. Family
background variables typically explain the vast majority of the differences in average student
test scores across schools, potentially leaving relatively little room for efficiency (or school
“value added”) effects.15 Nevertheless, most observers believe that public school efficiency
is important, that it plays a non-trivial role in the educational outcomes of students, and
that it varies substantially across schools.
13 The distinction between direct and indirect effects of school composition is not always clear in discussions of
peer effects. Studies that use transitory within-school variation in the composition of the peer group (Hoxby,
2000b; Angrist and Lang, 2002; Hanushek, Kain, and Rivkin, 2001) likely estimate only the direct peer effect,
while those that use the assignment of students to schools (Evans, Oates, and Schwab, 1992; Katz, Kling, and
Liebman, 2001) likely estimate something closer to the full reduced-form effect of school composition.
14 More precisely, ability and effort of school personnel are included in $\mu$ only to the extent that a good peer
group does not enable a school to bid the best employees away from low-$x$ schools. A wealthy, involved
population may not ensure high-quality, high-effort staff if agency problems produce district hiring policies
that do not reflect parents’ preferences (Chubb and Moe, 1990), or if it is difficult to enforce contracts over
unobservable components of administrator actions (Hoxby, 1999b).
15 In the SAT data used here, a regression of school mean scores on average student characteristics has an $R^2$ of
0.74. The correlation is substantially stronger in California’s school accountability data (Technical Design
Group, 2000). Of course, these raw correlations may overstate the causal importance of peer group if there is
effectiveness sorting.
The potential efficiency-enhancing effects of increased Tiebout choice operate
through the assumption that parents prefer schools with $\mu_j$-promoting policies. To the
extent that this is true, Tiebout choice induces a positive correlation between $\mu_j$ and $\bar{x}_j$,
since high-$x_i$ families will outbid lower-$x_i$ families for homes near the most preferred
schools. Thus, active Tiebout choice can magnify the apparent impact of peer groups on
student outcomes in analyses that neglect administrative quality. Formally,
$$E[\bar{t}_j \mid \bar{x}_j] = \bar{x}_j(\beta + \gamma) + E[\mu_j \mid \bar{x}_j], \qquad (2)$$
or, simplifying to a linear projection,
$$E^*[\bar{t}_j \mid \bar{x}_j] = \bar{x}_j(\beta + \gamma + \theta^*), \qquad (3)$$
where $\theta^* \equiv \operatorname{cov}(\bar{x}_j, \mu_j) / \operatorname{var}(\bar{x}_j)$ represents the degree of effectiveness sorting in the local
market. (For notational simplicity, I neglect the intercept in both test scores and school
effectiveness.) The stronger are parental preferences for effective schools (relative to
schools with other desired attributes), the more actively will high-$x_i$ families seek out
neighborhoods in effective districts, and the larger will $\theta^*$ tend to be in Tiebout equilibrium.
The weaker are parental preferences for $\mu_j$ relative to other factors, the smaller will $\theta^*$
tend to be.
Importantly, one would expect the degree of local competition in public schooling
(i.e. the number of school districts in the local area among which parents can choose) to
affect the magnitude of $\theta^*$ whenever parents care both about peer groups and school
effectiveness. The reasoning is simple: If there are only a small number of local districts and
parents value the peer group, they may be “stuck” with a high-$x$/low-$\mu$ school, even in
housing market equilibrium, by their unwillingness to sacrifice peer group in a move to a
more effective school district. These coordination failures are less likely in markets with
more interjurisdictional competition, as in these markets there are always alternative districts
that are relatively similar in the peer group offered, and parents are able to select effective
schools without paying a steep price in reduced peer quality.16
When parental concern for peer group is moderate, then, a high degree of public
school choice is needed to ensure that high-$\mu$ schools attract high-$x$ families, and $\theta^*$ tends
to be larger in high-choice than in low-choice markets. On the other hand, when parents are
concerned only with school effectiveness, high-$\mu$ schools attract high-$x$ families regardless
of the market structure, and $\theta^*$ need not vary with local competition. Similarly, when
parental concern for peer group is large enough, even in highly competitive markets high-$x$
families are not drawn to high-$\mu$ schools, and again $\theta^*$ is largely independent of market
structure.
16 In the high choice limit, this is analogous to Hoxby’s (1999b) model of choice among schools with identical
peers.
17 $\theta^*(c, \delta)$ is treated as a random variable, as there can be multiple equilibria in these markets. My empirical
strategy assumes that $\delta$ is constant across markets, and that a sample of markets with the same $c$ parameter
will trace out the distribution of $\theta^*$. An equilibrium selection model in which families could somehow
coordinate on the most efficient equilibrium would violate this assumption.
This idea forms the basis of my empirical strategy. In essence, I compare the sorting
parameter $\theta^*$ in equation (3) across metropolitan housing markets with greater and lesser
degrees of residential school choice. Let $\theta = \theta(c, \delta) = E[\theta^* \mid c, \delta]$ be the average
effectiveness sorting of markets characterized by the parameters $c$ and $\delta$, where $c$ is the
degree of jurisdictional competition (i.e. the number of competing districts from which
parents can choose, adjusted for their relative sizes) and $\delta$ is the importance that parents
place on peer group relative to effectiveness.17 The argument above, supported by the
theoretical model developed in the next section, predicts that $\partial\theta / \partial c > 0$ for moderate values
of $\delta$ but that $\partial\theta / \partial c = 0$ when $\delta$ is zero or large (i.e. when parents care only about
effectiveness or only about peer group). To the extent that $\theta$ tends to increase with choice,
then, we can infer that parents’ peer group preferences are small enough to prevent a
breakdown in high-choice markets of the sorting mechanism that rewards high-$\mu$
administrators with high-$x$ students. On the other hand, if $\theta$ is no larger in high-choice
than in low-choice cities it is more difficult to draw inferences about parental valuations,
which may be characterized either by very small or very large $\delta$. In either case, however, we
can expect little effect of expansions of Tiebout choice on school efficiency, as in the former
even markets with only a few districts can provide market discipline and in the latter no
plausible amount of governmental fragmentation will create efficiency-enhancing incentives
for school administrators.
1.3. A Model of Tiebout Sorting on Exogenous Community Attributes
In this section, I build a formal model of the Tiebout sorting process described
above. As my interest is in the demand side of the market under full information, I treat the
distribution of school effectiveness as exogenous and known to all market participants.18 I
demonstrate that Tiebout equilibrium must be stratified as much as the market structure
allows: Wealthy families always attend schools that are preferred to those attended by
low-income families. There can be multiple equilibria, however, and the allocation of effective
schools is not uniquely determined by the model’s parameters. Conventional comparative
statics analysis is not meaningful when equilibrium is non-unique, as the parental valuation
parameter affects the set of possible equilibria rather than altering a particular equilibrium.
To better understand the relationships between parental valuations, market concentration,
and the equilibrium allocation, the formal exposition of the model is followed by simulations
of markets under illustrative parameter values.
My model is a much simplified version of so-called “multicommunity” models. I
maintain the usual assumptions that the number of communities is fixed and finite, and that
access to desirable communities is rationed through the real estate market.19 There is no
private sector that would de-link school quality from residential location. Although some
authors (e.g. Epple and Zelenitz, 1981) include a supply side of the housing market, I assume
that communities are endowed with perfectly inelastic stocks of identical houses.20
Communities differ in three dimensions: The average income of their residents and the
rental price of housing, both endogenous, and the effectiveness of the local schools.21
18 This does not rule out administrative responses to the incentives created by parental choices, as these are a
higher order phenomenon, deriving from competition among schools to attract students rather than from
reactions of school administrators to the realized desirability of their schools. My discussion presumes,
however, that competition does not serve to reduce variation in school effectiveness.
19 Where most models incorporate within-community voting processes for public good provision (Fernandez
and Rogerson, 1996; Epple and Romano 1996; Epple, Filimon and Romer, 1993), income redistribution
(Epple and Romer, 1991; Epple and Platt, 1998), or zoning rules (Fernandez and Rogerson, 1997; Hamilton,
1975), I simply allow for preferences over the mean income of one’s neighbors. These preferences might
derive either from the effects of community composition on voting outcomes or from reduced-form peer
effects in education.
20 Tiebout equilibria must evolve quickly to provide discipline to school administrators, whose careers are much
shorter than the lifespan of houses. Inelastic supply is probably realistic in the short term, except possibly at
the urban fringe. Nechyba (1997) points out that it is much easier to establish existence of equilibrium with
fixed supply.
21 The inclusion of any exogenous component of community desirability is not standard in multicommunity
models, which, beginning with Tiebout’s (1956) seminal paper, have typically treated communities as ex ante
interchangeable. This leaves no room for managerial effort or quality except as a deterministic function of
community composition, so is inappropriate for analyses of the incentives that the threat of mobility creates
for public-sector administrators.
22 Amenities might draw wealthy families to low-peer-group districts, improving those districts’ peer groups
and reducing the costs borne by other families living there. This could increase effectiveness sorting,
although the effect would be weakened if there were a private school sector. Offsetting this, amenities might
also prevent families from exiting localities with ineffective schools, reducing effectiveness sorting just as
does concern for peer group.
An important omission is that of all non-school exogenous amenities like beaches, parks,
views, and air quality. I develop here a “best case” for Tiebout choice, where schools are the
only factors in neighborhood desirability. Amenities could either increase or reduce the
extent of effectiveness sorting relative to this pure case, though the latter seems more likely.22
If, as the hedonics literature implies, schools are one of the more important determinants of
neighborhood desirability (see, e.g., Reback, 2001; Bogart and Cromwell, 2000; Figlio and
Lucas, 2000; Black, 1999), the existence of relatively unimportant amenities should not much
alter the trends identified here.
Turning to the formal exposition, assume that a local housing market—a
metropolitan area—contains a finite number of jurisdictions, $J$, and a population of $N$
families, $N \gg J$. Each jurisdiction, indexed by $j$, contains $n$ identical houses and is
endowed with an exogenous effectiveness parameter, $\mu_j$. No two jurisdictions have
identical effectiveness.
Each family must rent a house. There are enough houses to go around but not so
many that there can be empty communities: $n(J - 1) < N < nJ$.23 All homes are owned by
absentee landlords, perhaps a previous generation of parents, who have no current use for
them. These owners will rent for any nonnegative price, although they will charge positive
prices if the market will support them. There is no possibility for collusion among landlords.
Housing supply in each community is thus perfectly inelastic: In quantity-price space, it is a
vertical line extending upward from $(n, 0)$.
Family $i$’s exogenous income is $x_i > 0$; the income distribution is bounded and has
distribution function $F$, with $F'(x) > 0$ whenever $0 < F(x) < 1$.24 Families derive utility
from school quality and from numeraire consumption, and take community composition
and housing prices as given. Let $\bar{x}_j$ denote the mean income of families in community $j$,
and let $h_j$ be the rental price of local housing. The utility that family $i$ would obtain in
jurisdiction $j$ is $U_{ij} = U(x_i - h_j,\ \bar{x}_j\delta + \mu_j)$, where $U$ is twice differentiable everywhere with
$U_1$ and $U_2$ both positive.25 I make the usual assumption about the utility function:
Single Crossing Property: $U_{12}U_1 - U_{11}U_2 > 0$ everywhere.
Single crossing ensures that if any family prefers one school quality-price
combination to another with lower quality—where quality is $q_j \equiv \bar{x}_j\delta + \mu_j$—all
higher-income families do as well; if any family prefers a district to another offering higher quality
education, all lower-income families do also. (This is proved in Appendix D.) As in other
multicommunity models, the single crossing assumption drives the stratification results
outlined below.
23 The model is a “musical chairs” game, and the upper constraint serves to tie prices down, while the lower
constraint avoids the need to define the peer group offered by a community with no residents.
24 Of course, the income distribution cannot be continuous for finite $N$. Relaxing the treatment to allow a
discrete distribution would add notational complexity and introduce some indeterminacy in equilibrium
housing prices, but would not change the basic sorting results.
25 I might allow $U_{ij} = U(x_i - h_j,\ Q(\bar{x}_j, \mu_j))$, with $Q_1 \ge 0$ and $Q_2 > 0$, without changing the basic
results; $\delta$ then corresponds to $Q_1 / Q_2$. The key assumption is that all families share the same $U$ and $Q$
functions, with all differences in their behavior resulting from differences in their budget constraints (i.e.
from $x_i$).
Market equilibrium is defined as a set of housing prices and a rule assigning families
to districts on the basis of their income that is consistent with individual family preferences,
taking all other families’ decisions as fixed:
Definition: An equilibrium for a market defined by $\{\delta;\ J;\ \mu_1, \ldots, \mu_J;\ \text{and } F\}$
consists of a set of nonnegative housing prices $\{h_1, \ldots, h_J\}$ and an allocation rule
$G: R_+ \to Z_J$ that satisfy the following conditions (where
$\bar{x}_j \equiv \int \mathbf{1}(G(x) = j)\, x\, dF(x) \big/ \int \mathbf{1}(G(x) = j)\, dF(x)$):
EQ1 No district is over-full. For each $j$, $\int \mathbf{1}(G(x) = j)\, dF(x) \le n/N$.
EQ2 Nash equilibrium. At the specified prices and with the current distribution of
peer groups, no family would prefer a district other than the one to which it
is assigned: $U(x_i - h_{G(x_i)},\ \bar{x}_{G(x_i)}\delta + \mu_{G(x_i)}) \ge U(x_i - h_k,\ \bar{x}_k\delta + \mu_k)$ for all $i$
and all $k$.
EQ3 Normalization of housing prices. $h_j = 0$ whenever $\int \mathbf{1}(G(x) = j)\, dF(x) < n/N$.
EQ4 No ties in realized quality. For any $j$, $k$, $\bar{x}_j\delta + \mu_j \ne \bar{x}_k\delta + \mu_k$.26
The following results are proved in Appendix D:
Theorem 1. Equilibrium exists.
Theorem 2. Any equilibrium is perfectly stratified, in the sense that no family lives
in a higher-quality, higher-price, or higher-peer-group district than does any
higher-income family.
Corollary 2.1. In any equilibrium, the $n$ families with incomes greater than
$F^{-1}(1 - n/N)$ live in the same community, which has higher quality ($\bar{x}\delta + \mu$) than
any other. The next $n$ families, with incomes in $\left(F^{-1}(1 - 2n/N),\ F^{-1}(1 - n/N)\right)$, live in
the community ranked second in quality. This continues down the distribution: For
each $j \le J$, the families with incomes in $\left(F^{-1}(\max\{1 - jn/N,\ 0\}),\ F^{-1}(1 - (j-1)n/N)\right)$
live in the community with the $j$th ranked schools.27
Corollary 2.2. If $\delta = 0$, equilibrium is unique.
Note that Theorem 2 does not rule out equilibria in which some families live in
lower-$\mu$ districts than do some lower-income families. I refer to these as unsorted (or imperfectly
sorted) equilibria. They arise when the peer group advantage of high-income communities
over low-income communities is large enough to overcome deficits in school effectiveness.28
For fixed income and effectiveness distributions, unsorted equilibria become harder to
maintain as the weight that families place on peer group relative to school quality falls:
Corollary 2.3. Let $G$ be an assignment rule satisfying Corollary 2.1 under which
there exist communities $j$ and $k$ satisfying $\mu_j < \mu_k$ but $\bar{x}_j > \bar{x}_k$. Then for
$$C \equiv \max_{\{j,k \,:\, \bar{x}_j > \bar{x}_k\}} \frac{\mu_k - \mu_j}{\bar{x}_j - \bar{x}_k} > 0,$$
i. Whenever $\delta > C$, $G$ is an equilibrium allocation (i.e. there exist
housing prices with which $G$ is an equilibrium).
ii. Whenever $\delta < C$, $G$ is not an equilibrium allocation.
iii. If $\delta = C$, $G$ can satisfy requirements EQ1-EQ3 for equilibrium, but
violates EQ4.
26 Condition EQ4 corresponds to the “stability” notion of Fernandez and Rogerson (1996; 1997).
Arrangements that satisfy EQ1 through EQ3 but not EQ4 are unstable, and perturbations in one of the tied
communities’ effectiveness or peer group would lead to non-negligible differences between the communities
as families adjust. With EQ4, equilibria are locally stable.
27 I neglect families precisely at the boundary between income bins (i.e. those with incomes satisfying
$F(x) = 1 - jn/N$ for some $j$). I demonstrate in the Appendix that families at boundary points are
indifferent between the two communities in equilibrium. As the income distribution approaches continuity,
the potential importance of boundary families declines to zero.
28 It need not be true that unsorted equilibria are less efficient than the perfectly sorted equilibrium: If the
marginal utility of school quality declines quickly enough, it can be more efficient to assign effective schools
to low-income bins than to the wealthiest students. In any case, concern for peer group amounts to an
externality, and there is no assurance that the efficient assignment of families to districts is an equilibrium at
all. It may be efficient to have heterogeneous income distributions at each school, for example, but this is
never a decentralized equilibrium.
I do not present formal results on the implications of increases in $J$ for effectiveness
sorting, as much depends on the $\mu_j$’s assigned to the new districts. Informally, however,
Corollary 2.3 suggests that for a stable $\mu$ distribution, increasing the number of districts
constrains the possibility of unsorted equilibria: With more districts, the distance between
the average incomes of districts that are adjacent in the quality distribution is smaller. As C
depends on this distance, a higher J reduces the amount by which a low-income district’s
effectiveness parameter can exceed that of the next-wealthier district before the wealthier
families will bid away houses in the more effective district.

This tendency is at the core of my empirical strategy. To clarify it, I next present a
simulation exercise that demonstrates the impact of market structure (J) on effectiveness
sorting under different assumptions about the importance of peer group to parental
preferences (δ), and thus about the “stickiness” of residential assignments. I begin by
describing the allocation of effectiveness in illustrative equilibria, then describe the
simulation and its results. Finally, at the end of this section I return to the basic model to
discuss its allocative implications and the likely effects of endogenizing school effectiveness.

1.3.1. Graphical illustration of market equilibrium

From Theorem 2 and its corollaries, the income distribution in any equilibrium is
divided into J quantiles, with wealthier quantiles living in more preferred (higher
x_j δ + µ_j) districts. In Appendix D, I show that this necessary condition is also sufficient
for an assignment rule to be an equilibrium allocation. Here, I use these results to construct
possible equilibria under different (δ, J) combinations.

It is helpful to begin by considering a Tiebout market that approximates perfect
competition. Assume that there are as many districts as there are families, with only a single
house in each district, and suppose that both family income and school effectiveness are
uniformly distributed on [0, 1]. There is no peer group externality, as families that move to
another house-district take their “peer group” with them. Regardless of parental valuations,
then, families always prefer a high-µ house to one with lower µ. Because willingness-to-pay
for a preferred school is increasing in x, equilibrium is unique, with the ranking of
districts by effectiveness identical to that by the income of the resident family. Panels A
and B of Figure 1.1 graph the equilibrium allocations of effectiveness (µ_j) and district
desirability (x_j δ + µ_j) as functions of family income when parents have no concern for
peer group (δ = 0, Panel A) and when concern for peer group is moderate (δ = 1.5, Panel
B).

The competitive case serves as a baseline, but it is not a realistic description of choice
in the presence of peer group externalities. I next consider a market with ten equally-sized
districts, a degree of Tiebout choice that, as is discussed below in Section 1.4, corresponds
roughly to the 80th percentile U.S. metropolitan area. Assume that J = 10, n = N/10, and
$\mu_j = j/10$, $j = 1, \ldots, 10$. Panel C of Figure 1.1 displays the unique, perfectly sorted
equilibrium when δ = 0. Families in the jth decile of the income distribution live in the
district with the jth most effective schools.

When parental concern for peer group is introduced, the perfectly sorted equilibrium
is no longer unique. It is now possible for ineffective districts to retain wealthy peer groups
in equilibrium, as long as they are not so ineffective that families would prefer a lower-x,
higher-µ district. One imperfectly sorted equilibrium is displayed in Panel D. Note that
district desirability is monotonically increasing in district average income, as Theorem 2
requires that the desirability and income rankings be identical in equilibrium. Effectiveness
is not monotonic in family income, however: Some families live in districts that are less
effective than those where some poorer families live. Effectiveness sorting nevertheless
remains substantial, and effectiveness is highly correlated with peer group average income.

Finally, we consider the case where the housing market gives parents few options,
with only three equally-sized districts (J = 3, n = N/3). This corresponds roughly to the
40th percentile of the U.S. distribution. Suppose here that $\mu_j = j/3$, $j = 1, 2, 3$. When there
are no peer effects (Panel E), equilibrium is again unique and is perfectly sorted on
effectiveness.

When we add concern for peer group to the three-district market, there is
substantially more potential for mis-sortings than even in the ten-district case. The gap in
peer quality between adjacent districts has grown substantially, and families therefore require
a much larger µ return to justify a move from one district to another whose current
residents are lower in the x distribution. Indeed, with the parameter values used here, there
is no allocation of x terciles to districts in which any family would willingly move to a
lower-x district; all six of the possible permutations are equilibria. Panel F illustrates one
possibility. Here, the most effective district is rewarded with the wealthiest students, but the
two remaining districts are mis-sorted.

Recall equation (3), which suggested that a naïve estimate of the peer effect is
magnified by effectiveness sorting, with the degree of magnification being
$\theta^* \equiv \operatorname{cov}(x_j, \mu_j) / \operatorname{var}(x_j)$, the coefficient from a regression of µ_j on x_j across all
districts in the market. θ* = 1 in the perfectly sorted markets displayed in Panels A, B, C,
and E of Figure 1.1, indicating that the slope of school-level average test scores with respect
to student characteristics in these markets will overstate the contribution of individual and
peer characteristics to student performance by one. In the imperfectly sorted markets
displayed in Panels D and F, however, the magnification effect is smaller: θ* = 0.9 in D and
0.5 in F. The simulations below suggest that this tendency for effectiveness sorting and
magnification to depend on the number of districts when parents care about both peer
group and effectiveness holds generally, as long as concern for peer group (δ) is moderate.
When δ is large, however, even markets with many districts can have unsorted equilibria,
and there is no tendency for $E[\theta^* \mid \delta, J]$ to increase with J, at least in the ranges considered
here.29

29 For any δ, there is some J for which effectiveness sorting will increase: The perfectly competitive case in
Panels A and B would be perfectly sorted for any δ. I simulate only markets with J ≤ 10 (the
computational burden increases with the factorial of J), though this is easily enough to reveal the general
trend.

1.3.2. Simulation of expanding choice

In this subsection, I describe simulations of a hypothetical regional economy under
several combinations of (δ, J). As δ grows, the relative importance of school effectiveness
diminishes and the likelihood of unsorted equilibria expands. By the logic above, for any
fixed δ we might expect unsorted equilibria to be less prominent with many districts than
with few.

Where Figure 1.1 used uniform, nonstochastic distributions for both income and
effectiveness, here I adopt the slightly more realistic assumption that income has a normal
distribution and I draw random effectiveness parameters from the same distribution.30 For
each market type, I conducted 5,000 draws, first choosing effectiveness parameters for each
district and then permuting the assignment of income bins to districts until I obtained an
equilibrium assignment (i.e. one in which no low-income district was preferable to any
high-income district).31

30 Analysis of varying δ subsumes the variance of the µ_j’s: Increased variation in school effectiveness is
equivalent, for the purpose of the sorting process, to increased parental valuation of a district with high
effectiveness relative to one with a desirable peer group (i.e. to a reduction in δ). A normal (rather than
log-normal) income distribution was chosen to avoid complications from the log-normal distribution’s skew, and
because the x index that I use in the empirical analysis is approximately normally distributed.
31 This strategy treats all possible equilibria as equally likely. It might be more realistic to attach higher
probability to equilibria that are attracting points for larger ranges of initial assignments under some
adjustment process, but this is left for future work.

Figure 1.2 displays the average allocation of school effectiveness in markets with
three and ten equally-sized districts. Panel A depicts the case where parents are unconcerned
about the peer group, as in the left-hand panels of Figure 1.1. Here, families must be
perfectly sorted on school effectiveness in equilibrium, and the average µ’s depicted in the
figure are simply order statistics from the standard normal distribution. The remaining
panels show progressively higher valuations for the peer group: δ = 0.5, 1.5, and 3. As δ
grows, progressively less complete sortings become equilibria and average µ_j values
collapse toward the overall mean.32 Moreover, the collapse happens more quickly for
three-district markets than for those with ten districts. This means that when δ is moderate in
Panel C, the gradient of school effectiveness with respect to family income is steeper for
J = 10 than for J = 3. As δ grows, however, Panel D indicates that the differences
between the two sorts of markets shrink toward zero.

32 The nonmonotonicity of the δ = 3, J = 10 case arises because parental valuations depend on average
income rather than on the average income rank; peer group differences between income quantiles are thus
larger near the tails. This is not central to the analysis.

It is clear from Figure 1.2 that effectiveness sorting tends to decrease with δ and, for
moderate values like that shown in Panel C, to increase with J. The simulation results can be used to
understand the magnification bias in naïve estimates of the peer effect like (3). For each
(δ, J) combination, I estimated a regression of µ_j on x_j analogous to those estimated
from actual data in Section 1.5, pooling all 5,000 simulated markets and including a fixed
effect for each. The resulting estimates of $\theta(\delta, J) = \operatorname{cov}(x_j, \mu_j) / \operatorname{var}(x_j)$ are displayed in
Figure 1.3. The trends identified in Figures 1.1 and 1.2 are again clear. First, θ is well above
zero when δ is small, indicating that the residential housing market mechanism rewards
administrators of effective schools with the wealthiest students when parents primarily assess
schools by their effectiveness. When δ is large, θ is close to zero for all J, as no district
structure creates the desired rewards when parents are largely unconcerned with school
effectiveness.

The moderate δ case is the most interesting. Here, we observe more perfect
sorting on µ (and therefore larger slopes of µ with respect to x_j) when there are many
districts than when there are few. That is, ∂θ/∂J > 0 for moderate δ.33 If both peer group
and school effectiveness are important to parents, then, the Tiebout mechanism rewards
effective administrators only when there are many districts. Model (3) suggests that in this
case the test score gap between high- and low-income schools will tend to be larger in
markets with a great deal of interdistrict competition than in those with less Tiebout choice.
I test for this in the empirical analysis below.

33 Figure 1.3 reveals a small effect of Tiebout choice on the effectiveness gradient even when δ = 0, but this is
sensitive to the simulation assumptions (in particular, to the distribution of effectiveness as the number of
districts grows). The simulations for positive δ (in which equilibrium need not be unique, so that averages
are determined both by the distribution of effectiveness and by the set of equilibria) are much less sensitive.
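
The illustrative equilibria of Section 1.3.1 can be checked numerically. The sketch below is a minimal illustration, not the procedure used to draw Figure 1.1. It assumes uniform income on [0, 1] split into J equal bins, effectiveness values µ_j = j/J, and the sorting condition stated in the text (district desirability x_j δ + µ_j must rise strictly with the income bin). The function names and the value δ = 2.5 are hypothetical choices made for the example; the δ behind Panels D and F is not given in this section.

```python
from itertools import permutations


def bin_means(J):
    """Mean income in each of J equal-sized bins of a uniform [0, 1] distribution."""
    return [(2 * j - 1) / (2 * J) for j in range(1, J + 1)]


def is_equilibrium(mu_by_bin, x_by_bin, delta):
    """True if desirability delta * x + mu rises strictly from each income bin to the next."""
    d = [delta * x + mu for x, mu in zip(x_by_bin, mu_by_bin)]
    return all(a < b for a, b in zip(d, d[1:]))


def equilibria(J, delta):
    """All assignments of mu = 1/J, ..., 1 to income bins that satisfy the sorting condition."""
    x = bin_means(J)
    mus = [j / J for j in range(1, J + 1)]
    return [p for p in permutations(mus) if is_equilibrium(p, x, delta)]


def theta_star(mu_by_bin, x_by_bin):
    """Slope from regressing mu on x across districts: cov(x, mu) / var(x)."""
    n = len(x_by_bin)
    xbar = sum(x_by_bin) / n
    mbar = sum(mu_by_bin) / n
    cov = sum((x - xbar) * (m - mbar) for x, m in zip(x_by_bin, mu_by_bin))
    var = sum((x - xbar) ** 2 for x in x_by_bin)
    return cov / var


if __name__ == "__main__":
    # delta = 0: only the perfectly sorted assignment survives.
    print(len(equilibria(3, 0.0)))
    # delta = 2.5 is an assumed value, chosen only to show many equilibria.
    print(len(equilibria(3, 2.5)))
    # Perfect sorting with ten districts, then the mis-sorted pattern described for Panel F.
    print(round(theta_star([j / 10 for j in range(1, 11)], bin_means(10)), 6))
    print(round(theta_star([2 / 3, 1 / 3, 1.0], bin_means(3)), 6))
```

Under these assumptions the perfectly sorted ten-district allocation has a slope of one, and the three-district allocation in which only the most effective district is correctly matched has a slope of one half, in line with the values reported for Panels C and F.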
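
The simulation of Section 1.3.2 can be sketched in a few lines. This is a scaled-down illustration under stated assumptions, not the code behind Figures 1.2 and 1.3: income bins are equal-probability bins of a standard normal, effectiveness is drawn from a standard normal, all equilibrium assignments are enumerated and one is chosen uniformly at random (which treats equilibria as equally likely, as the text describes), and the pooled slope is computed after removing each market's mean. The function names, the seed, the number of draws, and the use of J = 5 instead of J = 10 are hypothetical choices made to keep the run short.

```python
import random
from itertools import permutations
from statistics import NormalDist

STD = NormalDist()


def normal_bin_means(J):
    """Mean of a standard normal within each of J equal-probability bins, lowest first."""
    cuts = [STD.inv_cdf(k / J) for k in range(1, J)]
    pdf = [0.0] + [STD.pdf(c) for c in cuts] + [0.0]
    # E[x | a < x < b] = (pdf(a) - pdf(b)) / P(a < x < b), with P = 1/J here.
    return [(pdf[k] - pdf[k + 1]) * J for k in range(J)]


def is_sorted_on_desirability(x, mu, delta):
    """True if delta * x + mu rises strictly from each income bin to the next."""
    d = [delta * xi + mi for xi, mi in zip(x, mu)]
    return all(a < b for a, b in zip(d, d[1:]))


def draw_equilibrium(x, delta, rng):
    """Draw effectiveness values, then pick one equilibrium assignment uniformly at random."""
    mus = [rng.gauss(0.0, 1.0) for _ in x]
    eq = [p for p in permutations(mus) if is_sorted_on_desirability(x, p, delta)]
    return rng.choice(eq)  # never empty: the fully sorted assignment always qualifies


def theta(delta, J, draws=500, seed=0):
    """Pooled slope of mu on x across simulated markets, with market fixed effects."""
    rng = random.Random(seed)
    x = normal_bin_means(J)
    xbar = sum(x) / J
    sxm = sxx = 0.0
    for _ in range(draws):
        mu = draw_equilibrium(x, delta, rng)
        mbar = sum(mu) / J  # demeaning within the market absorbs its fixed effect
        sxm += sum((xi - xbar) * (mi - mbar) for xi, mi in zip(x, mu))
        sxx += sum((xi - xbar) ** 2 for xi in x)
    return sxm / sxx


if __name__ == "__main__":
    for J in (3, 5):
        for delta in (0.0, 0.5, 1.5, 3.0):
            print(J, delta, round(theta(delta, J, draws=300), 3))
```

Enumerating every permutation is practical only for small J, which is the same factorial burden noted in the text. The sketch is meant to show the mechanics of the exercise; its numbers should not be read as a replication of Figure 1.3.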
1.3.3. Allocative implications and endogenous school effectiveness

In the model presented above, Tiebout choice hurts low-income students in two
ways. First, it permits increased stratification of students. Because total peer group is in
fixed supply, stratification necessarily offers better peers to wealthy students and worse peers
to low-income students. Second, if the market mechanism functions and families sort on
effectiveness, it assigns low-income students to schools that are below-average in their
effectiveness. This is an unavoidable effect of the Tiebout mechanism, as the flip side of
rewarding effective schools with wealthy students is punishing poor students with relatively
ineffective schools.

The model stacks the deck, however, by holding the distribution of effectiveness
fixed. If school administrators respond to incentives, effectiveness sorting will also induce
higher effort and greater effectiveness. This will tend to raise scores for everyone, and the
productivity benefits may offset the allocative costs that Tiebout choice imposes on poor
students.34

34 There is great need for a model of the supply side of Tiebout choice markets that describes the distribution
of administrators’ responses to incentives. Does competition force the worst districts to catch up to the
average, induce the best districts to pull away from the average, or lead all districts to improve effectiveness
equally? A Mirrlees-type argument suggests that the first is unlikely without market entry, as a district that
enrolls the lowest-income students faces little sanction for further reductions in effectiveness. If this
intuition holds, administrative responses would not offset the inequality-increasing effects of Tiebout choice
identified here.

My empirical analysis thus has two components. In Section 1.5, I look for evidence
that effectiveness sorting is more complete in high-choice than in low-choice markets, as the
simulations above suggest it should be if parental valuations attach substantial weight to
school effectiveness. In that section, I identify effectiveness sorting from the distribution of
student performance within markets, using fixed effects to absorb any differences in average
effectiveness across markets. Then, in Section 1.6, I examine the distribution of average test
scores across markets, looking for evidence that interdistrict competition leads to increases
in the average effectiveness of local administrators.

1.4. Data

My test of parental valuations requires data describing the distribution of peer groups
and outcomes across schools within housing markets that differ in the amount of Tiebout
choice. I describe first my measure of market structure, defined over district-level
enrollment. I then present evidence that this measure represents a binding constraint on
parents’ ability to exercise Tiebout choice. Finally, I discuss the SAT data that are the
primary source of information on student outcomes across schools.

1.4.1. Measuring market concentration

I define local housing markets as Metropolitan Statistical Areas (MSAs), Census
Bureau approximations of local housing markets defined by observed commuting patterns.35
The SAT data that I use to measure student outcomes are taken from the early 1990s.
Consequently, I use 1990 MSA definitions and draw demographic characteristics of each
MSA from the 1990 Census.

35 The Census Bureau classifies the largest urbanizations as Consolidated MSAs (CMSAs), and subdivides them
into several component parts, Primary MSAs (PMSAs). I treat several PMSAs within a larger area as distinct
markets, reasoning that a move from, for example, Riverside to Ventura (both cities within the Los Angeles
CMSA, but separated by about 125 miles) is more akin to a migration across metropolitan areas than to a
within-market move. Most MSAs and PMSAs are defined along county boundaries; in New England, where
town boundaries define MSAs, I use the alternative (and slightly larger) New England County Metropolitan
Areas. For reasons of data availability and comparability, the Honolulu and Anchorage MSAs are excluded
from all analyses.
MSAs differ substantially in their educational governance structures. While the
median MSA has 9 school districts, there are 25 markets with only a single district each.
(Thirteen of these, including Miami and Fort Lauderdale, by far the largest, are in Florida,
which has large counties and only one district per county.) Boston, with 132 districts,
represents the other extreme; seventeen additional markets have fifty districts or more.36

36 All district counts and enrollment figures are calculated for grades 9-12 only (Urquiola, 1999).

The raw count of districts is a crude measure of market concentration, as it does not
distinguish between the New York PMSA, where the three largest districts have 87 percent
of enrollment and the remaining 53 districts combine for 13 percent, and the Dallas PMSA,
with the same number of districts but only 44 percent of enrollment in the three largest.
Following Hoxby (2000a), I calculate a more appropriate index of Tiebout choice as one
minus the Herfindahl Index, a concentration measure used by the Federal Trade
Commission (FTC) in antitrust deliberations and defined as the sum of firms’ squared
market shares. Districts’ “market shares” are their enrollments in grades 9-12 divided by the
total over all public school districts in the MSA, calculated using data from the 1990
Common Core of Data (CCD), an annual census of public schools and districts. Letting n_jm
be the relevant enrollment of district j in market m and N_m the total relevant enrollment in
the market, the choice index is $c_m \equiv 1 - \sum_j \left( n_{jm} / N_m \right)^2$.

Figure 1.4 displays the index’s distribution. Nearly all U.S. markets are highly
concentrated by private market standards: Vertical lines on the figure indicate the FTC’s
thresholds for “concentrated” and “highly concentrated” markets (choice indices below 0.9
and 0.82, respectively); four-fifths of MSAs are concentrated and three-fifths highly
concentrated by these definitions.

Table 1.1 displays summary statistics for several metropolitan-level demographic
measures, calculated from county-level tabulations of the 1990 Decennial Census (from the
STF-3C file) aggregated to the MSA level. Means of each variable are presented both for the
full sample of 318 MSAs and within each quartile of the choice distribution. There are
substantial differences across quartiles: Low-choice markets tend to be located in the South,
to be smaller, and to have more Blacks and Hispanics. They are also more likely to be
located in states with “Minimum Foundation Plan” financing schemes, a mechanism used by
37 states to reduce inequality in school resources.37

37 Categorizations of state finance plans as of the early 1990s are drawn from Card and Payne (2002).

1.4.2. Does district structure matter to school-level choice?

Most of the existing literature, while recognizing that there is heterogeneity across
schools within any given school district, has assumed that public school districts are the
relevant units that compete for students in a Tiebout choice framework (Borland and
Howsen, 1992; Hoxby, 2000a). There are two main reasons for this. First, any local tax and
spending decisions are made at the district level, and this is also where many key education
policies (curriculum, teacher pay scales, etc.) are set. Second, for reasons relating to the
jurisprudence of school desegregation and to mechanisms like “open enrollment” and
magnet schools, there are not always stable, well-defined catchment areas within districts that
link neighborhoods to individual schools, so residential location may not be an important
determinant of within-district school assignment.38 Nevertheless, many districts limit the
ability of parents to choose from among the schools in the district except by their location
decisions, and even when parents can choose, distance is often a major factor. Thus, Tiebout
choice may operate across neighborhood schools within a large district as well as across
districts. To the extent that peer groups and school-level policies, rather than policies set at
the district level, are the primary objects of parental choice, neighborhood sorting within
school districts may be a relatively effective form of choice.

38 On desegregation remedies, see Welch and Light (1987), Orfield (1983), and Milliken v. Bradley 418 U.S. 717,
1974.

In view of this possibility, it is important to ask whether inter-district competition
matters to the way that students are assigned to neighborhoods and schools in Tiebout
equilibrium. Panel B of Table 1.1 displays measures of the extent of school-level choice by quartile
of the district-level index. MSAs with more district-level choice have more schools, on
average, than do low-choice MSAs, but this is largely a function of population; average
school size is only weakly correlated with district-level choice. Nevertheless, a school-level
choice index is strongly positively correlated with the district-level index: In MSAs in the
lowest quartile of district choice, the average school-level index is 0.82, versus 0.96 in MSAs
in the highest district-level quartile. This relationship is robust to controls for the
demographic characteristics shown in Panel A of Table 1.1, although I do not report the
regression model here.

The multicommunity model developed above, in which families stratify across
jurisdictions, suggests a useful test of the hypothesis that district boundaries are important
constraints on the Tiebout choice process. If school districts are a unit over which parents
exercise residential choice, we should expect greater stratification by family income in
markets with high district choice indices than in markets with more concentrated school
governance, and this effect should be robust to the inclusion of the school-level choice index.39
Table 1.2 presents evidence on the relationship between the district-level choice index and
two measures of within-MSA stratification, based on the distribution of household income
across districts and the racial composition of schools.

39 Eberts and Gronberg (1981) and Epple and Sieg (1999) propose similar stratification tests of Tiebout-style
models.

The first three columns present regression models for the across-district share of
variance of household income, calculated separately for each MSA with at least two
districts.40 All three models include as explanatory variables the district-level choice index,
fixed effects for nine Census-defined geographic divisions, and controls for several
MSA-level variables that might have independent effects on measured sorting. The second
column adds to these a control for the school-level choice index, while the third column also
controls for several measures of census-tract-level stratification.41 All three estimates
indicate a strong relationship between district-level choice and income stratification across
districts.

40 District-level income distributions are drawn from the School District Data Book (SDDB), a tabulation of
1990 Census data at the school district level. I am grateful to Cecilia Rouse for providing access to the
SDDB data.
41 Tract-level data come from the 1990 Census STF-3A files. Census tracts are much smaller than school
districts, with 4,000 residents on average. Tiebout models do not speak to within-jurisdiction sorting, and
invariance of the choice coefficient to tract-level controls offers reassurance that the relationships observed in
Table 1.2 do not derive from a spurious correlation between district structure and MSA residents’ tastes for
micro-neighborhood segregation.

There may be a mechanical relationship, however, between measures of
across-district sorting and the district structure. To see this, note that areas with more districts,
conditional on market size, necessarily have smaller districts, and random distribution of
populations would produce higher measures of segregation across these smaller areas. To
avoid the bias that this produces, one would ideally estimate the same regressions for
measures of across-school stratification. Unfortunately, income data are not available at the
school level. Instead, I use data on the racial composition of each school, collected in both
the CCD and the Private School Survey (PSS; National Center for Education Statistics,
2000), a census of private schools. I compute from these data a dissimilarity index (Cutler,
Glaeser, and Vigdor, 1999) based on the distribution of white and non-white students across
both public and private schools in each MSA.42 Columns D, E, and F of Table 1.2 report
models using this dissimilarity index as the dependent variable. Again, the coefficient on the
district-level choice index is large, significant, and not much changed by the inclusion of the
school-level choice index and the tract-level segregation measures.

42 The earliest year for which I have been able to obtain electronic PSS data is 1997-1998, so they do not line
up perfectly with the CCD data. Both the CCD and PSS datasets describe the racial composition of the
entire school; when schools include both elementary and secondary grades, I assume that the racial
composition of students in grades 9-12 is the same as that for the school as a whole. The 29 MSAs in which
the CCD is missing racial composition for schools with more than 20% of MSA enrollment are excluded
from the calculations.

The estimates in Table 1.2 are repeated using several additional stratification
measures and alternative specifications in Appendix A. The basic result is clear: There is a
strong, robust relationship between the structure of an MSA’s educational governance (at the
district level) and the degree of student stratification across schools and districts within that
MSA. District-level market concentration evidently captures real variation in parents’ ability
to sort themselves across schools, and it is therefore reasonable to expect markets with less
concentration of district governance to have better-functioning Tiebout marketplaces.

1.4.3. SAT data

Neither of the most commonly used datasets with observations on student
outcomes, the National Education Longitudinal Study (NELS) and the National
Longitudinal Survey of Youth (NLSY), is suitable for my analysis of the distribution of
student outcomes across schools within each MSA. The NELS uses a multi-stage sampling
procedure and draws data from only three schools in the average MSA.43 The NLSY uses a
neighborhood-based sampling design, so may include more schools, but students cannot be
matched to the schools that they attended and in any case are not representative of those
schools.

43 I nevertheless present estimates for my basic model using the NELS data as a specification test in Section
1.5.

As an alternative, I use a restricted-access data set consisting of 462,424 observations
on metropolitan SAT-takers from the cohort that graduated from high
school in 1994. The sample includes about one third of SAT-takers from that cohort, and
represents nearly 20 percent of 1994 high school graduates.44 As students in this sample
generally entered high school in 1990, the MSA demographic data and choice measures
discussed above should accurately describe the environment in which students’ parents made
their locational decisions.

44 SAT-takers who report their ethnicity were sampled with probability one if they were Black or Hispanic, or if
they were from California or Texas, and with probability one-quarter otherwise. Due to an apparent error in
the College Board’s processing of the file, students who did not report an ethnicity are excluded from the
sample. In data for 1999, in which I have a complete version of the file, these students comprise about 12%
of SAT-takers.

The SAT data are rich, but have a serious limitation: Students self-select into taking
the SAT, and there is evidence that at large geographic scales the SAT-taking rate is
negatively correlated with average performance (Dynarski, 1987). A key source of variation
in SAT-taking rates is the state university system’s preference for the SAT versus its
competitor, the ACT. In “ACT states,” only students who are applying to out-of-state
colleges need take the SAT, inducing significant positive selection into the sample of
observed SAT scores. To partially remedy this, I discard all observations from the 27 states
with SAT-taking rates below one third.45 The remaining sample consists of 329,205
SAT-takers from 177 MSAs in “SAT states.” This sample is likely representative of the
college-bound population within the areas under consideration, and I do not further adjust for
sample selection.46 All analyses of the SAT data, however, control for the MSA SAT-taking
rate. Exploratory analyses with more involved selection corrections (reported in Appendix
C) suggest that the resulting estimates are not seriously biased by within-school selection
into SAT-taking.

45 SAT-taking rates use 12th-grade enrollment at schools which successfully match to the SAT data as the
denominator, although other definitions produce the same sample. The selection rule is insensitive to the
exact cutoff used: The marginal states, Colorado and Oregon, have rates of 23% and 38%, respectively.
Among states above the cutoff, average scores offer no evidence of differential selection into SAT-taking; see
Appendix C.
46 Roughly 45% of the relevant national cohort enrolled in college after graduation, although only about
two-thirds of enrollment is at four-year institutions (National Center for Education Statistics, 1999, Tables 101,
173 and 184).

The size of the SAT database permits precise estimation of school-level measures: I
have at least ten observations per school from schools with 77 percent of enrollment in the
MSAs studied. Only 22 percent of schools (enrolling 10 percent of sample students) in the
SAT data are private.

It is helpful to have a one-dimensional index of peer group quality at each school.
To construct this, I estimated a flexible regression of individual SAT scores on student
characteristics, controlling for school fixed effects. The model included effects for 100
parental education categories (ten for mother’s education by ten for father’s education, each
including a nonresponse category) and the interactions of six ethnicity indicators with two
gender categories (eleven parameters) and with twelve family income bins (66 additional
parameters).47 The sample is large enough to permit relatively precise estimation of even this
flexible model, and effect standard errors are generally below ten SAT points. An index of
peer quality was constructed by averaging the fitted values (excluding the estimated school
effect) of this regression over all students at each school.48 This index can be interpreted as
the peer group’s predicted average SAT performance at a nationally representative school.

47 The model explains 33 percent of the cross-sectional variance in individual SAT scores (as compared with 22
percent explained by school effects alone).
48 The individual characteristics coefficients may be biased by endogenous selection into schools. This is not a
problem for my estimation strategy as long as the bias affects all background variables equally: The only role
for these coefficients is to assign relative weights to the individual variables, and the scale of the background
index is irrelevant. Tests reported in Appendix C indicate that the school-level index is quite reliable, and in
any case specification checks reported in Section 1.5 indicate that the results are not particularly sensitive to
the particular peer group measure used.

By using SAT data to describe each school’s peer group, I necessarily exclude the
characteristics of students who do not take the SAT. The average characteristics of
SAT-takers are arguably a more accurate measure of the peer group for college-bound students
than would be averages over the entire student population, as students at many schools are
tracked into college-preparatory and non-college-preparatory courses with little interaction
between students in the two groups, and it seems plausible that parents distinguish between
the groups in their evaluations of schools. Absent microdata for non-SAT-taking students,
however, I am unable to test this restriction.

Table 1.3 lists summary statistics for the SAT sample and for that portion of the
sample in MSAs in each of the four choice quartiles. High-choice MSAs have substantially
higher SAT-taking rates and scores than do low-choice MSAs. The differences in average
scores, however, are entirely accounted for by differences in students’ background
the schools attended by advantaged and disadvantaged students will tend to be larger when
characteristics.49
Tiebout choice makes it easier for wealthy parents to select effective schools without
Figure 1.5 displays the scatterplot of school average SAT scores against the peer
accepting unwanted peers. As a result, naïve estimates of the peer effect should be larger in
group index for a one-quarter subsample of the schools in the data. Circle sizes indicate the
number of (weighted) observations entering the school-level averages. The figure also
high-choice markets than in low-choice markets.
My first test of this prediction in the SAT data uses nonparametric techniques to
displays the regression of average SAT score on the peer group, controlling for MSA fixed
allow for a nonlinear educational production function. These offer no evidence of
effects, which has slope 1.74. The peer index is scaled so that the effect of individual
substantial nonlinearity, and I next turn to regression estimates of several linear
characteristics on own scores (i.e. β in equation 3) accounts for exactly 1 of this, with the
specifications. I also present estimates from alternative data sets; these are imprecise but
remaining 0.74 deriving from the slope of school effects with respect to peer group (i.e.
completely consistent with those derived from the SAT data. None of the data sets or
from γ + θ , the combination of reduced-form peer effects and effectiveness sorting). In the
specifications studied here supports the hypothesis that effective schools are more likely to
next section, I look for evidence that the slope of this line is steeper in high-choice than in
attract advantaged students in markets where the Tiebout choice index is high.
low-choice MSAs; under the assumption that β and γ do not vary systematically with
1.5.1. Nonparametric estimates
choice, variation in the overall slope is informative about ∂θ/∂c, the effect of choice on
If neither the effect of individual characteristics on own scores ( β ) nor the reduced-
effectiveness sorting. In Section 1.6, I estimate a different potential effect of Tiebout choice
form peer effect ( γ ) varies systematically with the structure of local school governance, and
on the line in Figure 1.5. There, I look for evidence that choice affects its intercept, as it
if sorting on effectiveness is more complete in high-choice than in low-choice markets, a
might if choice is correlated with average effectiveness (i.e. if ∂E[µ|c]/∂c ≠ 0).
version of Figure 1.5 which included data only from markets with high choice indices should
exhibit a steeper slope than that shown, while a version estimated only from low-choice
1.5. Empirical Results: Choice and Effectiveness Sorting
markets should be less steep. In a linear model, this may be confounded if there are
nonlinearities in the causal peer effect (i.e. in ∂t/∂x), as market structure influences the
The sorting model in Section 1.3 predicts that if parents choose neighborhoods
dispersion of schools’ peer groups around the MSA average.
largely for the effectiveness of the local schools, equilibrium effectiveness sorting will
The median MSA contains only 19 high schools, not nearly enough to permit
depend on the educational market structure. Specifically, the gap in effectiveness between
49 Recall that low-choice MSAs are disproportionately Black, Hispanic, and in the South.
separate nonparametric estimation for each market. As an alternative, I grouped MSAs into
quartiles by the choice index and estimated separate school-level kernel regressions of test
scores on student characteristics for each quartile. Figure 1.6 displays the estimated
functions, which use an Epanechnikov kernel and a bandwidth of five, about one-tenth of a
school-level standard deviation. The figure offers little evidence of any differences in
$$
\begin{aligned}
E[t_{jm} \mid x_{jm}] = {} & (\alpha + \psi_m) + x_{jm}(\beta + \gamma + \phi_0) + x_{jm} c_m \phi_1 + x_{jm} Z_m \phi_2 \\
& + \left( x_{jm}\omega_m + \left( \mu_{jm} - E[\mu_{jm} \mid x_{jm}] \right) + \varepsilon_{jm} \right),
\end{aligned}
\tag{7}
$$
where ωm ≡ θ m* − θ m (c m , Z m ) is the residual from (6), which I assume is independent of the
stratification of peer groups (i.e. of the distribution of x jm − x m ).
reduced-form educational production functions between the high-choice and low-choice
The effect of choice on the extent of effectiveness sorting can thus be estimated as
quartiles, as the quartile functions are quite similar in both their intercepts and slopes.
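The kernel regressions described above can be illustrated with a minimal sketch of a Nadaraya-Watson estimator using an Epanechnikov kernel. The function names, the toy data, and the optional weights argument are assumptions made for this example; this is not the code used to produce Figure 1.6.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_regression(x, y, grid, bandwidth, weights=None):
    """Nadaraya-Watson estimate of E[y | x] at each grid point.

    Returns None at grid points with no observations within one bandwidth.
    """
    if weights is None:
        weights = [1.0] * len(x)
    fitted = []
    for g in grid:
        # Kernel weight of each school, scaled by its (optional) sampling weight.
        k = [w * epanechnikov((xi - g) / bandwidth) for xi, w in zip(x, weights)]
        total = sum(k)
        if total > 0:
            fitted.append(sum(ki * yi for ki, yi in zip(k, y)) / total)
        else:
            fitted.append(None)
    return fitted


# Hypothetical school-level data: peer index and school average score.
peer_index = [900.0, 902.0, 904.0, 906.0, 908.0]
avg_score = [880.0, 884.0, 888.0, 892.0, 896.0]

# Bandwidth of five index points, as in the text.
fit = kernel_regression(peer_index, avg_score, grid=[904.0], bandwidth=5.0)
```

Running the same function separately on the schools in each choice quartile, over a common grid, would give four curves that can be compared for differences in slope or intercept.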
the coefficient on the interaction of peer group ( x jm ) with the choice index ( c m ) in a
1.5.2. Regression estimates of linear models
regression for school average test scores. The terms on the second line of (7) are
The quartile analysis in Figure 1.6 offers no natural way to control for MSA variables
that might have independent effects on the housing market or on the causal importance of
unobserved residuals, and standard errors must be adjusted to account for their nonclassical
structure.
the peer group. Here, I develop and estimate a more parametric version of the hypothesis of
interest. Drawing on the indication in Figure 1.6 that there is no substantial nonlinearity in
Table 1.4 contains the main empirical results of the chapter. It presents OLS
the peer effect, I revert to the earlier linear model, letting m index housing markets:
$$ t_{jm} = x_{jm}(\beta + \gamma) + \mu_{jm} + \varepsilon_{jm}, \quad \text{with} \tag{4} $$
$$ E[\mu_{jm} \mid x_{jm}] = \psi_m + x_{jm}\theta_m^*. \tag{5} $$
Basic results
estimates of model (7), using MSA fixed effects to absorb the effect of variations in ψ m .
Standard errors permit arbitrary heteroskedasticity and are clustered at the MSA level to
accommodate the within-MSA autocorrelation implied by the random coefficient ωm .
A well-sorted market assigns high- x jm students to high- µ jm schools, and corresponds to a
Schools are weighted by the sum of individual SAT-taker observations’ inverse sampling
high value of θ m* . In general, for fixed parental valuations, δ , the expected sort may vary
probabilities, with an adjustment at the MSA level to weight MSAs in proportion to their 17-year-old populations.
both with choice and with other metropolitan characteristics, Z m :
$$ \theta(c_m, Z_m; \delta) = E[\theta_m^* \mid c_m, Z_m; \delta] = \phi_0 + c_m\phi_1 + Z_m\phi_2. \tag{6} $$
Column A displays a very restricted version of model (7) that excludes all
The discussion in Section 1.3 suggests that if the peer group is not too important to parents,
effectiveness sorting will be more complete when there are more jurisdictions, so ϕ1 > 0 .
Combining (4), (5), and (6), we obtain an estimable equation:
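The substitution can be written out step by step. The sketch below uses only (4), (5), and (6), together with the residual $\omega_m \equiv \theta_m^* - \theta(c_m, Z_m)$; it is a reconstruction of the algebra under those definitions, not a quotation of the original derivation.

$$
\begin{aligned}
t_{jm} &= x_{jm}(\beta + \gamma) + \mu_{jm} + \varepsilon_{jm} \\
&= x_{jm}(\beta + \gamma) + E[\mu_{jm} \mid x_{jm}] + \left( \mu_{jm} - E[\mu_{jm} \mid x_{jm}] \right) + \varepsilon_{jm} \\
&= \psi_m + x_{jm}(\beta + \gamma) + x_{jm}\theta_m^* + \left( \mu_{jm} - E[\mu_{jm} \mid x_{jm}] \right) + \varepsilon_{jm} \\
&= \psi_m + x_{jm}(\beta + \gamma + \phi_0) + x_{jm} c_m \phi_1 + x_{jm} Z_m \phi_2 \\
&\quad + x_{jm}\omega_m + \left( \mu_{jm} - E[\mu_{jm} \mid x_{jm}] \right) + \varepsilon_{jm}.
\end{aligned}
$$

The second line adds and subtracts $E[\mu_{jm} \mid x_{jm}]$, the third applies (5), and the last uses $\theta_m^* = \phi_0 + c_m\phi_1 + Z_m\phi_2 + \omega_m$ from (6). The right-hand side matches that of (7) except for the constant $\alpha$, which does not arise from (4) through (6) as written; presumably it is a normalization, and in any case it cannot be separated from $\psi_m$ once MSA fixed effects are included.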
interactions between the peer quality index and metropolitan area characteristics. (That is, it
forces ϕ1 = ϕ 2 = 0 ; this is the model depicted in Figure 1.5.) It indicates that when all MSAs
in the sample are pooled, the gradient of school average SAT scores with respect to the
characteristics of SAT-takers is 1.74. One standard deviation of school-average student
background is 48 points. This corresponds to an 84 point difference in expected average
Column C allows the racial and ethnic composition of SAT-takers to have an
SAT scores, 0.88 standard deviations of this variable. This, of course, reflects the combined
independent effect on average SAT scores. If there are cultural biases in SAT scores, for
influence of individual characteristics ( β ), peer effects ( γ ) and an average of the θ * ’s, the
example, individual ethnicity may have a different effect than does the composition of the
within-MSA gradients of school effectiveness with respect to peer group.
peer group. The coefficients on racial composition variables are large and significant, but
Column B adds a single interaction of the peer group with a choice index. The
estimate of ϕ1 is small and indistinguishable from zero. The remaining columns add
again their inclusion has essentially no effect on the parameter of interest, the interaction of
average peer quality with Tiebout choice.
Column D tests a different aspect of the specification, the assumption that the
additional interactions of x jm with several metropolitan-level controls that might capture
other determinants of the sorting process, the distribution of school quality, the reduced-form peer effect, or the sample selection process. Moving from left to right, these controls
include the MSA-level SAT-taking rate and indicators for six census divisions; the log of the
MSA population; and two combinations of additional demographic, income distribution, and
institutional controls. In each specification, the ϕ1 point estimate is negative, although it is
only significantly different from zero in columns C and D.
background characteristics predicting SAT scores are identical to those indexing willingness-to-pay for desirable schools. To test this, I allow willingness-to-pay to depend on students’
self-reported family income, estimating the interaction between income and Tiebout choice
while including the peer quality index to absorb peer effects. The interaction coefficient
here is again negative and insignificant.
Columns E and F explore the impact of varying the sample definition. In Column
E, the basic model is estimated on public schools only, while in Column F the 18 MSAs that
All of the models in Table 1.4 are based on a particular specification of the
educational production function, (7), which may not be correct. Table 1.5 reports the results
of several alternative specifications, each using the control variables from Column E of
have only a single district are excluded. The choice-peer group interaction coefficient is
again negative in each of these specifications, significantly so (and with a substantially larger
point estimate than in the basic specifications) in the latter case.
Table 1.4. Column A repeats the relevant coefficients from that specification. In Column B,
the peer effect is allowed to depend on the standard deviation of student characteristics as
well as on their average level. The standard deviation term enters significantly, indicating
that heterogeneous schools produce substantially higher scores than do homogenous schools
with the same average student background. The choice-peer group average interaction is
slightly more negative than in Column A.
Although results are not presented here, I have estimated several additional
specifications of the basic empirical test. The absence of a positive choice effect does not
seem to derive from the particular weighting of the data used here—one might prefer to
weight MSAs equally, or by the number of SAT-takers, rather than by their high-school-age
populations—nor from the inclusion in the sample of schools with too few SAT-takers to
permit accurate estimation of the school mean. In addition, Appendix B presents several
instrumental variables estimates of (7); there is no indication that endogeneity of the choice
and including a fixed effect for each MSA.50 As in the SAT data, peer effects and
index biases the estimates presented here.
effectiveness sorting are together substantial, inflating the school-level background index
Evidence from the NELS and from high school completion rates
coefficient by 90 percent relative to the coefficient of a within-school regression of
The SAT data are uniquely valuable for my empirical strategy, both because they
individual scores on own characteristics. When the peer group measure is interacted with
span a large fraction of metropolitan high schools and because they describe an outcome
the choice index—in Column B, and again with additional controls in the remaining
that is an important factor in families’ evaluations of schools. Nevertheless, it remains
columns—the coefficient is indistinguishable from zero, with a negative point estimate in
possible that selection into SAT-taking biases the above results. To assess their validity, I
every specification.
Panel B repeats this analysis, this time with the score earned by students when they
estimate the basic model using test score data from the National Education Longitudinal
Study (NELS) and high school completion rates from the Common Core of Data (CCD).
were in the 12th grade.51 Again, estimates of the choice effect are imprecise but are—with
Neither of these has nearly the breadth of the SAT data, so the estimates presented here are
one statistically insignificant exception—of the opposite sign from that predicted by the
not as precise as those above, but the point estimates are reassuringly similar.
economic model.
The remaining panels present models for measures relating to school continuation
The NELS sampled about 23 eighth grade students from each of 815 public and 237
private schools in 1988, following up with portions of this original sample at two-year
rates, defined as one minus the cumulative dropout rate. In Panel C, the dependent variable
intervals thereafter. Using a confidential version of the NELS data and school addresses
is the fraction of students from the NELS 8th grade sample who were still in school at the
from the CCD and the Private School Survey, I am able to match 700 schools (534 public
time of the 12th grade follow-up survey four years later. The background index used is the same as
and 166 private) in the NELS sample to the MSAs in which they are located.
that used in Panel B; it is a strong predictor of continuation rates but there is no evidence
The first panel of Table 1.6 presents estimates using the composite test scores that
that it is a stronger predictor in high-choice markets.
The final panel leaves the NELS data, reporting models for high school completion
students earned during the original wave of the NELS, when they were in 8th grade. (I
continue to use the secondary choice index in this analysis; it correlates 0.98 with an
rates of the cohort entering 9th grade in the fall of 1993. Data on this outcome come from a
elementary index.) Column A presents the coefficient from a regression of school average
district-level compilation of four years of CCD data. There are several limitations to the
scores on an index of student quality, pooling all metropolitan schools in the NELS sample
50 The background measure is a weighted average of variables characterizing students’ race and their parents’
education, again using weights chosen to best predict student test scores within schools.
51 Peer group and test score averages are still for the 8th grade school, as once students transfer to high schools
the NELS sample is no longer representative of the schools attended.
CCD completion rate variable: It is measured at the district level rather than the school; it
important background variables; or the included variables may be imperfectly measured—
covers only public schools; it is missing for a great many districts who failed to report one of
likely a particular problem for family income in the SAT data, which high school students are
the component variables; and it may be unreliable if districts cannot distinguish mobility
not likely to report reliably. Any of these would attenuate the estimated gradient of school
from dropout. Moreover, the CCD contains very little information about student
average student outcomes with respect to peer group characteristics.
background, and I therefore use the SAT data student quality index, aggregated to the
The reliability of x jm is likely to be higher, however, in markets where schools are
district, to measure student characteristics. I drop MSAs that are not in SAT states or where
more stratified. One reason is that stratification implies a higher true variance of the peer
available completion rate data cover less than two thirds of public enrollment. This leaves a
group, and therefore a larger signal component of the signal-to-noise ratio. A second reason
sample of 931 districts from 50 MSAs. In spite of the serious limitations in the CCD data,
is that schools in more stratified markets are likely to be more internally homogenous; as the
the pattern of results in Panel D is quite similar to that in Panel C. Again, the student quality
sampling variance of the school average depends linearly on the within-school variance of
index is a strong predictor of completion rates, but its coefficient is (insignificantly) smaller
individual characteristics, more internally homogenous schools imply more reliable school-
in high-choice than in low-choice MSAs.
level averages. A final reason to suspect a stratification-reliability relationship is that
Given the lack of precision in the NELS and CCD estimates, it is somewhat
surprising how well they line up with those in Table 1.4. As before, the choice effect is
unobserved peer group characteristics are likely to be more strongly associated with
observed characteristics in markets that are more heavily stratified.
indistinguishable from zero, but point estimates suggest that effectiveness sorting is slightly
In single-MSA regressions of test scores on student characteristics, the above
less complete in high-choice markets. There is nothing to indicate that the SAT-based results
arguments imply greater attenuation of the peer group coefficient in MSAs with less
are an aberration.
stratified schools. As choice is positively correlated with stratification, this produces a
Possible biases in estimates of (7)
tendency toward larger estimated coefficients (i.e. less bias toward zero) in high-choice
Several identifiable factors may bias the coefficient on the peer group-Tiebout choice
MSAs. In fact, I do not estimate separate regressions for each MSA, but the general effect is
interaction in specifications like (7). I discuss two here; each can produce an upward bias in
the same: Unreliability of the peer group measure produces an upward bias in the effect of
ϕ1 .
choice on the peer group gradient, and therefore in the interaction coefficient ϕ1 .
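The first of these arguments, that a noisy peer group measure is more reliable where the true dispersion of peer groups is larger, can be illustrated with a small simulation. This is a sketch under stated assumptions: the standard deviations, the sample size, and the use of 1.74 as the true slope are hypothetical choices for illustration, and the classical measurement error set-up is a simplification of the mechanisms discussed above.

```python
import random


def reliability(true_sd, noise_sd):
    """Share of the measured index's variance that is signal."""
    return true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)


def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulated_slope(true_sd, noise_sd, true_slope=1.74, n=20000, seed=0):
    """Regress scores on a noisy peer index in one hypothetical market."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, true_sd) for _ in range(n)]
    # The analyst observes the peer index with classical measurement error.
    measured_x = [x + rng.gauss(0.0, noise_sd) for x in true_x]
    scores = [true_slope * x + rng.gauss(0.0, 30.0) for x in true_x]
    return ols_slope(measured_x, scores)


# Same measurement noise, different true dispersion of peer groups.
low_strat = simulated_slope(true_sd=20.0, noise_sd=20.0)
high_strat = simulated_slope(true_sd=60.0, noise_sd=20.0)
```

With these values the reliability ratio is 0.5 in the less stratified market and 0.9 in the more stratified one, so the estimated slope is attenuated much more in the former even though the true slope is identical. A regression that interacts the peer index with a variable correlated with stratification would therefore tend to find a positive interaction that reflects measurement error alone.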
A second possible source of bias in ϕ1 is economic. There is some evidence that the
The first source of bias is statistical. There are several reasons to suspect
measurement error in the peer group variable: There may not be enough observations at any
educational labor market is more liquid in MSAs that have many districts competing for
particular high school to accurately estimate the school-level average; the data may omit
teachers’ talent than in those with more concentrated governance (Luizer and Thornton,
1986). This may make it easier for a high- x jm school to attract good teachers in a high-
choice process driven in substantial part by parental pursuit of effective schools? The
choice market than in one with less choice, where teachers are likely to be assigned to
answer appears to be no. Note that the within-MSA gradient of school average SAT scores
schools by bureaucratic rules rather than by the market. Any such effect would imply a
with respect to student characteristics is 1.74 (from Column A of the same table). Even at
positive effect of choice on the reduced-form peer effect— γ in equations (1) and (4)—
the upper limit of the confidence interval, a move from unified governance to complete
which will appear as a positive contribution to ϕ1 .52
decentralization accounts for just over ten percent of this gradient.
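The arithmetic behind this comparison is short enough to check directly. The sketch below assumes that the choice index runs from zero (unified governance) to one (complete decentralization), so that the interaction coefficient is itself the change in the gradient; the function and variable names are illustrative.

```python
def share_of_gradient(choice_effect, gradient):
    """Change in the gradient from moving the choice index from 0 to 1,
    expressed as a share of the pooled gradient."""
    return choice_effect / gradient


pooled_gradient = 1.74   # within-MSA gradient, Table 1.4, Column A
upper_bound = 0.20       # upper limit of the 95% confidence region for phi_1

share = share_of_gradient(upper_bound, pooled_gradient)
print(f"{share:.1%}")    # prints 11.5%
```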
We can imagine as a thought experiment fully decentralizing school governance in
Either of these effects would imply upward bias in estimates of ϕ1 relative to the
effect of interest. To the extent that they are thought to be important, the results presented
in Tables 1.4, 1.5, and 1.6 should be seen as upper bounds on the effect of Tiebout choice on
Miami-Dade County, which is served by a single district.53 Figure 1.7 displays the actual
distribution of peer groups and school average SAT scores in Miami, as well as the
counterfactual distribution that might be observed if the Miami choice index were changed
parental effectiveness sorting.
to one and if the effect of choice were at the upper limit of its confidence interval.54 The
Calibration of results: Can we reject meaningful effects?
None of the estimates presented in this section supports the hypothesis that effective
schools are more likely to attract the best peer groups in markets with fragmented school
governance than in those where Tiebout choice is more difficult to exercise. Point estimates
actual and counterfactual distributions of school averages are nearly identical. If the
counterfactual reflects a substantial increase in sorting on school effectiveness, it must be
that effectiveness is responsible for a very small share of the across-school variation in SAT
scores.
of the choice-peer group interaction are almost uniformly negative, suggesting that
effectiveness sorting is less complete in high-choice than in low-choice markets. These
estimates are imprecise, however, and most cannot reject a zero effect. It is worth
considering whether the confidence regions exclude the sorts of effects that we would
Recall, moreover, that this thought experiment assumes a choice coefficient at the
upper limit of the confidence interval. At the point estimate, choice reduces the gradient of
SAT scores with respect to student quality. The models in Table 1.4 reject a sizable—by any
reasonable standard—effect of choice on the test score gradient. The estimated effects are
expect if school effectiveness were a prime determinant of parental location decisions.
Consider the specification in Column E of Table 1.4. Would a true effect of
+0.20 (the upper bound of a 95% confidence region for ϕ1) be consistent with a Tiebout
52 Note that this effect has nothing to do with parents’ use of their power to choose: It arises from teachers
moving to schools with students who are easy to teach, rather than from parents moving to districts with
good teachers.
53 The district’s web site indicates that the county is partitioned into school attendance areas. These can be
changed easily, however, and indeed were under the supervision of federal judges for desegregation purposes
from 1970 through 2001 (Welch and Light, 1987).
54 Note that decentralization of Miami’s schools would probably change the allocation of peers as well as their
distribution across schools. If, as Table 1.2 indicates, choice causes increased stratification, the counterfactual
Miami market would exhibit more dispersion along the horizontal axis in Figure 1.7. The figure ignores any
such effect, and simply considers whether decentralization would lead to increased dispersion of SAT scores
conditional on the observed peer group allocation.
difficult to reconcile with a sorting process in which school effectiveness is an important part
inclusion does not substantially alter the estimated effect of choice: It remains negative and
of both location decisions and educational production.
significant.55
1.6. Empirical Results: Choice and Average SAT Scores
is not very large: A one standard deviation (0.28) increase in the choice index corresponds
The results presented in Section 1.5 offer no evidence that the allocation of effective
with a reduction in mean scores of only about four points, about one-eighth of an MSA-level
The negative effect of Tiebout choice on average SAT scores indicated by Table 1.7
schools is systematically different in high-choice than in low-choice markets. If Tiebout
standard deviation. Moreover, in some alternative specifications not reported here, the
choice does not increase the probability that effective schools attract students from
coefficient estimate is statistically insignificant, though still negative. When MSAs are
advantaged backgrounds, it is not clear how it can provide incentives that will lead
weighted equally, for example, rather than by the number of SAT takers or by the 17-year-
administrators to exert greater effort. The above results thus suggest that the argument
old population (not shown, but similar to the SAT-taker weighting in Table 1.7), the choice
(Brennan and Buchanan, 1980; Hoxby, 2000a) that average school performance should be
effect is about one third as large as that shown here and confidence intervals do not reject
higher in markets with decentralized governance may not hold. The SAT data permit a
zero. Nevertheless, there is no indication that Tiebout choice is associated with higher SAT
direct test of this prediction, however.
scores once student background is controlled.56 Moreover, the coefficient on the
Table 1.7 presents regression models for the average level of SAT scores across
background index across MSAs—1.58 in Column C, and slightly higher in later columns—is
MSAs. Column A includes only the choice index as a regressor. It enters with a positive
nearly identical to that found within MSAs (Table 1.4, Column A). This is consistent with
coefficient, implying that fully decentralized MSAs produce average SAT scores about forty
the claim that both coefficients measure primarily the peer effect ( γ ), which might be the
points higher than do those with only a single district. Recall, however, that there are large
same across MSAs as within, rather than effectiveness sorting ( θ ), which we would expect
differences between high-choice and low-choice MSAs in both SAT-taking rates and student
to see within but not across MSAs.
characteristics (from Tables 1.1 and 1.3). Columns B, C, and D add controls for the SAT-taking rate and the average background index of SAT-takers. The positive correlation
The results on SAT scores across MSAs thus support those on the distribution of
scores within MSAs: The evidence does not indicate that Tiebout choice provides incentives
between choice and performance seems to result entirely from the omission of students’
background characteristics; when they are included in Column C, the coefficient becomes
negative and significant. The remaining columns add additional MSA-level regressors. Their
55 Note that in Column F, which controls for several MSA demographic characteristics, the coefficient on the
SAT-taking rate finally takes on its expected sign.
56 Hoxby (2000a) finds a positive effect of choice on average NELS scores across MSAs, one that is larger for
high-income than for low-income students. The SAT sample might be thought analogous to her “not-low-income” group. Hoxby’s positive effect is not seen here, either in the OLS results in Table 1.7 or in
instrumental variables specifications (in Appendix B) similar to hers. See Chapter 2 for further discussion of
her results.
to school administrators to improve productivity, as productive administrators appear no
to student characteristics varies systematically with Tiebout choice, as would be expected if
more likely to be rewarded for it in high-choice than in low-choice MSAs.
effectiveness allocations were more stratified in high-choice markets. Even at the upper
extreme of the estimated confidence intervals, the SAT gap between more- and less-desirable
1.7. Conclusion
schools is not meaningfully larger in markets with decentralized governance than in those
This chapter has used the Tiebout choice process—the choice of school
with less Tiebout choice. Several specification tests and alternative data sets fail to reveal
characteristics via housing decisions—as a lens through which to study the strength of
important biases in the basic models. Consistent with the results on within-market sorting, I
parental preferences for effective schools relative to those for other neighborhood or school
also find no evidence that Tiebout choice increases average SAT scores across markets, as
characteristics. Earlier work on Tiebout mobility presumes that parents use their location
would be expected if choice increases competitive pressure for administrators to run
decisions to choose effective schools; one lesson of the analysis here is that the potential
effective schools.
importance of peer group externalities to community desirability can create coordination
failures in which ineffective schools are preferred to more effective competitors.
The motivation for the empirical approach is a model of the Tiebout marketplace in
I see four possible explanations for the pattern of results. First, it may be that I have
mis-measured the extent of Tiebout choice by focusing on a district-level choice index where
in fact the relevant measure of parents’ exit options is at the school level. Second, parents
which housing prices ration access to desirable schools. As is common in multicommunity
may have no concern whatever for the peer group, and may choose schools purely for their
models, equilibrium is characterized by maximum stratification of families across school
effectiveness. (Recall that there is no necessary connection between market structure and
districts, with the wealthiest families residing in the most-preferred communities. Preferred
effectiveness sorting in this case.) Third, parents’ concern for the peer group may be so
districts need not have particularly effective schools, however, when peer group enters into
large that it dominates effectiveness in their choices, so that again there is no effect of choice
parental valuations, as wealthy families can be “stuck” in ineffective schools by their
on effectiveness sorting. Finally, it may be that the sorts of policies that I call “school
unwillingness to abandon the peer group offered. For parental valuations that place
effectiveness,” those not dependent on the peer group, are relatively unimportant
substantial weight on school effectiveness, this becomes less likely as Tiebout choice
determinants of student outcomes (or that they do not vary substantially across schools), and
increases parents’ exit options.
thus that effectiveness sorting and differences in average effectiveness across markets are not
In so far as student test scores depend on school effectiveness, effectiveness sorting
is observable as an increase in the slope of school average scores with respect to student
characteristics. I find no evidence that the gradient of school-level SAT scores with respect
observable in the pattern of average SAT scores.
The first two of these are not particularly plausible. I present strong evidence, in
Table 1.2 and in Appendix A, that the district-level choice index is an important determinant
of student stratification, even when possible confounding factors are controlled. It seems
the better school were that choice separable from the residential location decision.
that parents are sorting on some characteristics of school districts, though not on anything
Moreover, voucher programs that encourage the entry of new competitors may produce
that serves to increase student performance conditional on individual and peer
more options for parents than even the most decentralized of district governance structures,
characteristics.
reducing the potential for coordination failures and increasing the probability that even
It similarly seems unlikely that parents have zero concern for peer group. In the
parents who value the peer group highly will choose effective schools. It thus seems likely
presence of direct or indirect peer effects on student learning, parents would be irrational to
that the character of equilibrium will depend crucially on the particular institutions of any
ignore peer group in their evaluations of schools, and anecdotal evidence suggests that they
choice program. Further research with large-scale voucher programs will be needed to
do not do so. The likelihood that parents have imperfect information only reinforces this
determine whether administrators of effective schools are rewarded by increased demand in
judgment, as the most widely available indicator of school quality, the average test score,
the choice regimes that these policies create.
loads heavily on the peer group, while value added is much more difficult to observe.
The alternative hypotheses that are consistent with the above results, that parental
valuations place a great deal of weight on peer group relative to effectiveness or that
administrative and instructional effectiveness is simply unimportant to the distribution of
educational outcomes, seem more plausible. I interpret the chapter’s results as cautious
support for the first of these, though the second would equally well explain the results and in
any case their implications for the productivity benefits of Tiebout choice are the same.
In the absence of parental sorting on school effectiveness, there is little theoretical
support for the claim that Tiebout choice markets create incentives for school administrators
to exert greater effort to raise student performance. Caution is required, however, in
generalizing from this chapter’s results to choice markets that do not link school assignment
to residential location. Under Tiebout choice, parents may have to give up desired
neighborhood amenities—views, parks, air quality, or characteristics of neighbors—to obtain
a more effective school. They may be unwilling to do this even though they would choose
53
54
Tables and Figures for Chapter 1.

Table 1.1.
Summary statistics for U.S. MSAs

                                    All MSAs            Mean by Choice Quartile
                                                     Least                        Most
                                    Mean     S.D.   Choice       Q3       Q2    Choice
                                     (A)      (B)      (C)      (D)      (E)      (F)
Panel A: Basic Descriptive Statistics
N                                    318                81       76       83       78
Choice index (district level)       0.66     0.28     0.25     0.66     0.81     0.92
ln(Population)                      12.7      1.0     12.1     12.3     12.8     13.5
ln(Mean HH inc.)                    10.5      0.2     10.4     10.4     10.5     10.6
Gini coeff., HH inc.                0.36     0.02     0.37     0.36     0.35     0.35
Fraction Black                       10%      10%      12%      11%       9%       8%
Fraction Hispanic                     7%      13%       9%       7%       7%       5%
Fraction college grads               20%       6%      20%      19%      20%      20%
In foundation plan state             74%               89%      82%      75%      50%
South                                38%               65%      47%      30%      12%
Private enrollment share              8%       5%       7%       8%       8%       9%
Panel B: Districts and schools (public, grades 9-12)
# of districts                      14.7     17.9      3.3      7.7     13.5     34.5
# of schools                        33.5     40.6     16.5     20.1     38.1     59.1
# of students (thousands)           25.8     40.1     13.4     14.1     30.2     45.3
Average district enrollment        3,053    6,103    6,557    2,247    2,015    1,303
Average school enrollment            709      242      754      692      680      710
Choice index (school level)         0.89     0.08     0.82     0.88     0.92     0.96

Sources: Common Core of Data, 1990; 1990 Decennial Census STF-3C; Card and Payne (1998). Choice quartiles are index values 0-0.5 (Q4); 0.5-0.75 (Q3); 0.75-0.875 (Q2); and 0.875-1 (Q1).

Table 1.2.
Effect of district-level choice index on income and racial stratification

Dependent variable, columns A-C: Across-District Share of Variance, HH Income
Dependent variable, columns D-F: White/Non-White Dissimilarity Index (School Level)

                                              (A)      (B)      (C)      (D)      (E)      (F)
Choice                                       0.07     0.09     0.10     0.15     0.10     0.10
                                           (0.01)   (0.01)   (0.01)   (0.03)   (0.03)   (0.02)
ln(Population) / 100                         0.77     1.29     0.22     4.39     2.21    -0.64
                                           (0.22)   (0.29)   (0.27)   (0.83)   (1.13)   (0.90)
Pop: Frac. Black                             0.09     0.09    -0.06     0.33     0.30     0.12
                                          (0.022)  (0.022)  (0.026)   (0.09)   (0.09)   (0.08)
Pop: Frac. Hispanic                          0.02     0.01     0.01     0.03     0.04     0.11
                                           (0.01)   (0.01)   (0.01)   (0.06)   (0.06)   (0.05)
ln(mean HH income)                           0.07     0.07     0.03     0.15     0.17    -0.06
                                          (0.018)  (0.018)  (0.016)   (0.07)   (0.06)   (0.05)
Gini coeff., HH income                       0.36     0.33     0.15     1.88     2.03     0.20
                                           (0.11)   (0.11)   (0.10)   (0.42)   (0.42)   (0.33)
Pop: Frac. BA+                              -0.06    -0.07    -0.05    -0.47    -0.40     0.32
                                          (0.033)  (0.033)  (0.036)   (0.13)   (0.13)   (0.12)
Foundation plan state / 100                 -0.04    -0.04     0.24    -2.95    -3.00     0.24
                                           (0.40)   (0.39)   (0.35)   (1.60)   (1.58)   (1.15)
School-level choice index                             0.10     0.10             -0.37    -0.38
                                                   (0.036)  (0.031)            (0.13)   (0.10)
Census tract-level segregation measures:
  Isolation index (white/non-white)                            0.10                      -0.27
                                                             (0.03)                     (0.09)
  Dissimilarity index (white/non-white)                       -0.07                       1.06
                                                             (0.03)                     (0.10)
  Across share of variance, education                         -0.15                      -0.38
                                                            (0.046)                     (0.15)
  Across share of variance, HH inc.                            0.37                       0.40
                                                             (0.05)                     (0.16)
N                                             293      293      293      289      289      289
R2                                           0.63     0.64     0.74     0.63     0.64     0.82

Notes: Observations are MSAs/PMSAs. Regressions are unweighted. Dependent variable has mean (S.D.) 0.041 (0.038) in columns A-C; 0.413 (0.151) in columns D-F. Columns A through C exclude 25 one-district MSAs. Dissimilarity index is calculated over public and private schools; 29 MSAs in which racial composition is missing for schools with more than 20% of public enrollment are excluded. All columns include fixed effects for nine census divisions.
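For readers unfamiliar with the dependent variable in columns D through F of Table 1.2, the following is a minimal sketch of the standard two-group dissimilarity index, D = (1/2) Σ_s |w_s/W − n_s/N|, computed across schools. It is an illustration under stated assumptions rather than the code used to build the table: the function name and the enrollment counts are hypothetical.

```python
def dissimilarity_index(white, nonwhite):
    """Two-group dissimilarity index across units (here, schools).

    white[s] and nonwhite[s] are enrollment counts at school s.
    Returns 0 when every school mirrors the market-wide composition
    and 1 when no school enrolls students from both groups.
    """
    total_w = sum(white)
    total_n = sum(nonwhite)
    if total_w <= 0 or total_n <= 0:
        raise ValueError("both groups must have positive enrollment")
    return 0.5 * sum(abs(w / total_w - n / total_n)
                     for w, n in zip(white, nonwhite))


# Hypothetical three- and two-school markets, for illustration only.
even = dissimilarity_index([100, 200, 300], [50, 100, 150])
segregated = dissimilarity_index([300, 0, 300], [0, 400, 0])
mixed = dissimilarity_index([80, 20], [20, 80])
```

In this toy setup the evenly mixed market scores 0, the fully separated market scores 1, and the intermediate market falls between the two.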
Table 1.3.
Summary statistics for SAT sample

                                    All MSAs              By Choice Quartile
                                                     Least                        Most
                                    Mean     S.D.   Choice       Q3       Q2    Choice
                                     (A)      (B)      (C)      (D)      (E)      (F)
# of observations                329,025            42,286   30,298  117,274  139,167
# of schools                       5,727               755      619    1,648    2,705
# of MSAs                            177                42       37       47       51
MSA SAT-taking rate                39.5%    10.6%    32.8%    37.8%    38.7%    47.0%
Individual-level
  SAT score                          997      201      973      995      997     1004
  Black                              12%               19%      11%      13%       9%
  Hispanic                           10%               17%      12%      13%       6%
  Asian                               9%                8%       4%      14%       7%
  Female                             54%               56%      55%      55%      54%
  Father's education                14.3      2.7     13.9     14.3     14.3     14.3
  Mother's education                13.8      2.5     13.6     13.9     13.8     13.9
  Family income ($1,000s)           47.0     25.9     40.7     45.6     47.3     48.7
  Student background index           997       81      974      997      996     1004
School-level
  # of SAT observations               57       67       56       49       71       51
  Sum of SAT weights                 102       90      101       89      101      105
  S.D., SAT score                    171       31      168      170      172      171
  S.D., student background            62       16       65       65       65       59

Notes: See text for description of SAT sample. Individual-level measures weight observations by inverse sampling probability. Schools are unweighted for school-level measures. Individual- and school-level standard deviations in Column B are computed over individuals and schools, not over MSA means. Choice quartiles are index values 0-0.5 (Q4); 0.5-0.75 (Q3); 0.75-0.875 (Q2); and 0.875-1 (Q1).

Table 1.4.
Effect of Tiebout choice on the school-level SAT score-peer group gradient

                                         (A)      (B)      (C)      (D)      (E)      (F)
Avg. student background index           1.74     1.72     1.49     0.09    -2.35     0.76
                                      (0.04)   (0.17)   (0.15)   (0.27)   (2.34)   (2.45)
Interaction of student background average with:
  * Choice index                                 0.02    -0.41    -0.34    -0.09    -0.11
                                               (0.20)   (0.13)   (0.12)   (0.15)   (0.17)
  * MSA SAT-taking rate                                   2.08     1.94     0.99     1.16
                                                        (0.51)   (0.46)   (0.44)   (0.49)
  * ln(Population)                                                 0.09     0.04     0.03
                                                                 (0.02)   (0.02)   (0.03)
  * Pop: Frac. Black                                                       -0.33    -2.33
                                                                          (0.36)   (1.09)
  * Pop: Frac. Hispanic                                                     0.03    -1.60
                                                                          (0.18)   (0.80)
  * ln(mean HH inc.)                                                        0.18    -0.02
                                                                          (0.21)   (0.20)
  * Gini, HH inc.                                                           3.02     1.89
                                                                          (1.71)   (1.90)
  * Pop: Frac. BA+                                                          1.56     2.31
                                                                          (0.52)   (0.64)
  * Foundation plan state                                                  -0.02    -0.03
                                                                          (0.06)   (0.05)
  * Pop: Frac. White^2                                                              -1.16
                                                                                   (0.65)
  * ln(Density)                                                                      0.02
                                                                                   (0.03)
  * Pop: Frac. LTHS                                                                  1.14
                                                                                   (0.82)
  * Census division FEs                    n        n        y        y        y        y
R2                                      0.77     0.77     0.78     0.78     0.78     0.78
R2, within MSAs                         0.74     0.74     0.74     0.75     0.75     0.75

Notes: Sample in each column is 5,727 schools in 177 MSAs. Dependent variable is the weighted mean SAT score at the school. Within MSAs, observations are weighted by the estimated number of SAT-takers at the school (i.e. by the sum of individual sampling weights); these are adjusted at the MSA level to make total MSA weights proportional to the 17-yr-old population. All models include 177 MSA fixed effects, and standard errors are clustered at the MSA level.
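The weighting scheme described in the notes to Table 1.4 can be sketched as follows. This is a hedged illustration only: the function name, MSA identifiers, and all numbers are hypothetical, and the proportionality constant is set to one so that each MSA's adjusted weights sum to its 17-year-old population.

```python
from collections import defaultdict


def rescale_weights(schools, pop17):
    """Rescale school weights so each MSA's total is proportional to pop17.

    schools: list of (msa_id, school_weight) pairs, where school_weight is
             the sum of individual sampling weights at the school.
    pop17:   dict mapping msa_id to its 17-year-old population.
    Relative weights of schools within an MSA are left unchanged.
    """
    totals = defaultdict(float)
    for msa, weight in schools:
        totals[msa] += weight
    return [weight * pop17[msa] / totals[msa] for msa, weight in schools]


# Hypothetical inputs, for illustration only.
schools = [("A", 100.0), ("A", 300.0), ("B", 50.0)]
pop17 = {"A": 2000, "B": 1000}
adjusted = rescale_weights(schools, pop17)
```

Within MSA "A" the two schools keep their 1:3 ratio, while the MSA totals now stand in the same 2:1 ratio as the assumed populations.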
Table 1.5.
Effect of Tiebout choice on the school-level SAT score-peer group gradient: Alternative specifications

                                                                                      Public    Multi-District
                                                   Full Sample                       Schools      Markets
                                          Base                                         Only        Only
                                           (A)       (B)       (C)       (D)           (E)         (F)
Mean peer quality * choice               -0.09     -0.15     -0.11                   -0.10       -0.46
                                         (0.15)    (0.15)    (0.13)                  (0.17)      (0.17)
S.D.(peer quality)                                  0.53
                                                   (0.08)
Mean family inc. ($1,000s) * choice                                    -0.16
                                                                       (0.44)
Peers: Fr. Black                                             126.5
                                                             (21.9)
Peers: Fr. Hispanic                                           74.1
                                                             (14.5)
Peers: Fr. Asian                                              82.1
                                                             (22.7)
Peers: Fr. other race                                         38.9
                                                             (20.3)
N                                        5,727     5,139     5,727     5,690         4,453       5,476
R2                                        0.77      0.78      0.80      0.78          0.80        0.78
R2, within MSAs                           0.74      0.75      0.77      0.75          0.75        0.75

Notes: Dependent variable in all columns is school mean SAT score. All models include 177 MSA fixed effects and main effects of the peer quality index (or mean family income, in Column D), as well as interactions with the "MSA Characteristics" used in Table 1.4, Column E. Observations are schools, weighted within MSAs by the sum of individual weights and across MSAs by the 17-year-old population; see text. Standard errors are clustered at the MSA level. Sample size varies due to availability of regressors: S.D.(peer quality) is set to missing when there are 5 or fewer observations; mean family income is calculated over students who report non-missing values. Column E excludes private schools, while Column F excludes 18 MSAs with only a single district.

Table 1.6.
Effect of Tiebout choice on the school-level SAT score-peer group gradient: Evidence from the NELS and the CCD

                                                           No      Basic   Preferred     Full
                                                        Controls  Controls  Controls   Controls
                                                (A)       (B)       (C)       (D)        (E)
Panel A: NELS 8th grade score (205 MSAs; 707 schools; 23.3 students per school)
Avg. student background index                  1.90      1.97      2.62      7.76       7.24
                                              (0.08)    (0.15)    (0.30)   (12.75)    (13.04)
  * choice                                              -0.09     -0.42     -0.57      -0.56
                                                        (0.20)    (0.25)    (0.40)     (0.56)
Panel B: NELS 12th grade score (202 MSAs; 682 schools; 12.1 students per school)
Avg. student background index                  1.47      1.62      2.26     10.65       3.94
                                              (0.14)    (0.19)    (0.42)   (12.13)    (13.82)
  * choice                                              -0.19     -0.37      0.14      -0.04
                                                        (0.32)    (0.44)    (0.53)     (0.57)
Panel C: NELS 8th-12th grade continuation rate (202 MSAs; 682 schools; 12.1 students per school)
Avg. student background index / 100            2.53      3.19      3.03     26.46      55.22
                                              (0.49)    (2.55)    (1.73)   (44.64)    (49.82)
  * choice                                              -0.84     -1.28     -0.19      -0.69
                                                        (3.04)    (1.99)    (2.27)     (2.34)
Panel D: CCD 9th-12th grade completion rate (50 MSAs; 931 school districts)
Avg. student background index / 1,000          1.99      2.79      5.34    -28.06     -33.35
                                              (0.21)    (2.45)    (2.00)   (10.59)    (14.37)
  * choice                                              -0.90     -5.43     -7.08      -6.34
                                                        (2.56)    (2.47)    (4.29)     (4.25)

Notes: Specifications are similar to those in Table 1.4, columns A, B, C, E, and F, although the MSA SAT-taking rate is excluded from all models. All models control for MSA fixed effects and all standard errors are clustered at the MSA level. Sample for Panel A is schools in the original NELS 8th grade sample; Panels B and C restrict sample to those schools with students in the 1988-1992 NELS panel. Student Background Index in Panels A-C is fitted value from a within-school regression of composite test scores (8th grade in A; 12th in B and C) on student race, gender, and parental education measures, averaged to the school level and dropping the school fixed effects. Sample in Panel D is public school districts in SAT-sample MSAs with non-missing completion data (from the Common Core of Data) for at least two thirds of metropolitan enrollment. Student quality in this panel is the index constructed from the SAT data, averaged over schools in the district.
Table 1.7.
Effect of Tiebout choice on average SAT scores across MSAs

                                     (A)      (B)      (C)      (D)      (E)      (F)      (G)      (H)
Choice index                        40.7     36.5    -16.6    -16.1    -26.3    -16.1    -14.1    -13.7
                                   (9.2)   (10.1)    (5.0)    (5.2)    (5.1)    (5.0)    (5.1)    (5.8)
MSA SAT-taking rate                          28.7              -3.5      3.8    -88.4     39.1     25.2
                                           (27.4)            (13.1)   (19.1)   (17.8)   (68.5)   (72.9)
MSA SAT-taking rate^2                                                                   -157.2   -141.9
                                                                                        (81.6)   (86.3)
Avg. bkgd. index, SAT-takers                          1.58     1.58     1.79     1.75     1.78     1.80
                                                    (0.06)   (0.07)   (0.06)   (0.12)   (0.12)   (0.13)
ln(Population)                                                           5.9      0.6      0.1     -0.2
                                                                       (0.9)    (1.1)    (1.1)    (1.3)
Pop: Frac. Black                                                                 39.9     41.5     50.3
                                                                               (22.2)   (22.0)   (41.8)
Pop: Frac. Hispanic                                                              51.0     58.1     63.2
                                                                               (14.1)   (14.5)   (29.7)
ln(mean HH inc.)                                                                 -4.3     -2.8     -5.0
                                                                                (8.6)    (8.6)    (9.8)
Gini, HH inc.                                                                  -180.9   -170.2   -178.5
                                                                               (61.8)   (61.6)   (70.7)
Pop: Frac. BA+                                                                  164.6    162.6    163.8
                                                                               (27.0)   (26.8)   (33.1)
Foundation plan state                                                            -3.2     -2.9     -2.8
                                                                                (2.4)    (2.3)    (2.4)
Pop: Frac. White^2                                                                                  4.7
                                                                                                 (24.1)
ln(Density)                                                                                         1.0
                                                                                                  (1.6)
Pop: Frac. LTHS                                                                                     2.4
                                                                                                 (33.0)
Census division FEs                    n        n        n        n        y        y        y        y
R2                                  0.10     0.11     0.80     0.80     0.87     0.93     0.93     0.93

Notes: Dependent variable is the weighted mean SAT score at the MSA level; there are 177 MSAs in the sample. MSAs are weighted by the sum of SAT-taker weights.

Figure 1.1.
Schematic: Illustrative allocations of effective schools in Tiebout equilibrium, by size of peer effect and number of districts

[Figure: six panels. Horizontal axis: Family background (x_ij), 0 to 1. Vertical axis: District Effectiveness (µ_j) and District Desirability (x_j δ + µ_j), 0 to 2.5.]
Panel A: Infinitesimal districts, with no concern for peer group (δ = 0)
Panel B: Infinitesimal districts, with moderate concern for peer group (δ = 1.5)
Panel C: Ten districts, with no concern for peer group (δ = 0)
Panel D: Ten districts, with moderate concern for peer group (δ = 1.5)
Panel E: Three districts, with no concern for peer group (δ = 0)
Panel F: Three districts, with moderate concern for peer group (δ = 1.5)

Notes: Each panel illustrates one possible equilibrium in a market characterized by the listed market structure and parental valuations. In each panel, income is uniformly distributed and effectiveness parameters are equally spaced on the [0, 1] interval. See text for details.
Figure 1.2.
Simulations: Average effectiveness of equilibrium schools in 3- and 10-district markets, by income and importance of peer group

[Figure: four panels, each plotting series for 3 districts and 10 districts. Horizontal axis: Income Percentile, 0 to 1. Vertical axis: Avg. Effectiveness of Equilibrium School, -2 to 2.]
Panel A: No concern for peer group (δ=0)
Panel B: Small concern for peer group (δ=0.5)
Panel C: Moderate concern for peer group (δ=1.5)
Panel D: Large concern for peer group (δ=3)

Notes: Each horizontal segment in each figure represents the average of 5,000 draws, where income has a standard normal distribution and effectiveness parameters for each income bin are drawn from the same distribution, then permuted to find an equilibrium assignment. See text for details.

Figure 1.3.
Simulations: Slope of effectiveness with respect to average income in Tiebout equilibrium, by market structure and importance of peer group

[Figure: one panel with four series: No concern for peer group (δ=0); Small concern for peer group (δ=0.5); Moderate concern for peer group (δ=1.5); Large concern for peer group (δ=3). Horizontal axis: # of districts (J), 2 to 10. Vertical axis: θ(δ, J): Average slope of effectiveness with respect to peer group, 0 to 1.]

Notes: Each point is the coefficient of a separate "within" regression of school effectiveness (µ) on average income, estimated on 5,000 simulated markets with a fixed effect for each market. See text for details.
Figure 1.4.
Distribution of district-level choice indices across 318 U.S. metropolitan areas

[Figure: histogram with regions marked "Concentrated" and "Highly Concentrated". Horizontal axis: Choice Index, District Level (0=local monopoly; 1=infinitesimal districts), 0 to 1. Vertical axis: Number of MSAs, 0 to 30.]

Figure 1.5.
Student characteristics and average SAT scores, school level

[Figure: scatter plot with fitted line. Horizontal axis: Student Background Index, 800 to 1100. Vertical axis: Average SAT Score, 600 to 1400.]

Notes: Each point represents a single school; a randomly selected 25% subsample of schools is shown here. Circle areas are proportional to the sum of SAT-taker weights at the school. The dark line represents a weighted regression on the full sample with fixed effects for 177 MSAs; the line has slope 1.74.
Figure 1.6.
Nonparametric estimates of the school-level SAT score-peer group relationship, by choice quartile

[Figure: four kernel regression curves: Least-Choice Quartile; 3rd Quartile; 2nd Quartile; Greatest-Choice Quartile. Horizontal axis: Average Background Index. Vertical axis: Kernel Mean SAT Score.]

Notes: Figure displays kernel estimates (using an Epanechnikov kernel and a bandwidth of 5 points) of the school-level conditional mean SAT score as a function of the school average background index in each of 4 quartiles of the district-level Tiebout choice index. Schools are weighted by the number of SAT-takers, with weights adjusted so that MSA-level total weights are proportional to 17-year-old populations. Estimates are not displayed for background index values below the first percentile or above the 99th percentile of the school-level distribution.

Figure 1.7.
"Upper limit" effect of fully decentralizing Miami's school governance on the across-school distribution of SAT scores

[Figure: scatter plot with legend entries: Observed scores; Fitted trend line; Counterfactual trend line; Upper limit effect of move to maximum choice. Horizontal axis: Average student background. Vertical axis: Average SAT Score.]

Notes: Hollow circles are observed average SATs at schools in the Miami PMSA; circle areas are proportional to the square root of the number of SAT-takers at the school. "Fitted trend line" represents fitted values from the model in Table 1.4, Column E. "Counterfactual trend line" represents the fitted values after complete decentralization of Miami school governance (i.e. after the choice index goes from 0 to 1), if the choice-background index interaction effect is assumed to be at the upper limit of the estimated 95% confidence region from that model. Shaded circles represent counterfactual SAT averages for the schools that observed Miami peer groups might attend under these assumptions.
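The kernel estimates described in the notes to Figure 1.6 can be illustrated with a minimal sketch of a weighted local-constant (Nadaraya-Watson) estimator using an Epanechnikov kernel and a bandwidth of 5 points. The choice of the local-constant form is an assumption, since the notes name only the kernel and bandwidth; the function names and the toy data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on |u| < 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_mean(x0, xs, ys, weights, bandwidth=5.0):
    """Weighted local-constant estimate of E[y | x = x0].

    xs: school average background index values
    ys: school mean SAT scores
    weights: school weights (e.g. number of SAT-takers)
    Returns None when no observation falls within one bandwidth of x0.
    """
    num = 0.0
    den = 0.0
    for x, y, w in zip(xs, ys, weights):
        k = w * epanechnikov((x - x0) / bandwidth)
        num += k * y
        den += k
    return num / den if den > 0 else None


# Hypothetical schools, for illustration only.
xs = [998.0, 1000.0, 1002.0]
ys = [990.0, 1000.0, 1010.0]
center = kernel_mean(1000.0, xs, ys, [1.0, 1.0, 1.0])
tilted = kernel_mean(1000.0, xs, ys, [3.0, 1.0, 1.0])
outside = kernel_mean(1100.0, xs, ys, [1.0, 1.0, 1.0])
```

With equal weights the symmetric toy data return the middle score; placing more weight on the lower-scoring school pulls the estimate down, and evaluation points with no nearby schools yield no estimate.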
Chapter 2.
Does Competition Among Public Schools Really Benefit Students? A Reappraisal of Hoxby (2000a)

2.1. Introduction

Hoxby (2000a) argues that in metropolitan areas where governance of schools is divided among many small school districts, each with a local monopoly, the need to attract residents may constrain school administrators from their self-interested tendencies to inefficient production. Unlike some previous empirical tests of forms of Brennan and Buchanan’s (1980) Leviathan Hypothesis, Hoxby finds significant positive effects of jurisdictional fragmentation on student outcomes, which she interprets as evidence in support of the claim that schools respond to “Tiebout”-style competition (Tiebout, 1956).

Hoxby’s results appear to conflict with the conclusion in the previous chapter that choice among jurisdictions is unlikely to create incentives for schools to become more effective. The most direct conflicts are with Table 1.7, which indicates a significant negative effect of “Tiebout choice” on average SAT scores across metropolitan areas, and with Table B2 (in the Appendix), which presents similar but mostly insignificant estimates from instrumental variables specifications similar to Hoxby’s. However, there are potentially important differences between the two analyses: The SAT regressions are conducted at the metropolitan area level, in contrast to Hoxby’s individual-level regression; include both public and private students where Hoxby’s sample includes only public schools; and use somewhat different control variables and weighting strategies than does Hoxby’s analysis.

This chapter presents a reanalysis of Hoxby’s data, conducted with an eye toward uncovering the sources of the divergent conclusions. I begin by building a sample and specification that mirrors as closely as possible that described in her published paper. Even with the restricted-access National Education Longitudinal Study (NELS) data that Hoxby uses, however, I am unable to replicate her exact sample or point estimates. Using one of Hoxby’s two instruments (I have been unable to obtain or replicate her “larger streams” variable for use in the current analysis), I estimate a small, insignificant negative effect of choice on public school students’ test scores.

I go on to consider the robustness of the NELS-based analysis to four potentially important modifications of the basic replication specification. I find several causes for concern about the validity of Hoxby’s conclusions, as estimates of models similar to hers appear to be quite sensitive to the exact sample and specification and to have substantially greater sampling variability than her reported standard errors suggest.

First, I propose an alternative instrument intended to exploit the same source of exogenous variation used by Hoxby’s “streams” instruments. My proposed instrument, a measure of the degree of choice in 1942, is substantially more powerful than the streams variables, while arguably equally valid. Like less precise estimates using Hoxby’s “smaller streams” instrument, the 1942 choice instrument indicates essentially zero effect of choice on student test scores.

Second, I note potentially important coding errors in the data set used to link NELS schools to the metropolitan areas in which they are located. When these coding errors are
repaired, using information on the demographic characteristics of schools’ zip codes as an independent source of information on the schools’ locations, the estimated choice effect becomes substantially smaller (more negative) for all specifications considered.

Third, I address the implications of Hoxby’s restriction of her sample to students enrolled in public schools. Hoxby notes (Table 6) a significant negative effect of public school competition on private enrollment rates. Hsieh and Urquiola (2002) point out that if the marginal private school student is positively selected, the effect of choice on average public-sector student performance is an upward-biased estimate of choice’s effect on school productivity. I test for this by including a control for the MSA private enrollment rate in Hoxby’s base model, and also by estimating her specification on a sample that includes both public and private schools. The first test offers supportive evidence of the hypothesized bias, as the point estimate of the choice effect is smaller in models that control for the private enrollment rate. The second test is less conclusive, shrinking the estimated effect when the streams instrument is used but producing slightly larger estimates in other specifications. One explanation may be that the NELS sample, with fewer than two private schools per metropolitan area, is simply too small to estimate metropolitan private enrollment shares reliably.

Finally, I study the sampling error of the coefficients in individual student regression models similar to those that Hoxby estimates. Regression errors of students within the same metropolitan area, district, or school may be correlated, and classical assumptions therefore probably understate the variance of coefficient estimates. Hoxby proposes an error components model in which there are metropolitan- and district-level error components, but no component coming from the school itself. I implement a variance estimator similar to hers that allows for school effects, and also consider less parametric estimators that are robust to more general forms of residual autocorrelation. All of my autocorrelation-robust estimators produce substantially larger standard errors than are implied by the classical assumptions. They indicate that even Hoxby’s point estimate of the choice effect may be indistinguishable from zero when its sampling error is estimated appropriately.

I conclude that Hoxby’s positive estimated effect of interdistrict competition on school productivity is not robust, and that a fair read of the NELS evidence suggests that any such effect is likely small and indistinguishable from zero. I do not find evidence of endogeneity of the choice index to school quality, suggesting that the more precise negative (but insignificant) OLS effect of school choice on student outcomes should be preferred to less precise IV estimates. As I am unable to duplicate Hoxby’s precise sample, however, I cannot be sure that these results would hold up in that sample. Similarly, as I consider here only one of Hoxby’s specifications, I cannot speak to the effect of the current adjustments on the other specifications in her paper. An implementation of Hoxby’s specification in the SAT data supports my conclusions from the NELS, and indicates that the significance of the effects indicated in Tables 1.7 and B2 may also be sensitive to the precise specification used.

2.2. Data and Methods

Hoxby studies the cross-sectional relationship between student outcomes and the degree of competition among public education providers. She considers two measures of intergovernmental competition within a metropolitan area: essentially, the number of schools and the number of districts per student, adjusted for the uniformity of school and
district sizes. Her primary discussion and her main results relate to a choice index defined over districts, however, and I study that index exclusively.

Hoxby argues that the current district choice index is potentially endogenous to school quality, if consolidation of school districts has been less prevalent in areas with poor schools.[1] She proposes that measures of the topographical character of the area, which may have influenced the initial design of district boundaries when the area was first partitioned, are valid instruments for current choice. She implements this with two measures of the number of rivers and streams flowing through each metropolitan area. One is derived from a publicly available electronic data source, the Geographic Names Information System (GNIS), and the other from a hand count of larger rivers on printed maps. Only the first of these variables is available for the current analysis.[2] Inspired by Hoxby’s basic reliance on initial conditions as sources of exogenous variation, I construct a choice index from the number of districts that existed in 1942 in the area covered by a current metropolitan area. This predates the post-war wave of consolidation that inspires Hoxby’s argument for endogeneity of current choice, and plausibly leverages the initial conditions that are the source of topographic variables’ power.

[1] Figure B1 (in the Appendix) provides a time series of district consolidation that casts doubt on the claim that this is an important source of endogeneity to current school quality.
[2] This is evidently the more powerful of Hoxby’s instruments (see her Table 2, reported in Table 2.1 below).

Hoxby’s data on student outcomes are drawn from two sources, the NELS and the National Longitudinal Survey of Youth (NLSY). She analyzes several outcome measures from each data set: 8th, 10th, and 12th grade test scores from the NELS, and educational attainment and long-run income from the NLSY. I focus here on 8th grade reading scores from the NELS. This decision is intended to facilitate the replication of Hoxby’s sample by minimizing the number of decisions required of the analyst. The NELS data offer substantially better geocoding than the NLSY and a stronger link of students to their schools. Within the NELS, the 8th grade scores permit the most straightforward analysis: The NELS is a panel beginning with an initial sample of 8th graders in 1988, with several sample “freshenings” thereafter, and later years of the data offer a multiplying array of weights and options for matching students to metropolitan areas.

I attempt to define control variables similar to those used by Hoxby. Like her, I draw district-level demographic characteristics from the School District Data Book (SDDB), a tabulation of data from the 1990 Census along school district boundaries. However, where Hoxby derives metropolitan area demographic characteristics from the City and County Data Book (CCDB), which reports 1980 census demographic characteristics, I use instead county-level tabulations of 1990 Census data from the Summary Tape File 3A.[3]

[3] A more exact replication would use the 1980 characteristics. I rely on 1990 data for two reasons. First, this seems a more appropriate measure of MSA characteristics relevant to 8th grade students in 1988. Second, I am unable to determine how Hoxby calculates one of her control variables, the “ethnic homogeneity index,” from the CCDB, which does not seem to tabulate ancestry.

The metropolitan area definitions used at all points in my analysis are the Office of Management and Budget’s Metropolitan Statistical Area (MSA) definitions of June 30, 1990, used to characterize metropolitan areas in 1990 census data. Each enumerated sub-area (PMSA) within the largest urban agglomerations is treated as a distinct metropolitan area.

I use data from the 1990 Common Core of Data (CCD), an annual census of public schools and school districts, to construct Hoxby’s Herfindahl-based index of choice among districts. Heeding Urquiola’s (1999) warning about the distinction between elementary and secondary districts, which parents cannot be said to choose between, I construct the choice index using enrollment in grades 9-12 only.
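As a rough illustration of a Herfindahl-based choice index, the sketch below computes one minus the sum of squared district enrollment shares, so that a single-district market scores 0 and a market split among many equal-sized districts approaches 1 (the scale shown in Figure 1.4). The exact formula and enrollment measure are assumptions here, and the function name and enrollment counts are hypothetical.

```python
def choice_index(enrollments):
    """One minus the Herfindahl index of district enrollment shares.

    enrollments: grade 9-12 enrollment counts, one per district in the
    metropolitan area. Returns a value in [0, 1).
    """
    total = sum(enrollments)
    if total <= 0:
        raise ValueError("total enrollment must be positive")
    shares = [e / total for e in enrollments]
    return 1.0 - sum(s * s for s in shares)


# Hypothetical metropolitan areas, for illustration only.
monopoly = choice_index([12000])          # a single district
equal_ten = choice_index([1000] * 10)     # ten equal-sized districts
uneven = choice_index([9000, 500, 500])   # one dominant district
```

The third case shows why the index adjusts for the uniformity of district sizes: three districts with one dominant player yield far less measured choice than three equal districts would.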
The NELS data are matched to metropolitan areas in two ways. First, following what appears to be Hoxby’s approach, the district codes on the restricted-access NELS file are used to match NELS public schools to the 1990 CCD, which contains MSA codes for metropolitan districts. This yields a base replication sample of 11,480 students with valid 8th grade reading scores and demographic characteristics, only slightly more than Hoxby’s reported sample of 10,790, which I am unable to replicate despite repeated efforts.[4] A second MSA match exploits variables on the NELS school file that provide detailed demographic characteristics of the school’s zip code. Two of these (exact counts of housing units and population) uniquely identify zip codes in the STF 3B tabulation of the 1990 census, the source of the NELS data. As most zip codes lie either entirely within an MSA or entirely outside any MSA, this uniquely assigns the vast majority of schools in the NELS. The zip code match indicates that the CCD incorrectly codes the MSA for 24 NELS schools. Section 2.4 explores the implications of this for the estimated model.

[4] My sample includes students from 197 MSAs, substantially fewer than the 211 that Hoxby reports. There seem to be a number of typographical and coding errors in Hoxby’s MSA counts, however. Her 12th grade NELS sample, for example, is reported as representing 316 MSAs in her Table 3 but only 209 in Table 4. One can obtain Hoxby’s 211 figure (for the 8th grade sample) by matching the full set of NELS schools to the CCD and counting all unique MSA codes, including those for a failed NELS-CCD match and for a non-metropolitan school in the CCD. When missing value and invalid or duplicative MSA codes (the Denver PMSA, e.g., is variously coded as 2080 and 342080) are eliminated, this drops to 203. It drops further to the above 197 figure when only NELS schools that provide valid observations for the 8th grade sample are included. The count can be increased somewhat using different MSA definitions or, as I discuss below, by repairing some invalid codes on the CCD. I have not found a sample definition, however, that produces more than 205 MSAs.

It is worth noting that this analysis has relied upon the description of Hoxby’s methods contained in her published paper, and that there may be differences between Hoxby’s original data and my replication sample beyond those described above.[5]

[5] Two differences seem especially likely. First, there are apparently several versions of the SDDB data in circulation. The data used here were generously provided by Cecilia Rouse, who obtained them from the National Center for Education Statistics’ original contractor, and seem to be more complete than are other extant versions. I am unsure what version Hoxby used. A second potential difference derives from the construction of the choice index: I am unsure which measure of enrollment Hoxby used for this purpose.

2.2.1. Econometric framework

I develop here a much-simplified version of Hoxby’s notation that suffices to describe the issues of present concern. Let i index students; s schools, d school districts, and m metropolitan areas. Hoxby’s basic model can be expressed as:

    A_{isdm} = X_{isdm} \beta + e_{isdm},    (1)

where X_{isdm} is a vector that includes the metropolitan area choice index C_m and other control variables that may vary at any of the four levels considered here; A_{isdm} is a student outcome; and e_{isdm} is an error term that may be correlated with C_m but is not correlated with the remainder of X_{isdm} or with a vector of instruments Z_m.

Hoxby estimates β by traditional least squares regression and by instrumental variables, using student-level observations in either case. Unless the determinants of student performance are completely specified in X, errors in (1) would not, in general, be expected to be independent across students in the same school, district, or even metropolitan area. Thus, we might write

    e_{isdm} = \mu_m + \theta_{dm} + \psi_{sdm} + \varepsilon_{isdm},    (2)
with each component identically and independently distributed across markets, districts,
schools, or individuals and independence of components across aggregation levels. In this
generalized random effects model, the most efficient estimator is maximum likelihood,
although feasible generalized least squares is asymptotically equivalent. OLS (or IV) that
does not take account of the error structure is nevertheless consistent, although traditional
estimators of the sampling error of the resulting coefficients are biased. The true sampling
variance of OLS is given by Moulton (1986):

$$\operatorname{var}(\hat\beta_{ols}) = (X'X)^{-1} X' \Omega X (X'X)^{-1}, \quad \text{where} \tag{3}$$

$$\Omega \equiv \operatorname{var}(e) = \sigma_\mu^2 Q_m + \sigma_\theta^2 Q_{dm} + \sigma_\psi^2 Q_{sdm} + \sigma_\varepsilon^2 I_N. \tag{4}$$

$I_N$ is an $N$-by-$N$ identity matrix, while $Q_m$, $Q_{dm}$, and $Q_{sdm}$ are block-diagonal matrices
consisting of blocks of ones within each metropolitan area, district, or school, respectively,
and zeros elsewhere. (That is, $Q_m \equiv WW'$, where $W$ is an $N$-by-$M$ matrix of indicators for
$M$ metropolitan areas, and $Q_{dm}$ and $Q_{sdm}$ are defined similarly.) The extension is
straightforward to

$$\operatorname{var}(\hat\beta_{iv}) = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'\Omega Z(Z'Z)^{-1}Z'X \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}. \tag{5}$$

Moulton (1986) proposes a feasible estimator of (3) that simply replaces the variance
component terms in (4) with consistent estimates $(\hat\sigma_\mu^2, \hat\sigma_\theta^2, \hat\sigma_\psi^2, \hat\sigma_\varepsilon^2)$. Hoxby writes that she
uses Moulton’s formula allowing for error components coming from the MSA and district,
implicitly imposing $\sigma_\psi^2 \equiv 0$.6

6 There are several available estimators of the error component variances in (4), and neither Moulton nor
Hoxby specifies which is to be used. These parameters may be estimated from the contrast between
individual and group-mean residual variances; from the residual variance of between and within estimators;
from the covariances among observations within groups; or from an optimal minimum distance estimator
using the entire empirical covariance matrix $ee'$. In finite samples, these will produce slightly different
estimates of $\operatorname{var}(\hat\beta_{ols})$. My implementation uses the first of these, as described in Greene’s (2000) discussion
of random effects in unbalanced panels.

In most of the results presented here, I do not account for the non-classical error
structure but instead report conventionally calculated standard errors. Surprisingly, these are
quite similar to those that Hoxby reports from her random effects model. In Section 2.6, I
explore the standard error calculation, using first an implementation of the Moulton
estimator and second the less parametric “cluster” estimator that does not impose the
structure of (2) and (4) but allows for arbitrary correlation among observations within a
single MA. All of my autocorrelation-robust standard errors are substantially larger than are
the conventional estimates, and indeed they suggest that even Hoxby’s relatively large point
estimate of the choice effect may be indistinguishable from zero when standard errors are
appropriately calculated. The cluster estimator, in particular, is quite well behaved, very
nearly duplicating the more parametric, more involved Moulton-style estimators.

2.3. Replication

Table 2.1 reports Hoxby’s “first stage” model (from her Table 2) and analogous
models derived from the replication sample.7 Although the instrument sets and samples are

7 The models in Table 2.1 are estimated on the universe of MSAs, and are therefore slightly different from the
actual first stages to the IV regressions shown later, which are estimated on individual student observations
from a subset of MSAs. As in the student-level sample, my MSA-level sample size differs from Hoxby’s. One
possible explanation is that Hoxby reports the first stage from her NLSY sample, which is matched to different
metropolitan definitions than are the NELS data. There are 314 MSAs according to the 1983 county-based
definitions; when additional codes for failed matches and non-metropolitan schools are included, one might obtain
Hoxby’s reported count of 316.
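The sandwich variance in (3) and (4), and the less parametric cluster estimator mentioned in the discussion of standard errors, can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the function names, the balanced design, and the variance components are hypothetical, the variance components are taken as known rather than estimated, and no small-sample correction is applied to the cluster estimator.

```python
import numpy as np

def blocks_of_ones(group_ids):
    """Q = W W', where W is the N-by-G matrix of group indicators."""
    group_ids = np.asarray(group_ids)
    W = (group_ids[:, None] == np.unique(group_ids)[None, :]).astype(float)
    return W @ W.T

def error_covariance(msa, district, school, s2_mu, s2_theta, s2_psi, s2_eps):
    """Omega as in (4). District and school ids are assumed unique across MSAs."""
    n = len(msa)
    return (s2_mu * blocks_of_ones(msa)
            + s2_theta * blocks_of_ones(district)
            + s2_psi * blocks_of_ones(school)
            + s2_eps * np.eye(n))

def sandwich_ols_variance(X, omega):
    """var(beta_ols) as in (3): (X'X)^{-1} X' Omega X (X'X)^{-1}."""
    bread = np.linalg.inv(X.T @ X)
    return bread @ X.T @ omega @ X @ bread

def cluster_ols_variance(X, resid, cluster_ids):
    """Cluster estimator: X' Omega X is replaced by the sum over clusters of
    X_g' e_g e_g' X_g, allowing arbitrary correlation within each cluster."""
    cluster_ids = np.asarray(cluster_ids)
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster_ids):
        rows = cluster_ids == g
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    return bread @ meat @ bread

# Hypothetical balanced design: 4 MSAs x 2 districts x 2 schools x 5 students.
msa = np.repeat(np.arange(4), 20)
district = np.repeat(np.arange(8), 10)
school = np.repeat(np.arange(16), 5)
choice = np.array([0.1, 0.4, 0.6, 0.9])[msa]  # regressor that varies only by MSA
X = np.column_stack([np.ones(80), choice])

# Only an MSA component and an idiosyncratic component, with total variance 1.
omega = error_covariance(msa, district, school, 0.1, 0.0, 0.0, 0.9)
v_moulton = sandwich_ols_variance(X, omega)
v_conventional = (0.1 + 0.9) * np.linalg.inv(X.T @ X)
inflation = v_moulton[1, 1] / v_conventional[1, 1]
print(f"variance inflation for the MSA-level regressor: {inflation:.2f}")
```

Because the regressor is constant within each MSA and every MSA has 20 observations, the ratio equals 1 + (20 - 1)(0.1) = 2.9 in this design, so even a small MSA-level variance share makes the conventional variance far too small for a regressor measured at the MSA level.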
slightly different, the basic results are similar.8 Note that in Hoxby’s model, reported in
Column 1, the “smaller streams” instrument accounts for a much larger share of the variance
of the choice index than does the “larger streams” instrument, which is not available for the
current analysis. Moreover, in the replication model excluding the latter variable, in Column
B, the former variable’s coefficient is similar to that reported by Hoxby. Finally, note that
the 1942 choice index is a substantially more powerful predictor of 1990 choice than are
either of the streams variables.

8 One major point of divergence is the population coefficient. Hoxby reports that her population measure is
scaled in thousands, so I multiply her coefficient by 1,000 to obtain the effect-per-ten million reported in Table
2.1. It seems clear that Hoxby’s coefficient is actually scaled similarly to mine.

Table 2.2 reports basic replication estimates of OLS and IV models for the NELS 8th
grade reading score. Columns A and B report Hoxby’s reported coefficient and standard
error on the district choice index, estimated by OLS and IV respectively. Column C reports
her coefficients from the IV model for 12th grade reading scores, the baseline model in her
paper and the only one for which control coefficients are reported. Columns D through G
report OLS and three different IV estimates of the 8th grade reading model using the
replication sample. The control variable coefficients in the replication sample are broadly
similar to those reported by Hoxby from her model for 12th grade scores.

Columns D and E both report negative point estimates of the choice effect, each
notably smaller than is indicated for the corresponding model from Hoxby’s paper in
Columns A and B. The divergence of OLS estimates suggests that this is largely due to
differences in the sample and in control variables rather than to the absence of the “larger
streams” instrument in column E. However, the difference between OLS and IV estimates
is much smaller in the replication sample than in Hoxby’s results, and indeed a Hausman test
fails to reject equality of the two replication models. As Hoxby’s IV model clearly rejects the
OLS point estimates, this suggests that her results may derive primarily from the larger
streams instrument. This variable is not available for the current analysis, and the reader
should keep in mind the possibility that conclusions from the replication analysis may not
generalize to the model actually estimated by Hoxby.

The estimates in columns F and G use the 1942 district structure as an instrument
for the 1990 choice index, first alone and then in combination with the smaller streams
variable. These indicate negative, though statistically insignificant, effects. Hausman tests
for these models fail to reject equality of the choice coefficient with that indicated by OLS,
although tests of the full coefficient vector do reject. Recall, though, that the replication
standard errors are calculated under classical assumptions, assuming iid errors, and likely
overstate the precision of the estimates.9

9 Given this, it is surprising that the replication standard errors in columns D and E are so similar in magnitude
to those reported by Hoxby for corresponding models (columns A and B), which she describes as “us[ing]
formulas (Moulton, 1986) for data grouped by districts and metropolitan areas.” I revisit the standard error
question in Section 2.6, where I obtain replication standard errors using Moulton’s formulas that are
substantially larger than those reported in Table 2.2.

2.4. Sensitivity to Geographic Match

There are several inconsistencies and apparent coding errors in the CCD
metropolitan area variable. The precision and accuracy of coefficient estimates can be
improved by removing the measurement error that these coding errors produce. One error,
duplicate codes for some MSAs, is mentioned above (see footnote 4), and is corrected in the
base replication sample analyzed in Section 2.3. Others require more caution. In this
section I report estimates of the choice effect from samples that repair apparently erroneous
MSA codes in the CCD. I use two sources of independent information on schools’
locations: the county codes contained in the CCD file, which are sometimes inconsistent
with the reported MSA codes but which I take to be generally more accurate, and
demographic characteristics of schools’ zip codes from the NELS school survey, which can
be linked to MSAs (through the zip code tabulation of 1990 Census data, the STF-3B, from
which they are drawn) with ambiguity in only a small fraction of cases.

The first row of Table 2.3 repeats the choice index coefficients from Table 2.2, while
the remaining rows report the estimated coefficients as corrections to the CCD MSA codes
are gradually implemented, beginning with the clearest errors and proceeding to less obvious
cases. The results indicate that the estimated choice effect is quite sensitive to the exact
sample used, and that it is smaller in the repaired sample than in that used in Table 2.2.

The most obvious coding error in the CCD consists of obsolete MSA codes. Thus,
for example, the Kansas City, Kansas school district is coded as being in MSA 3755.
Although the 1990 CCD purports to report 1990 MSA codes, there is no MSA numbered
3755 in 1990. However, there is a 1983 PMSA with this number, the Kansas City, KS
PMSA, which in 1990 is demoted (with the Kansas City, MO PMSA) into the Kansas City,
KS-MO MSA, number 3760. Repairing errors of this sort adds 11 NELS schools and 216
students to the replication sample. Row 2 of Table 2.3 indicates, however, that this has
negligible effects on the estimated coefficients.

There are several additional districts in the CCD for which the MSA codes are valid
but inconsistent with the reported county location. One example is the Baker County
School District in Florida, reported as in the Jacksonville MSA despite Baker County’s
non-metropolitan status. Correcting the MSA code in these cases affects 8 NELS schools.10
Row 3 of Table 2.3 indicates that this produces larger (more positive) estimates of the choice
effect in the IV specifications.

10 A potential explanation for these inconsistencies is that some districts may span counties, serving areas both
inside and outside a metropolitan area. For this reason, I checked the CCD county codes against the NELS zip
code location (which should describe the location of the school itself rather than that of the district
headquarters) before overruling the CCD MSA assignment, although in practice these never disagreed.

Row 4 of Table 2.3 adds to the sample four schools that are non-metropolitan
according to the CCD but whose NELS zip code information places them within MSA
counties. Row 5 adds an additional school in New England, where counties are not
sufficient to establish metropolitan status because MSAs are based on towns rather than
counties, for which the zip code location and the CCD school address agree that it is indeed
within a metropolitan area.11 These two final sample alterations account for less than one
percent of the sample, but nevertheless have large downward effects on the estimated choice
coefficient. Most notable is the final alteration, which affects less than 0.2% of the sample
but reduces the streams estimate of the choice effect by 0.27.

11 Unfortunately, the confidentiality of the geocode NELS data precludes a description of the specific changes.

2.5. Are Estimates From the Public Sector Biased?

Hoxby’s analysis is limited to NELS students attending public schools, as are the
replications presented thus far. This can create sample selection bias, which would, under
reasonable assumptions, be expected to bias the choice effect upward relative to the effect
of interest in Hoxby’s paper, the response of public school administrators to competitive
pressures. This point has been made convincingly by Hsieh and Urquiola (2002) in the
context of a Chilean school choice program; I merely summarize a simplified version of their
argument in the current notation.
Suppose that $X_{isdm}$ in equation (1) contains all school- and district-level variables
that differ systematically between the public and private sectors, so that

$$E[\theta_{dm} + \psi_{sdm} \mid s \text{ is a public school}] = E[\theta_{dm} + \psi_{sdm} \mid s \text{ is a private school}] = 0. \tag{6}$$

It still may be the case that students who self-select into private schools differ systematically
from those who choose the public sector. Let $f_m \equiv E[\varepsilon_{isdm} \mid i \text{ attends public school}; m]$ be
the average $\varepsilon_{isdm}$ of public school students in MSA $m$. Let $\hat\gamma_{public}$ be an unbiased estimator of
the choice effect on average public school scores, and let $\gamma$ denote the true effect of choice
on public school productivity. It is clear that $E[\hat\gamma_{public}] = \gamma + \partial f_m / \partial c_m$, so that $\hat\gamma_{public}$ will be an
upward biased estimate of $\gamma$ if choice draws high-$\varepsilon_{isdm}$ students into the public sector or
low-$\varepsilon_{isdm}$ students in the opposite direction, while $\hat\gamma_{public}$ will be downward-biased if the
selectivity effect goes the opposite direction.

In her Tables 5 and 6, Hoxby demonstrates a significant negative effect of choice on
private enrollment rates. She interprets this as evidence that “choice among public schools
is a substitute for choice of private schools,” and suggests that a higher level of the district
choice index reduces the tendency for “families with a strong taste for education [to] leave
the public sector by shifting their children into private schools …,” (p. 1233). If taste for
education is positively correlated with $\varepsilon_{isdm}$, this suggests that $\partial f_m / \partial c_m$ is positive and
therefore that $\hat\gamma_{public}$ is an upward-biased estimate of the effect of competition among public
schools on public school productivity.

There are two obvious ways to correct Hoxby’s specification for this bias. First, in
the spirit of so-called “Heckman corrections” (Heckman, 1979; Card and Payne, 2002), one
can control directly for functions of the MSA private enrollment rate in models for public
school students. Second, one can estimate the model on a sample that includes private
school students. If the sample selection bias is in the hypothesized direction, either strategy
should produce a smaller (more negative) estimate of the effect of interdistrict competition.

Private enrollment rates are readily measured from the 1990 STF files that provide
MSA demographic characteristics. Table 2.4 presents estimates that control for the
metropolitan private enrollment share, on the base replication sample in Panel A and on the
repaired sample (from Row 5 of Table 2.3) in Panel B. In each case, the point estimate of
the private enrollment share variable is quite large and negative, indicating positive selection
into private schools, though this effect is never significant. More importantly, in all eight
cases (two samples by four model specifications) the estimated choice effect is substantially
smaller, though again insignificantly so, when the private enrollment share variable is controlled.

The second correction, implemented in Table 2.5, is made possible by the inclusion
of private schools in the NELS sample. This introduces two complications, however. First,
because private schools are not included in the CCD, they must be assigned to MSAs on the
basis of their zip code.12 Second, the SDDB district demographic variables are unavailable
for private schools. Hoxby argues that the coefficient of interest should not be sensitive to
the exclusion of these variables. The first two rows of Table 2.5, which present
specifications on the repaired public school sample both with and without district-level
controls, indicate that this is not entirely true, as three of the four choice effect estimates are

12 For consistency, I use only the zip-code-matched repaired sample of public schools for this analysis.