School of Mathematical Sciences
Queensland University of Technology
Addressing Issues in Sparseness, Ecological Bias and
Formulation of the Adjacency Matrix in Bayesian
Spatio-temporal Analysis of Disease Counts
Arul Earnest
B.Soc.Sc (Hons) in Statistics, National University of Singapore
MSc in Medical Statistics, London School of Hygiene and Tropical Medicine,
University of London
A thesis submitted for the degree of Doctor of Philosophy in the Faculty of Science and
Technology, Queensland University of Technology according to QUT requirements
Principal Supervisor:
Professor Kerrie Mengersen
Associate Supervisors:
Associate Professor Geoff Morgan
Professor Tony Pettitt
2010
KEYWORDS
Spatial, autoregressive, disease mapping, CAR model, birth defects, ecological bias,
neighbourhood weight matrix, forecasting, priors, Bayesian, MCMC, joint modeling.
ABSTRACT
The main objective of this PhD was to further develop Bayesian spatio-temporal models
(specifically the Conditional Autoregressive (CAR) class of models), for the analysis of
sparse disease outcomes such as birth defects. The motivation for the thesis arose from
problems encountered when analyzing a large birth defect registry in New South Wales.
The specific components and related research objectives of the thesis were developed
from gaps in the literature on current formulations of the CAR model, and health service
planning requirements. Data from a large probabilistically-linked database from 1990 to
2004, consisting of fields from two separate registries: the Birth Defect Registry (BDR)
and Midwives Data Collection (MDC) were used in the analyses in this thesis.
The main objective was split into smaller goals. The first goal was to determine how the
specification of the neighbourhood weight matrix will affect the smoothing properties of
the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the
usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in
chapter 7. The third goal was to identify optimal sampling and sample size schemes
designed to select individual level data for a hybrid ecological spatial model, and this is
done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR
model, and along with demographic projections, provide forecasts for birth defects at the
SLA level. Chapter 9 describes how this is done.
For the first objective, I examined a series of neighbourhood weight matrices, and
showed how smoothing the relative risk estimates according to similarity by an
important covariate (i.e. maternal age) helped improve the model’s ability to recover the
underlying risk, as compared to the traditional adjacency (specifically the Queen)
method of applying weights.
Next, to address the sparseness and excess zeros commonly encountered in the analysis
of rare outcomes such as birth defects, I compared a few models, including an extension
of the usual Poisson model to encompass excess zeros in the data. This was achieved via
a mixture model, which also encompassed the shared component model to improve on
the estimation of sparse counts through borrowing strength across a shared component
(e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this
example). Using the Deviance Information Criterion (DIC), I showed how the proposed
model performed better than the usual models, but only when both outcomes shared a
strong spatial correlation.
The next objective involved identifying the optimal sampling and sample size strategy
for incorporating individual-level data with areal covariates in a hybrid study design. I
performed extensive simulation studies, evaluating thirteen different sampling schemes
along with variations in sample size. This was done in the context of an ecological
regression model that incorporated spatial correlation in the outcomes, as well as
accommodating both individual and areal measures of covariates. Using the Average
Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the
SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number
of controls, provided the lowest AMSE.
The final objective involved combining the improved spatio-temporal CAR model with
population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at
the Statistical Local Area (SLA) level in New South Wales, Australia. The projections
were illustrated using sixteen different SLAs, representing the various areal measures of
socio-economic status and remoteness. A sensitivity analysis of the assumptions used in
the projection was also undertaken.
By the end of the thesis, I will show how challenges in the spatial analysis of rare
diseases such as birth defects can be addressed, by specifically formulating the
neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age),
incorporating a ZIP component to model excess zeros in outcomes and borrowing
strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample
individual-level data and sample size considerations for rare disease will also be
presented. Finally, projections in birth defect categories at the SLA level will be made.
TABLE OF CONTENTS
1 INTRODUCTION
1.1 Primary research aims and motivation
1.2 Content and scope of thesis
1.3 Structure of thesis
1.4 List of publications and conferences arising from thesis
2 DATA
2.1 Summary
2.2 Sources of data
2.2.1 Birth defects
2.2.2 Births and maternal characteristics
2.2.3 Areal-level indices of socio-economic status
2.3 Definition and classification of birth defects
2.4 Spatial and temporal trends of birth defects in New South Wales, Australia
3 LITERATURE REVIEW
3.1 Summary
3.2 Spatial analysis of birth defects
3.3 Risk factors for birth defects
3.3.1 Maternal age at delivery
3.3.2 Maternal smoking during pregnancy
3.3.3 Socio-economic indicators
3.3.4 Maternal diabetes mellitus
3.3.5 Common risk factors for caesarean section rates/ spatial variation
4 CONDITIONAL AUTOREGRESSIVE (CAR) MODEL
4.1 Summary
4.2 Spatial epidemiology
4.3 Disease mapping
4.4 Geographical correlation studies
4.5 Formulation of the CAR model
4.6 Comparison of single disease CAR models
4.7 Studies that have applied CAR models
4.8 Studies that have compared single disease CAR models
4.9 Comparison of multiple disease CAR models
5 CAR MODELLING ISSUES
5.1 Summary
5.2 Bayesian Theory
5.3 Markov chain Monte Carlo (MCMC)
5.4 MCMC Convergence
5.5 Specifying the hyperprior distribution
5.6 Conjugate priors and improper priors
5.7 Sensitivity analysis on priors
5.8 Model selection techniques for spatial models
5.9 Modifiable Areal Unit Problem (MAUP)
5.10 Boundary analysis
6 NEIGHBOURHOOD WEIGHT MATRIX SPECIFICATION
6.1 Background
6.2 Aims
6.3 Methods
6.4 Results
6.5 Discussion
6.6 Conclusion
7 MODELLING SPARSE DISEASE COUNTS
7.1 Introduction
7.2 Methods
7.3 Results
7.4 Discussion
7.5 Conclusion
8 STRATEGIES FOR COMBINING AREAL WITH INDIVIDUAL DATA
8.1 Introduction
8.1.1 Ecological bias
8.1.2 Addressing ecological bias
8.1.3 Sampling techniques and sample size
8.2 Methods
8.2.1 Data
8.2.2 Statistical model
8.2.3 Model comparison
8.2.4 Simulation
8.2.5 Example
8.3 Results
8.4 Discussion
8.5 Conclusion
9 FORECASTING BIRTH DEFECTS AT THE SMALL AREA LEVEL
9.1 Introduction
9.2 Aim
9.3 Methods
9.4 Results
9.5 Discussion
9.6 Conclusion
10 CONCLUSION
10.1 Summary of results
10.1 Implications of research
10.2 Limitations
10.3 Directions for future research
REFERENCES
STATEMENT OF ORIGINAL AUTHORSHIP
"The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the best of
my knowledge and belief, the thesis contains no material previously published or written
by another person except where due reference is made."
________________
Arul Earnest
26th February 2010
ACKNOWLEDGEMENTS
I would like to thank my principal supervisor, Professor Kerrie Mengersen, from
Queensland University of Technology (QUT), for her unlimited guidance and
supervision throughout the course of my PhD candidature. I am indebted to her for
introducing the field of Bayesian statistics to me. My appreciation also goes out to
Professor Tony Pettitt for facilitating the smooth flow of my PhD studies. I would also
like to express my gratitude to my associate supervisor, Associate Professor Geoff
Morgan, from the Northern Rivers University Department of Rural Health (University of
Sydney) for constantly providing input on my PhD, in particular the epidemiological,
study design and clinical implication aspects of the thesis. I have certainly enjoyed the
numerous thought-provoking discussions we had in his office in Lismore. I am equally
indebted to Professor John Beard, director of Ageing and Lifecourse at the World Health
Organisation, who was my previous supervisor. I would like to thank him for
providing me with the opportunity to start these PhD studies, and also for his generous
advice and guidance on the manuscripts resulting from this thesis. My sincere gratitude
goes to Dr Lee Taylor and Dr David Muscatello from the New South Wales Department
of Health for providing me with useful advice on the data upon which this thesis is
built, and also valuable opinions on the practical applications resulting from this thesis. I
would like to show my appreciation to the internal review panel from QUT and the
external examiners, whose comments and suggestions have strengthened the quality of
this thesis. Most importantly, I would like to thank my family members, especially my
wife Josephine for sacrificing those few important years and helping take care of our
two lovely young girls single-handedly. I am eternally indebted to her.
CHAPTER 1. INTRODUCTION
1.1. Primary research aims and motivation
This thesis aims to answer questions related to the small area analysis of sparse disease
counts in a geographical region. The first question relates to the formulation of the
Conditional Autoregressive (CAR) model, a commonly used statistical model in the
analysis of geographically aggregated data. Specifically, I wanted to evaluate whether
the formulation of the neighbourhood weight matrix has any impact on the smoothing
properties of the CAR model. In addition, I wished to examine whether there were any
differences between the adjacency and distance-based methods of assigning neighbours
in terms of recovering the underlying relative risk estimates.
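The difference between adjacency and distance-based neighbour weights can be illustrated with a minimal Python sketch. It is not the code used in this thesis: the four areas, their centroids, the inverse-distance form of the weights and the cutoff value are all hypothetical choices made purely for illustration.

```python
import math

# Hypothetical centroids (x, y) of four areas laid out on a 2 x 2 grid.
# These are illustrative values, not data from the thesis.
CENTROIDS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0)}

# Queen adjacency: areas sharing an edge or a corner are neighbours.
# On a 2 x 2 grid every area touches every other area.
QUEEN_NEIGHBOURS = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
}


def adjacency_weights(neighbours):
    """Binary weights: w[i][j] = 1 if j is a neighbour of i, else 0."""
    areas = sorted(neighbours)
    return {
        i: {j: (1.0 if j in neighbours[i] else 0.0) for j in areas if j != i}
        for i in areas
    }


def inverse_distance_weights(centroids, cutoff):
    """Distance-based weights: w[i][j] = 1 / d(i, j) if d(i, j) <= cutoff, else 0."""
    weights = {}
    for i, (xi, yi) in centroids.items():
        weights[i] = {}
        for j, (xj, yj) in centroids.items():
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            weights[i][j] = 1.0 / d if d <= cutoff else 0.0
    return weights


w_adj = adjacency_weights(QUEEN_NEIGHBOURS)
w_dist = inverse_distance_weights(CENTROIDS, cutoff=2.0)
print(w_adj["A"])   # edge and corner neighbours are weighted equally
print(w_dist["A"])  # the diagonal neighbour D receives a smaller weight
```

Under the adjacency scheme, area A borrows equally from all three neighbours, whereas the distance-based scheme down-weights the more distant diagonal neighbour. Reducing the cutoff removes distant areas from the neighbourhood altogether.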
The second question relates to the modeling or estimation of a sparse outcome, such as
birth defects. The questions I wished to answer were: “Can we better estimate the
outcome with a sparse count by jointly modelling it with another related outcome that
may share some latent risk factors?” and “Can we improve on the estimates by
incorporating a component (zero-inflated Poisson component through a mixture model)
to model the excess zeros in the data?”
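To make the idea of a zero-inflated Poisson mixture concrete, the following Python sketch evaluates its probability mass function. It is an illustration only and not the model fitted in this thesis: the function names and the parameter values (a mean of 0.5 and a structural-zero probability of 0.3) are assumptions chosen for the example.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def zip_pmf(y, mu, p_zero):
    """P(Y = y) under a zero-inflated Poisson mixture.

    With probability p_zero the count is a structural zero;
    otherwise it is drawn from Poisson(mu).
    """
    if y == 0:
        return p_zero + (1.0 - p_zero) * math.exp(-mu)
    return (1.0 - p_zero) * poisson_pmf(y, mu)


# Hypothetical values: a rare outcome with mean 0.5 and 30% structural zeros.
mu, p_zero = 0.5, 0.3
print("P(Y = 0), Poisson:", poisson_pmf(0, mu))
print("P(Y = 0), ZIP:    ", zip_pmf(0, mu, p_zero))
```

The mixture places more probability on zero than the ordinary Poisson with the same mean parameter, which is what allows it to accommodate a large number of areas with no cases.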
The third broad question relates to a CAR regression model, and includes both
individual-level and areal measures of covariates. The question is, for sparse outcomes
like birth defects, what is the optimal sampling scheme to select individual-level data for
analysis in the hybrid model? Also, does the sample size have any impact on the
regression coefficient estimates from the hybrid model? To answer these questions, I
performed an extensive simulation analysis to evaluate 13 different scenarios, including
various sampling schemes and variations in sample size.
The fourth aim of this thesis was to provide a method for forecasting sparse outcomes at
a small-area level, which took into account spatial correlation in the data, optimal
neighbourhood weight matrix formulation, consideration of excess zeros in the data, as
well as population (women) forecasts for the next 30 years at the Statistical Local Area
(SLA) level in New South Wales, Australia. A sensitivity analysis based on different
population scenarios was also conducted.
The motivation for this thesis came about from the challenges faced when analyzing
birth defects from a large registry in New South Wales (NSW), Australia, as part of an
Australian Research Council (ARC) linkage grant. The first challenge faced was
sparseness of the disease outcome, especially when individual birth defects were
mapped in geographical locations, or even when defects were analysed in broader
groupings, according to the International Classification of Diseases - British Paediatric
Association (ICD9-BPA) coding system. The problem was compounded when there
were a large number of areas with zero counts of particular defects. Secondly, one had to
contend with possible spatial correlations in the disease rates across geographical
regions. On a broader level, I found that the spatial analysis of birth defects was under-researched in the medical literature, and I hypothesize that this could be attributable to
the problems mentioned above, as well as the inaccessibility of suitable statistical models
and software.
The impetus to fine-tune the CAR model was primarily driven by gaps in the literature,
identified after an extensive literature review on single-disease and multiple-disease
CAR models was performed. Both spatial only and spatio-temporal models were
evaluated and compared. The review revealed that most of the models were applied to
outcomes that were not rare, and applied to data across broad time intervals, thus
ensuring that there were enough cases in each time point. Disease mapping studies
involving birth defects were few, and none of them actually accounted for spatial
correlation in the data. Almost all the spatial studies used the simpler formulation of the
Queen adjacency method of assigning neighbours, which I suspect was done out of
convenience. The various formulations of the CAR models also failed to incorporate
sparseness in the data, implicitly or explicitly.
1.2. Content and scope of thesis
This section details what is covered in this thesis and areas which are not within the
scope of this manuscript. This section also provides the links between the various
chapters.
In chapter 1, the motivation for undertaking the study is stated, along with the main aims
of this thesis. The content, scope and structure of the thesis are also presented in this
chapter. The source of data used in the analysis is described in chapter 2. Here, I also
provide the definition and classification of birth defects. A description of the current
state of birth defects in New South Wales, in terms of spatial and temporal trends is
given in this chapter.
A comprehensive literature review is provided in chapter 3. Summarized components of
the literature review are included in subsequent chapters, which are structured as
manuscripts to be submitted for publication. Firstly, I summarise spatial analytical
studies in relation to birth defects, to identify gaps in the literature. Secondly, I provide a
review of selected risk factors, with a view to informing the analytical models for a
subsequent analysis that combines both areal and individual risk factors (i.e. chapter 8).
The chapter also reviews the literature on risk factors common to both birth defects and
caesarean section rates and spatial analysis of caesarean section rates for inclusion in
chapter 7, which looks at modelling the two outcomes jointly.
Chapter 4 introduces the CAR model. I provide readers with an understanding of the
context in which the CAR model is applied and describe the two main fields of
application, namely disease mapping and geographical correlation studies. The
mathematical properties of the CAR model are also described, along with a brief section
on the adjacency matrix, which introduces a subsequent chapter that examines in
detail the impact of various neighbourhood weight matrices on the smoothing properties
of the CAR model (i.e. chapter 6).
Also in chapter 4, I discuss the strengths and limitations of the various types of
CAR models commonly used. In addition, I examine the properties of spatial and spatio-temporal models, including specific comparisons of the nature of the data (sparseness of
outcome) used in the studies reviewed, along with the priors and model selection
techniques. The results from these comparisons inform the modelling strategy adopted in
subsequent chapters. Comparisons were also made within the multivariate (i.e. models
examining more than one disease outcome simultaneously) classes of models, and the
results are used in chapter 7.
The CAR model predominantly uses the Bayesian framework of analysis. To help
readers familiarise themselves with the context of Bayesian modeling, an introduction to Bayesian
theory in general, and the Markov Chain Monte Carlo (MCMC) algorithm is provided in
chapter 5. Here, I also discuss issues pertaining to the choice of prior distributions, and
conduct a sensitivity analysis including commonly suggested values for the prior
formulation. Other issues such as boundary analysis, the Modifiable Areal Unit Problem
(MAUP) and Bayesian model convergence diagnostics are discussed briefly, as these do
not relate to the main objective of the thesis.
In chapter 6, I examine in detail the effect of various choices of neighbourhood weight
matrices (ranging from adjacency to distance-based functions, as well as weights based
on key covariates) on the smoothing properties of the CAR model. Addressing the issue
of sparse disease counts is the focus of chapter 7, where I investigate the performance of
a CAR model with a zero-inflated Poisson extension, in terms of its ability to recover the
underlying risk surface of specific birth defects, such as Spina Bifida and Trisomy 21. I
also demonstrate how the model can be strengthened by incorporating a shared
component, via jointly modeling birth defects with a referent outcome (caesarean
counts).
Chapter 8 discusses in detail the major drawback of ecological analysis (i.e. potential
ecological bias) and reviews the literature for suggested strategies to incorporate
individual-level data with areal level data, in order to minimize this potential bias.
Through extensive simulation studies, I investigate the performance of various sampling
strategies, along with modifications in sample sizes, and examine how they fare for
sparse outcomes such as birth defects. The findings from the simulation studies are
illustrated using cardiovascular and nervous system birth defect categories. Chapter 9
synthesises what has been learnt from the earlier analysis of the CAR model, and I apply
the modified and enhanced CAR model to provide forecasts of birth defect categories at
the Statistical Local Area (SLA) level. Using population forecasts for areas in NSW at
the SLA level, the CAR model is used to make predictions on the number of cases expected to
be seen in the next 30 years from 2001 for sixteen randomly selected SLAs. The
strengths and limitations of this thesis, as well as areas for future research, are the focus
of the discussion in chapter 10.
In this thesis, I have excluded discussions on other seemingly related models such as
multi-level models and statistical models to analyse point process data, as my main
focus is the CAR model. The aims and objectives as well as the nature of data utilised by
the other models are generally different from studies which use the CAR model, as I will
briefly describe here. Multi-level models, or random effects models as they are
commonly known, are often used to study variables which can vary at more than one
level. The levels can be nested hierarchically, and the models can be formulated within
both the frequentist and Bayesian frameworks. Gelman provides details on the theory
behind these models, as well as various formulations and applications of multi-level
models (1). In the context of our spatial analysis, the CAR convolution prior (to be
discussed later in the thesis) is a more specific formulation of a multi-level model, where
the variance of the relative risk estimates is partitioned into both spatially structured and
spatially unstructured random effects.
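As a hedged sketch of what this partition looks like, the convolution prior is commonly written in the following form. The notation here is assumed for illustration and may differ from that used later in the thesis: $y_i$ and $E_i$ are the observed and expected counts in area $i$, $\theta_i$ is the relative risk, $u_i$ and $v_i$ are the spatially structured and unstructured random effects, and $w_{ij}$ are the neighbourhood weights.

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(E_i \theta_i), \\
\log \theta_i &= \alpha + u_i + v_i, \\
v_i &\sim \mathrm{N}\left(0, \sigma_v^2\right), \\
u_i \mid u_{j \neq i} &\sim \mathrm{N}\left(
  \frac{\sum_{j} w_{ij} u_j}{\sum_{j} w_{ij}},\;
  \frac{\sigma_u^2}{\sum_{j} w_{ij}} \right).
\end{align*}
```

With binary adjacency weights, the conditional mean of $u_i$ is the average of its neighbours' effects and the conditional variance shrinks as the number of neighbours grows, while $v_i$ captures residual heterogeneity with no spatial structure.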
As for point-process models, one basic goal is to determine whether cases occur at
random or whether there is any form of geographical clustering or pattern in the data. To
this end, various cluster detection models have been developed and applied in practice,
including K-function(2), nearest-neighbour function(3) and hotspot analysis(4). Studies
which analyse point-process data have also included stochastic epidemic models, as well
as spatial prediction techniques. As an example, point-process level spatio-temporal
models have been used to fit stochastic epidemic models to study measles epidemics in
one study(5). Gelfand and colleagues have also used spatiotemporally varying
coefficient models to study and make predictions of climate data, such as precipitation
and temperature, which are measured at fixed locations(6). The fundamental difference
between these models and CAR models is that for the latter, data is available at an
aggregate level, as opposed to fixed locations or at continuous geographical scales.
1.3. Structure of thesis
The thesis is structured in the following way. It consists of a series of chapters, some of
which have been published or submitted for publication and others of which are unpublished. Chapter 6 “Addressing
the Neighbourhood Weight Matrix” has been published in the International Journal of
Health Geographics. Chapter 7 “Modelling Sparse Disease Counts” has been accepted
for publication in the Health and Place Journal. Chapter 8 “Strategies for Combining
Areal with Individual Data” and chapter 9 “Forecasting Birth Defects at the Small Area
Level, NSW” have been submitted to the Statistics in Medicine journal and the BMC
Health Services Research journal respectively. These chapters have been included in the
same format as they were submitted for publication. This explains the variations in the
way the chapters are presented, the different sub-headings used in the various chapters,
and the distinct format of the bibliographies required by the various journals. The rest of
the chapters consist of unpublished works. I have included the bibliographies separately
at the end of each chapter for the published works, and one overall bibliography for the
rest of the unpublished chapters at the end of the thesis.
1.4. List of publications and conferences arising from thesis
Arul Earnest, Geoff Morgan, Kerrie Mengersen, Louise Ryan, Richard Summerhayes,
John Beard. Evaluating the effect of neighbourhood weight matrices on smoothing
properties of Conditional Autoregressive (CAR) models. International Journal of Health
Geographics, November 2007, Volume 29;6: pp 54-65.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes,
Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Small Area
Estimation of Sparse Disease Counts using Shared Component Models- Application to
Birth Defect Registry Data in New South Wales, Australia. Health and Place Journal
(Accepted for publication 23 February 2010).
Arul Earnest, John Beard, Geoff Morgan, Deborah Donoghue, Therese Dunn, David
Muscatello, Danielle Taylor, Kerrie Mengersen. Sampling and sample size strategies for
including individual with areal-level covariates in the spatial analysis of a sparse disease
outcome. Submitted to Statistics in Medicine Journal, Oct 2009.
Arul Earnest, Kerrie Mengersen, Geoff Morgan, John Beard. Forecasting Birth Defects
at the Small-Area Level in New South Wales, Incorporating Spatial Correlation and
Changes in Demography. Submitted to BMC Health Services Research Journal, Oct
2009.
Arul Earnest. Evaluating the effect of neighbourhood weight matrices on smoothing
properties of Conditional Autoregressive models. Contributed talk for Spring Bayes, 27-29 September 2006, Queensland University of Technology.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes,
Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Modelling
Sparse Disease Counts Using the Shared Component Model. Poster presentation at the
International Society for Bayesian Analysis, 9th World Meeting, Hamilton Island,
Australia, July 20-25 2008.
Arul Earnest, John Beard, Geoff Morgan, Douglas Lincoln, Richard Summerhayes,
Deborah Donoghue, Therese Dunn, David Muscatello, Kerrie Mengersen. Modelling
Sparse Disease Counts Using the Shared Component Model. Poster presentation at the
National Healthcare Group Annual Scientific Congress. 7-8 November 2008, Singapore.
The poster won the first prize in the best poster competition for the Quality/ Health
Services Research section.
CHAPTER 2. DATA
2.1. Summary
The aim of this chapter is to provide readers with an understanding of the sources of data
used in subsequent analyses in this thesis. Selected birth defects are also described,
along with the classification or grouping of birth defects. A background description of
current spatial and temporal trends of birth defects in New South Wales is provided as a
precursor to subsequent work in this area. It is clear from existing official health
department reports that birth defects do indeed exhibit clear spatial relationships as well
as a time gradient.
2.2. Sources of data
2.2.1. Birth defects
De-identified birth defect records were obtained from the NSW Birth Defects Register
(BDR). The register has been operational since 1990, and in the early years, reporting of
defects was done on a voluntary basis. Since 1998, doctors, hospitals and laboratories
have been required by law to report all birth defects. These defects included those
observed during pregnancy, at birth or up to one year of life. Each birth defect is
recorded as a separate record, so the total number of congenital abnormalities reported is
considerably greater than the number of children born with a birth defect. The study
period was from 1990 to 2004.
2.2.2. Births and maternal characteristics
Information on births in NSW from 1990 to 2004 was obtained from the NSW
Midwives Data Collection (MDC), which is a population-based register just like the
BDR. Covering all births in NSW (including public, private and home-births), the MDC
is dependent on the attending midwife or doctor to complete and submit a notification
form whenever a birth occurs(7). The registry includes all livebirths and stillbirths of at
least 20 weeks gestation or at least 400 grams birth weight. I also obtained maternal
demographic information (e.g. residential address at time of birth, maternal age at
delivery, maternal smoking during pregnancy, maternal diabetes, delivery in private
versus public hospital), pregnancy, labour, delivery and perinatal outcomes from the
MDC.
Each of the birth records in NSW within the study period was geocoded (i.e. given a
longitude and latitude) based on the mother’s residential address at the time of birth.
This geocoding was done by Mr Richard Summerhayes from the Northern Rivers
University Department of Rural Health using geocoding software developed by the
NSW Health and Australian National University. Further details on the software, called
FEBRYL, can be found elsewhere (8). Each record was then assigned to the 2001
Census Collection District (CCD) within which it fell. There are 11,706 CCDs in
NSW. This assignment was again performed by Mr Summerhayes using the ArcGIS
software. Subsequently, I aggregated the data to a higher level of grouping,
the Statistical Local Area (SLA), of which there are 198 in NSW, as this level of aggregation
was found to be most useful for policy-makers. The individual records from the BDR were
also probabilistically linked to the MDC, and this was carried out by the Department of
Health, NSW. The combined data was used in a subsequent analysis in the thesis,
involving the association between birth defects and individual maternal characteristics
along with areal covariates, such as socio-economic status of the area that the mother
was living in.
In 1998, a 2% sample of Midwives Data Collection records (N=1703) was validated
against other hospital records (9). The excellent quality of this database is reflected in
high levels of agreement, including 99.1% agreement on gestational diabetes (kappa 0.87),
94.9% agreement for smoking in pregnancy (kappa 0.85), 96.5% agreement for
birthweight (kappa not calculated) and 84.8% agreement for gestational age (kappa
0.81). This study, and access to both BDR and MDC databases, was approved by the
New South Wales Population & Health Services Research Ethics Committee.
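The percentage agreement and kappa statistics quoted above measure different things: kappa discounts the agreement that two sources would reach by chance alone. The following Python sketch shows how both are computed from a cross-tabulation. The counts in the table are hypothetical and are not taken from the validation study (9).

```python
def percent_agreement(table):
    """Proportion of records on which the two sources agree (the diagonal)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table):
    """Cohen's kappa for a square table (rows: source 1, columns: source 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = percent_agreement(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two sources were independent.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2 x 2 table for a rare condition recorded as yes/no
# in a registry (rows) and in hospital records (columns).
table = [[40, 10],
         [5, 945]]
print("Agreement:", percent_agreement(table))
print("Kappa:    ", cohen_kappa(table))
```

For a rare condition, most records agree simply because both sources record "no", so the percentage agreement can be very high while kappa is noticeably lower, which is the pattern seen in the gestational diabetes figures.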
2.2.3. Areal-level indices of socio-economic status
I used data from the Australian Bureau of Statistics (ABS) to describe the level of social
and economic well-being of areas in NSW. These data were freely available on the
ABS website, and details are given in a technical paper (10). The following four indices
were available:
1. Index of Advantage/ Disadvantage. Higher values reflect areas with a greater
advantage. Variables such as income, education, occupation, wealth and living
conditions were used to compute this index.
2. Index of Relative Social Disadvantage. Higher values reflect a lack of
disadvantage, which differs subtly from what the index above measures. The variables
that were used to compute this index included income, educational attainment,
unemployment, and dwellings without motor vehicles.
3. Index of Economic Resources. Variables reflecting the income, expenditure and assets
of families, such as family income, rent paid, mortgage repayments and dwelling
size, went into computing this index.
4. Index of Education and Occupation. This index took into account the proportion
of people with a higher qualification or those employed in a skilled occupation.
The data were available at the various Australian Standard Geographical Classification
(ASGC) levels, starting from the most basic Census Collection District (CCD) to the
Statistical Local Area (SLA) level. Because there are problems associated with simply
averaging the indices up from CCD to SLA level, I used an index that was
calculated at the SLA level and population-weighted by the ABS.
Data on the four indices were standardised by the ABS to have a mean of 1000 units and
a standard deviation of 100. For the purposes of my analysis, I used the more general
Index of Relative Social Disadvantage (IRSD). Compared to the other indices, this index
only included variables that are measures of or indicators of disadvantage (rather than
advantage). This index was derived from variables that reflect rather than measure
disadvantage. A decision-tree process, along with principal component analysis was