I have over 1 million postcodes clustered in ~7,000 middle super output areas for England and Wales, which I'm analysing in MLwiN. I have fitted a multilevel model with random effects, clustering the postcode observations at the middle super output area level.
My concern is that when I analysed the middle super output areas using a queen first-order contiguity weights matrix (in GeoDa), Moran's I came to 0.00324916 (p = 0.013) (see graph below). While there is some spatial autocorrelation, this seems to be a very, very small amount - is it small enough for me to ignore?
If I can't ignore it, I would have to cluster at the regional level instead, leaving me with only 32 clusters. As I'm using MCMC estimation, this would dramatically increase the computational cost of my model runs (days per run instead of hours), so I would quite like to avoid this if possible.
Answer
Moran's I ranges from -1 (perfectly dispersed) to 1 (perfectly clustered); under spatial randomness its expected value is -1/(n-1), which is effectively 0 for large n. While 0.003 isn't "perfectly" random, it's much, much closer to random than to dispersed or clustered. Whether it's random enough depends on your discipline, personal standards, and research question.
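To make the clustered end of that range concrete, here is a minimal pure-NumPy sketch (the grid size, the `queen_weights` helper, and all variable names are my own illustration, not from the original analysis): a grid whose left half holds one value and right half another is strongly clustered, so Moran's I under queen contiguity comes out close to +1.

```python
import numpy as np

def queen_weights(rows, cols):
    """Binary first-order queen-contiguity matrix for a rows x cols grid
    (all 8 surrounding cells count as neighbours)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W

def morans_i(x, W):
    """Global Moran's I: (n / S0) * z'Wz / z'z, with z the deviations from the mean."""
    z = np.asarray(x, float) - np.mean(x)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

# A strongly clustered pattern: left half of the grid is 1, right half is 2.
rows = cols = 10
col_index = np.tile(np.arange(cols), rows).reshape(rows, cols)
x = np.where(col_index < cols // 2, 1, 2).ravel()
W = queen_weights(rows, cols)
I_clustered = morans_i(x, W)  # strongly positive, well above 0.5
```

The only negative contributions come from the pairs straddling the boundary column, which is why the statistic sits near, but not exactly at, +1.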
I'd personally accept the value as random, and therefore accept the data as not spatially autocorrelated; I think most geographers would as well. To check, I made five random distributions of 100,000 points, each point assigned a value of either 1 or 2, and ran Moran's I on each. The results were 0.002554, -0.001750, 0.001363, -0.001058, and -0.000166 - all very close to zero, none exactly zero.
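That experiment can be replicated at small scale (a 30x30 grid rather than 100,000 points, and a seeded generator; the `queen_weights` and `morans_i` helpers here are my own sketch, not the original tooling): random 1-or-2 values yield Moran's I values scattered tightly around zero.

```python
import numpy as np

def queen_weights(rows, cols):
    """Binary first-order queen-contiguity matrix for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W

def morans_i(x, W):
    """Global Moran's I: (n / S0) * z'Wz / z'z, with z the deviations from the mean."""
    z = np.asarray(x, float) - np.mean(x)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
rows = cols = 30
W = queen_weights(rows, cols)
# Five random 1-or-2 surfaces, as in the experiment described above.
results = [morans_i(rng.integers(1, 3, rows * cols), W) for _ in range(5)]
# Every replicate lands close to zero, none exactly zero.
```

With fewer points than the original experiment, the spread around zero is a little wider, but the qualitative picture is the same.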
You can also calculate Geary's C as verification and/or additional evidence to support an assertion that the data are spatially random (the target value in that case is ~1, its expected value under no autocorrelation). Again, this depends on how strictly random your discipline, research question, etc. expect the data to be.
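A Geary's C check can be sketched in the same pure-NumPy style (again, the helpers and the 30x30 grid are my own illustrative assumptions): for a random 1-or-2 surface the statistic lands near its null value of 1.

```python
import numpy as np

def queen_weights(rows, cols):
    """Binary first-order queen-contiguity matrix for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W

def gearys_c(x, W):
    """Global Geary's C: ((n-1) / (2*S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - xbar)^2."""
    x = np.asarray(x, float)
    z = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return ((len(x) - 1) / (2.0 * W.sum())) * (W * diff2).sum() / (z @ z)

rng = np.random.default_rng(1)
rows = cols = 30
W = queen_weights(rows, cols)
C = gearys_c(rng.integers(1, 3, rows * cols), W)  # near 1 for spatially random data
```

Values well below 1 indicate positive autocorrelation and values well above 1 indicate dispersion, so Geary's C runs in the opposite direction to Moran's I.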