In arcGIS 10.1 (advanced licence), I have two polygon shapefiles, each representing two different types of building (let's call them A and B).
Each type is actually used for two different purposes. What I would like to accomplish is to formally test if building type B tends to "occur" close to building type A.
What procedure, either within arcGIS or with other programs, could I follow?
As for 'other programs', kindly note that I have some familiarity with SAM, GeoDA, and Passage programs, and that I have no experience with GRASS.
Answer
Use a permutation test.
You begin by constructing a quantitative way to measure proximity. It could be anything you think relevant that measures typical distances between type-A and type-B buildings. Abstractly, given any configuration consisting of a layer X of type-A buildings and another layer Y of type-B buildings, let this measure be called t(X,Y). It's supposed to be a number that is close to zero when buildings of type B tend to be close to those of type A and increases as their relationship appears to grow less and less strong.
Suppose there are n type-A features and m type-B features. The null hypothesis (of no relationship) can neatly be expressed by supposing that the assignment of types to the n+m features is perfectly random, conditional on the fact that n will be assigned type A and m will be assigned type B. There are C(n+m,n) = (n+m)!/n! distinct such assignments. Associated with each one of them is its value of t(X,Y). This yields a distribution of C(n+m,n) numbers. If the actual value of t(X,Y) is smaller than most of the numbers in that distribution, you have evidence of a spatial association. In particular, the p-value of this hypothesis test is the proportion of all such assignments whose t value is less than or equal to the actual t value.
Because C(n+m,n) usually is huge (for instance, with n = m = 100 it is greater than 9E+58), it's not practical to compute the full distribution. However, it's eminently practical to estimate the full distribution by simulation: randomly make a few hundred (or more) assignments of the n+m buildings into n of type A and m of type B. For each random assignment compute t. The distribution of these values approximates the true distribution. Estimate the p-value as the proportion of these simulated distributions for which t is less than or equal to the actual t value.
This can usually be expeditiously done (at least for smallish layers, having less than a few thousand total features) by precomputing a distance matrix between all n+m features and basing the proximity statistic on the distances. This allows you to use an appropriate computing platform for the permutation test, because it doesn't need to perform GIS operations.
As an illustration, this code generates 100 type-A point features at random. They are shown as open dots in the left map below. It then generates 50 more type-B point features close to the first 50 type-A features. They appear as solid gray dots on the map. (I find it difficult to determine whether there is any visible spatial association among the two types, even though I know there is.) For the t statistic it computes the average distance from the type-A features to type-B features and it computes the average distance from the type-B features to type-A features. It returns the smaller of these two averages. The histogram at the right shows the distribution created from 1000 permutations. The vertical red line marks the value of the actual t statistic. Because it's far to the left (near the low end), this is significant evidence of spatial association.
#
# Create data.
#
n <- 100
m <- 50
set.seed(17)
a <- matrix(rnorm(2*n), ncol=2)
b <- matrix(rnorm(2*m, a[1:m, ], sd=1/5), ncol=2)
#
# Map them.
#
par(mfrow=c(1,2))
plot(rbind(a,b), xlab="x", ylab="y", main="Features")
points(b, pch=16, col="#909090")
#
# Describe calculations needed for the test.
#
features <- rbind(a, b)
distance2 <- outer(features[ ,1], features[, 1], function(x,y) (x-y)^2) +
outer(features[ ,2], features[, 2], function(x,y) (x-y)^2)
stat <- function(i, j) {
min(mean(apply(distance2[i, j], 1, min)),
mean(apply(distance2[i, j], 2, min)))
}
#
# Perform the test.
#
t.0 <- stat(1:n, 1:m + n)
sim <- replicate(1e3, {
i <- sample.int(n+m, n)
j <- setdiff(1:(n+m), i)
stat(i, j)
})
p.value <- mean(c(1, sim <= t.0))
#
# Plot its results.
#
hist(sim, xlim=range(c(t.0, sim)), main=paste("P-value =", round(p.value, 4)))
abline(v=t.0, col="Red", lwd=2)
Reference
Phillip Good, Permutation, Parametric, and Bootstrap Tests of Hypotheses. Third Edition (2005), Springer.
No comments:
Post a Comment