I'm using Elasticsearch's GeoHash grid aggregation to plot clusters on a map (using Leaflet). I understand that for larger areas a lower precision setting should be used to limit the number of buckets created/returned.
How should I determine the appropriate precision value to request?
Is there a standard or recommended formula for calculating the optimal precision based on a bounding box and/or zoom level? Or is it better to just map zoom levels to precision values? (I know that's probably the easiest.)
Answer
The page you linked to hints at the answer: find the area of your bounding box and divide it by the bucket area. It leaves out how to calculate the size of each geohash bucket, though it does give an example for precision 5.
According to the page you linked to, the aggregation returns at most 10000 buckets.
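For reference, a request that restricts the search to the map viewport and then buckets the hits by geohash might look like the sketch below, built as a plain Python dict. The field name `location`, the aggregation name `grid` and the helper `build_grid_request` are hypothetical placeholders; `size` is spelled out only to make the bucket limit visible.

```python
import json


def build_grid_request(top, left, bottom, right, precision, field="location"):
    """Build a search body: filter to the viewport, then bucket by geohash.

    `field` is a hypothetical geo_point field name; adjust it to your mapping.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        },
        "aggs": {
            "grid": {
                "geohash_grid": {
                    "field": field,
                    "precision": precision,  # 1 (coarsest) to 12 (finest)
                    "size": 10000,  # maximum number of buckets returned
                }
            }
        },
    }


body = build_grid_request(58.0, -6.0, 55.0, -1.0, precision=4)
print(json.dumps(body, indent=2))
```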
Calculate the area of your bounding box in square degrees (don't attempt to do this in km; keep it in lat/lon).
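A minimal sketch of that calculation, assuming the box is given as south/west/north/east in degrees. The function name is my own, and the antimeridian handling (treating west > east as a box that wraps across 180°) is an assumption you may not need.

```python
def bbox_area_sq_deg(south, west, north, east):
    """Area of a lat/lon bounding box in square degrees (not km^2).

    If west > east the box is assumed to cross the antimeridian,
    so 360 degrees are added to the longitude span.
    """
    if north < south:
        raise ValueError("north must be >= south")
    lat_span = north - south
    lon_span = east - west
    if lon_span < 0:  # box wraps across the 180th meridian
        lon_span += 360.0
    return lat_span * lon_span


# A box over central Scotland: 3 degrees of latitude by 5 degrees of longitude
print(bbox_area_sq_deg(55.0, -6.0, 58.0, -1.0))  # 15.0
```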
The screenshot below shows the bounds of precision 2 (the colour is categorised by precision 1).
These aren't really tiles, though. A geohash represents a point with an error in latitude and an error in longitude, and the two errors don't always match: at even precisions the longitude error (in degrees) is twice the latitude error, while at odd precisions they are equal.
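The even/odd pattern falls out of how a geohash is built: each character adds 5 bits, and the bits alternate between longitude and latitude, starting with longitude. So at precision p, longitude gets ceil(5p/2) bits and latitude gets floor(5p/2) bits, and a cell is 360/2^lon_bits degrees wide and 180/2^lat_bits degrees tall. A library-free sketch (the function name is my own):

```python
def geohash_cell_size(precision):
    """Return (width_deg, height_deg) of a geohash cell at this precision."""
    bits = 5 * precision        # each base-32 character encodes 5 bits
    lon_bits = (bits + 1) // 2  # bits alternate, longitude first
    lat_bits = bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


for p in range(1, 9):
    w, h = geohash_cell_size(p)
    print(f"Precision {p}: {w} long by {h} lat, area {w * h}")
```

These sizes agree with the figures the Geohash library produces further down.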
I used a bit of Python (with the Geohash library) to estimate the size of each 'tile' at different precisions:
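# Uses the Geohash package, as in the original answer (assumed installed).
from Geohash import geohash

# Encode a sample point (in Scotland) at high precision, then decode
# successively shorter prefixes to get the cell size at each precision.
strg = geohash.encode(56.9, -3.2, precision=15)
for prec in range(1, 9):
    # decode_exactly returns (lat, lon, lat_error, lon_error)
    y, x, yerror, xerror = geohash.decode_exactly(strg[:prec])
    # The errors are half the cell size, so double them for the full extent
    xsize = 2 * xerror
    ysize = 2 * yerror
    area = xsize * ysize
    print("Precision {}".format(prec))
    print("\tSize approx {} long by {} lat".format(xsize, ysize))
    print("\tArea is {}".format(area))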
This gives the following output:
Precision 1
Size approx 45.0 long by 45.0 lat
Area is 2025.0
Precision 2
Size approx 11.25 long by 5.625 lat
Area is 63.28125
Precision 3
Size approx 1.40625 long by 1.40625 lat
Area is 1.9775390625
Precision 4
Size approx 0.3515625 long by 0.17578125 lat
Area is 0.061798095703125
Precision 5
Size approx 0.0439453125 long by 0.0439453125 lat
Area is 0.0019311904907226562
Precision 6
Size approx 0.010986328125 long by 0.0054931640625 lat
Area is 6.034970283508301e-05
Precision 7
Size approx 0.001373291015625 long by 0.001373291015625 lat
Area is 1.885928213596344e-06
Precision 8
Size approx 0.00034332275390625 long by 0.000171661376953125 lat
Area is 5.893525667488575e-08
So one approach would be:
- calculate the "area" (in square degrees) of your lat/lon-based bounding box
- go down that table, starting at precision 1, and divide your bbox area (in square degrees) by the cell area for that precision; the quotient is roughly the number of buckets the aggregation will produce
- choose the precision with the lowest acceptable quotient
To clarify 'acceptable':
A very low quotient, like 0.001, probably means the precision is too low. You won't be fetching many buckets, but you will be considering a lot of distant points you don't need.
A quotient over 10000 means the precision is too high. You'll be discarding possible hits and will suffer slower performance.
You'll need to experiment to find a value which gives the best performance.
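The procedure can be sketched as follows. It estimates the bucket count at each precision as bbox area / cell area, walks down from precision 1, and returns the first precision whose estimate falls inside an acceptable range. The thresholds `min_buckets` (default 100) and `max_buckets` (default 10000, matching the bucket limit mentioned earlier) are assumptions to tune, and the function names are hypothetical. If no precision lands in the range, it falls back to the finest precision that stays under `max_buckets`. The estimate ignores partial cells at the edges of the box, so treat it as an order-of-magnitude guide.

```python
def geohash_cell_area(precision):
    """Area of one geohash cell in square degrees."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2  # longitude gets the extra bit when bits is odd
    lat_bits = bits // 2
    return (360.0 / 2 ** lon_bits) * (180.0 / 2 ** lat_bits)


def choose_precision(south, west, north, east, min_buckets=100.0, max_buckets=10000.0):
    """Pick a geohash_grid precision for a bounding box (all in degrees).

    Walks from precision 1 (coarsest) to 12 (finest) and returns the first
    precision whose estimated bucket count is within [min_buckets, max_buckets].
    Falls back to the finest precision that stays under max_buckets.
    """
    lon_span = east - west
    if lon_span < 0:  # assume the box crosses the antimeridian
        lon_span += 360.0
    bbox_area = (north - south) * lon_span
    best = 1
    for precision in range(1, 13):
        estimate = bbox_area / geohash_cell_area(precision)
        if estimate > max_buckets:
            break  # too many buckets; finer precisions only get worse
        best = precision
        if estimate >= min_buckets:
            return precision
    return best


print(choose_precision(55.0, -6.0, 58.0, -1.0))      # 4
print(choose_precision(-90.0, -180.0, 90.0, 180.0))  # 2
```

In Leaflet, the four arguments can come from `map.getBounds()` via `getSouth()`, `getWest()`, `getNorth()` and `getEast()`. Because each extra geohash character shrinks the cell area by a factor of 32, only one or two precisions will usually fall in a given range, which is also why a simple zoom-to-precision lookup tends to work reasonably well.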
