Thursday 22 September 2016

data - Bulk geocoding 20 million US addresses



Are there any free or reasonably priced databases for the US which can be searched and return latitude and longitude information?




Answer



For that many records, don't even consider a web service. They will throttle or cut you off before you can finish your task.


That leaves running it locally, and for that you have several free and commercial options.


The free options use the Census Bureau's TIGER dataset, which you will need to load into a spatial database. You can find libraries that geocode against TIGER for PostGIS or even SQLite. Heck, you can even use ArcGIS to geocode against TIGER. Of course, ArcGIS is not free, which brings me to the commercial options. If you do have an ArcGIS license, chances are you have the StreetMap DVD with a Tele Atlas (I mean TomTom) or Navteq dataset, depending on whether you got StreetMap Premium bundled. Either of those two datasets will probably give you more consistent results than TIGER.
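If you go the PostGIS route, the TIGER geocoder extension (postgis_tiger_geocoder) exposes a `geocode()` SQL function. Here is a minimal sketch, assuming a DB-API cursor (for example from psycopg2) connected to a database where the TIGER data has already been loaded. It asks for the single best candidate and returns its rating, longitude and latitude. The helper name `geocode_one` is hypothetical. In the PostGIS TIGER geocoder a lower rating means a better match, so you can use it to flag dubious results.

```python
from typing import Optional, Tuple

# Ask the PostGIS TIGER geocoder for its single best candidate (max_results = 1).
# geomout is a point geometry, so ST_X gives longitude and ST_Y gives latitude.
GEOCODE_SQL = """
    SELECT g.rating, ST_X(g.geomout) AS lon, ST_Y(g.geomout) AS lat
    FROM geocode(%s, 1) AS g
"""


def geocode_one(cursor, address: str) -> Optional[Tuple[int, float, float]]:
    """Geocode one address; return (rating, lon, lat) or None if nothing matched.

    `cursor` is assumed to be a DB-API cursor using the %s paramstyle
    (psycopg2 does), connected to a PostGIS database with TIGER data loaded.
    """
    cursor.execute(GEOCODE_SQL, (address,))
    row = cursor.fetchone()
    if row is None:  # the geocoder found no candidate
        return None
    rating, lon, lat = row
    return int(rating), float(lon), float(lat)
```

With psycopg2 you would call it as `geocode_one(psycopg2.connect("dbname=geocoder").cursor(), "1600 Pennsylvania Ave NW, Washington, DC 20500")`, where the database name is just a placeholder for your own.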


Do yourself a favor: once your data is loaded, make several copies of the street database and run the geocoding process on several machines, each with a subset of the input data. Don't try to run it all on one machine, or you will be waiting days for it to finish. Not to mention that whatever process you run will most likely leak memory and crash several times before it finishes, so you want your process to save checkpoints it can restart from.
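Both ideas, splitting the input across machines and checkpointing, can be sketched in a few lines. This assumes the input is a list of address strings and that each worker records its progress in a small JSON checkpoint file. The function names, the `chunk_size` interval and the file formats are all hypothetical, and `geocode` stands in for whatever per-address call your setup uses (for example a PostGIS or ArcGIS lookup).

```python
import json
import os
from typing import Callable, List, Optional, Tuple


def split_into_shards(addresses: List[str], n_machines: int) -> List[List[str]]:
    """Split the input into n_machines contiguous, roughly equal subsets."""
    size, extra = divmod(len(addresses), n_machines)
    shards, start = [], 0
    for i in range(n_machines):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        shards.append(addresses[start:end])
        start = end
    return shards


def run_with_checkpoints(
    shard: List[str],
    geocode: Callable[[str], Optional[Tuple[float, float]]],
    checkpoint_path: str,
    results_path: str,
    chunk_size: int = 1000,
) -> int:
    """Geocode one shard, saving progress so a crashed run can resume.

    Results are appended to results_path as JSON lines; checkpoint_path holds
    the index of the next address to process. Returns that index. If a crash
    happens mid-chunk, that chunk's rows are written again on resume, so
    deduplicate on "index" when merging results.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for chunk_start in range(start, len(shard), chunk_size):
        chunk = shard[chunk_start:chunk_start + chunk_size]
        with open(results_path, "a") as out:
            for offset, address in enumerate(chunk):
                record = {"index": chunk_start + offset,
                          "address": address,
                          "latlon": geocode(address)}
                out.write(json.dumps(record) + "\n")
        # Advance the checkpoint only after the whole chunk is on disk;
        # write to a temp file and rename so the checkpoint is never half-written.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": chunk_start + len(chunk)}, f)
        os.replace(tmp, checkpoint_path)
        start = chunk_start + len(chunk)
    return start
```

Machine k would run `run_with_checkpoints(split_into_shards(all_addresses, n)[k], ...)` against its own copy of the street database. When the process crashes, you simply start it again with the same paths and it picks up from the last completed chunk.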

