Friday, 23 September 2016

geocoding - Disambiguate messy place names in Python (preferably on local machine)


I have a list of several million place names that come from Flickr profiles. Users provided these place names as free text, so they look like this:


Roma, Italy
Kennesaw, USA
Saginaw, MI
Rucker, Missouri, USA
Melbourne, Australia
Madrid, Spain

live in Sarnia / work in London, Canada
Valladolid, España
Italia
West Hollywood, United States

I want to disambiguate these place names. I am aware that in some cases there is no straightforward solution to this, but I am willing to live with some false disambiguations and with "no answer" for some of the places. If a place name matches the names of multiple cities, then I want to assign it to the largest of those cities.
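To make the "largest city wins" rule concrete, here is a minimal sketch. The `CANDIDATES` table, the `Place` type, and the `largest_match` function are all hypothetical names invented for illustration, and the population figures are rough placeholders rather than real data. A real run would load candidates from a proper gazetteer.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    name: str
    country: str
    population: int


# Hypothetical lookup table: lowercase name -> every place sharing that name.
# Population values are illustrative placeholders, not real figures.
CANDIDATES = {
    "london": [Place("London", "GB", 8_000_000), Place("London", "CA", 400_000)],
    "madrid": [Place("Madrid", "ES", 3_000_000)],
}


def largest_match(name: str) -> Optional[Place]:
    """Return the most populous place with this name, or None for "no answer"."""
    matches = CANDIDATES.get(name.strip().lower(), [])
    return max(matches, key=lambda place: place.population, default=None)
```

With this table, `largest_match("London")` picks the entry with the larger population, and an unknown name returns `None`, which corresponds to the acceptable "no answer" case.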


Yahoo's PlaceFinder API would be a good solution to this problem, but I would need to make too many API calls to get through my list, so I'd like a local solution (i.e., one that does not depend on a remote API). Does anyone know of any Python libraries that do this kind of thing, or any other local solutions?


(I've also asked this question on Stack Overflow.)



Answer



You could try the Python library geodict. It provides datasets that you can download and import into a database, and you can check those lists to see whether they work well with your data. It works in two steps:




  1. Extracting names

  2. Matching names to a location in the lists
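The two steps above can be sketched in a few lines of Python. This is not geodict's actual API. It is a rough illustration of the general idea, assuming a tiny in-memory gazetteer. `GAZETTEER`, `MAX_WORDS`, `extract_names`, and `match_names` are hypothetical names made up for this example.

```python
import re

# Hypothetical mini-gazetteer: lowercase name -> (canonical name, kind).
# A real setup would query the imported database instead of a dict.
GAZETTEER = {
    "sarnia": ("Sarnia", "city"),
    "london": ("London", "city"),
    "canada": ("Canada", "country"),
    "west hollywood": ("West Hollywood", "city"),
    "united states": ("United States", "country"),
}
MAX_WORDS = 3  # assumed upper bound on the number of words in a place name


def extract_names(text):
    """Step 1: yield candidate word n-grams, longest first within each fragment."""
    for fragment in re.split(r"[,/]", text):
        words = fragment.split()
        for size in range(min(MAX_WORDS, len(words)), 0, -1):
            for start in range(len(words) - size + 1):
                yield " ".join(words[start:start + size])


def match_names(text):
    """Step 2: keep the candidates found in the gazetteer, without duplicates."""
    found = []
    for candidate in extract_names(text):
        entry = GAZETTEER.get(candidate.lower())
        if entry and entry not in found:
            found.append(entry)
    return found
```

For the messy example from the question, `match_names("live in Sarnia / work in London, Canada")` finds Sarnia, London, and Canada and ignores the filler words. Deciding which of several matched places to keep is a separate choice that this sketch leaves open.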


More details (and another online option in the comments) here.

