Saturday, 11 February 2017

How are geocoding scores calculated in ArcGIS?


After a table of addresses is geocoded, ArcGIS provides information about each geocoded address, including the "match score" of the candidate to which the address was matched, which ranges from 0 to 100. According to the documentation, "The match score is based on how well the locations found in the reference data match with the address data being searched."


It seems intuitive that 100 means an address with the exact name was found in the address locator and 0 means no such address was found. However, I could not find any information about how exactly this score is calculated, particularly for values between the extremes. Is this known?


I found the pointer to this white paper in the answer to this question, but I could not find any information in that paper that would answer the question.



Answer



The scores are based on a weighted numbering system, driven by the number of matching characters in each of the prioritized/configured address elements. So the more characters that match, the higher the likely score.

When using ranged-address data such as street centerlines, the address range and parity also figure into the process. So if you have a range of 3000-6000 (even) and the address is 2998 but the rest of the street name matches, ArcGIS will still make this a candidate but lower the score, since the house number falls outside the expected range.




  • D.E.Wright
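
The range and parity check described in this answer can be sketched in Python. This is only an illustration, not ArcGIS's actual algorithm: the function names and the `penalty` value are hypothetical, and the real size of the score reduction is not documented.

```python
def range_parity_check(house_number, range_from, range_to, parity):
    """Return (in_range, parity_ok) for a house number against a street segment.

    parity is "even", "odd", or "both" (hypothetical encoding for this sketch).
    """
    low, high = sorted((range_from, range_to))
    in_range = low <= house_number <= high
    if parity == "even":
        parity_ok = house_number % 2 == 0
    elif parity == "odd":
        parity_ok = house_number % 2 == 1
    else:
        parity_ok = True
    return in_range, parity_ok


def house_score(house_number, range_from, range_to, parity, penalty=40):
    """Start at 100 and subtract an assumed penalty for each failed check."""
    in_range, parity_ok = range_parity_check(
        house_number, range_from, range_to, parity
    )
    score = 100
    if not in_range:
        score -= penalty
    if not parity_ok:
        score -= penalty
    return max(score, 0)


# 2998 has the right parity but lies just outside the 3000-6000 range,
# so it remains a candidate with a reduced house-number score.
print(range_parity_check(2998, 3000, 6000, "even"))  # (False, True)
print(house_score(2998, 3000, 6000, "even"))         # 60
```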


See Bruce Harold's response in the thread "Re: Geocoding Score Documentation: How is the score value determined?":


Bruce Harold (Employee), Apr 10, 2015 2:25 PM, in response to Nathan Lowry:


Hello


Score calculation is not documented in detail, but I can give you a thumbnail.


If you open USAddress.lot.xml in Firefox from its installed location at file:///C:/Program Files (x86)/ArcGIS/Desktop10./Locators you will see a navigable tree.


In Top Level Elements navigate to FullNormalAddress; the superscript numbers for NormalAddress (70) and Zone (30) are the relative weights for score contributions from those elements. Coincidentally they sum to 100 but only the relative weight is relevant.


Navigating further from NormalAddress you will see 70/100 of the score is contributed 15/75 and 60/75 by House and FullStreetName respectively, where 75 is the sum of the weights, and further down you can see the elements prefix (5/92), pretype (6/92), StName (70/92), suftype (6/92) and suffix (5/92) weights where 92 is the sum of those weights. An individual score for any lowest level element (like how to calculate a score contribution from an imperfect street name) may be determined by the Spelling/Scoring section of the XML file if an anticipated spelling correction is required to match the reference data, or by a proprietary algorithm for unanticipated spelling errors or noise or repeated characters, as when you have keybounce.



Scores are weight summed, with percentage normalization, from the bottom up. Missing elements do not penalize a score, they simply do not contribute.
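
The bottom-up weighted roll-up described above can be sketched in Python as follows. The weights are the ones quoted from USAddress.lot.xml in the response; everything else is an assumption for illustration: the `rollup` function, the `WEIGHTS` layout, and the reading of "do not contribute" as "excluded from both the weighted sum and the normalizing weight total". Leaf scores (0 to 100) are taken as given, since the per-element scoring is proprietary.

```python
# Relative weights as quoted in the response; the dict layout is hypothetical.
WEIGHTS = {
    "FullNormalAddress": {"NormalAddress": 70, "Zone": 30},
    "NormalAddress": {"House": 15, "FullStreetName": 60},
    "FullStreetName": {
        "prefix": 5, "pretype": 6, "StName": 70, "suftype": 6, "suffix": 5,
    },
}


def rollup(element, leaf_scores):
    """Return the normalized score (0-100) for an element, or None if missing.

    Assumption: a missing child is dropped from both the weighted sum and
    the weight total, so it neither helps nor penalizes the parent.
    """
    children = WEIGHTS.get(element)
    if children is None:
        # Lowest-level element: use the supplied score, if any.
        return leaf_scores.get(element)
    weighted_sum = 0.0
    weight_total = 0.0
    for child, weight in children.items():
        child_score = rollup(child, leaf_scores)
        if child_score is None:
            continue
        weighted_sum += weight * child_score
        weight_total += weight
    if weight_total == 0:
        return None
    return weighted_sum / weight_total


# A perfect house number and zone with a half-matching street name;
# prefix, pretype, suftype and suffix are absent and so do not contribute.
scores = {"House": 100, "StName": 50, "Zone": 100}
print(rollup("FullNormalAddress", scores))  # 72.0
```

Here FullStreetName scores 50, NormalAddress scores (15·100 + 60·50)/75 = 60, and the overall score is (70·60 + 30·100)/100 = 72.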

