Tuesday, 26 February 2019

arcgis 10.1 - Replacing non-English characters in attribute tables using ArcPy and Python?


I have a few shapefiles where some of the attributes contain the non-English characters ÅÄÖ. Since some queries don't work with these characters (specifically ChangeDetector), I tried to replace them in advance with a simple script and write the new strings to another field.


Replacing the characters works fine, but updating the field with arcpy.UpdateCursor does not.



What is an appropriate way of solving this?


I have also tried to do this via the Field Calculator, pasting the code() function into the code block, with the same error.


Error message:
Runtime error
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:/gis/python/teststring.py", line 28, in <module>
    val = code(str(prow.Typkod))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 3: ordinal not in range(128)


Code:


# -*- coding: cp1252 -*-
def code(infield):
    data = ''
    for i in infield:
        ## print i
        if i == 'Ä':
            data = data + 'AE'
        elif i == 'ä':
            data = data + 'ae'
        elif i == 'Å':
            data = data + 'AA'
        elif i == 'å':
            data = data + 'aa'
        elif i == 'Ö':
            data = data + 'OE'
        elif i == 'ö':
            data = data + 'oe'
        else:
            data = data + i
    return data


shp = r'O:\XXX\250000\DB\ArcView\shape.shp'

prows = arcpy.UpdateCursor(shp)

for prow in prows:
    val = code(unicode(str(prow.Typkod), "utf-8"))
    prow.Typkod_U = val
    print val
    prows.updateRow(prow)

The values of Typkod are of the type: [D, D, S, DDRÄ, TRÄ] etc.


I use ArcMap Basic (10.1) on Windows 7.





New Error message:
Runtime error
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:/gis/python/teststring.py", line 29, in <module>
    val = code(unicode(str(row.Typkod), "utf-8"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 3: ordinal not in range(128)


>>> val
'DDRÄ'
>>> type(val)
<type 'str'>




It appears as if the output from the function is wrong somehow. When there's ÅÄÖ involved it returns data = u'DDR\xc4' and not (as was my intention) data = 'DDRAE'. Any suggestions on what might cause this?



Answer



Turns out iterating over ÅÄÖ wasn't that easy. The field value comes back as a unicode string, so the if-statements have to compare against the unicode escapes (u'\xc4' and so on) instead of the literal ÅÄÖ. After I figured that out, the rest was a piece of cake :)
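To see why the original comparisons never matched and where the UnicodeEncodeError came from, here is a minimal sketch that runs without arcpy. It assumes the field value arrives as the unicode string u'DDR\xc4' (as the error message suggests); the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: `value` stands in for what the cursor returns for Typkod.
value = u'DDR\xc4'  # 'DDRÄ' as a unicode string

# In a cp1252 source file the plain literal 'Ä' is the single byte 0xC4.
byte_literal = b'\xc4'

# A unicode character is not equal to the byte-string literal...
matches_bytes = (value[3] == byte_literal)
# ...but it is equal to the unicode literal.
matches_unicode = (value[3] == u'\xc4')

# In Python 2, str(value) implicitly encodes to ASCII.
# The explicit form of that step fails in the same way:
try:
    value.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)
```

So the str() call is what raised the error, and once it was removed the byte-string comparisons silently fell through to the else branch, which would explain the unchanged u'DDR\xc4' result.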


Resulting code:


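# -*- coding: cp1252 -*-
import arcpy


def code(infield):
    # Replace ÅÄÖ/åäö in a unicode string with ASCII digraphs
    data = ''
    for i in infield:
        if i == u'\xc4':    # Ä
            data = data + 'AE'
        elif i == u'\xe4':  # ä
            data = data + 'ae'
        elif i == u'\xc5':  # Å
            data = data + 'AA'
        elif i == u'\xe5':  # å
            data = data + 'aa'
        elif i == u'\xd6':  # Ö
            data = data + 'OE'
        elif i == u'\xf6':  # ö
            data = data + 'oe'
        else:
            data = data + i
    return data


# Script tool parameters: the shapefile and the name of the field to convert
shp = arcpy.GetParameterAsText(0)
field = arcpy.GetParameterAsText(1)
newfield = field + '_U'
arcpy.AddField_management(shp, newfield, 'TEXT')

prows = arcpy.UpdateCursor(shp)

for row in prows:
    # field and newfield hold field names, so use getValue/setValue;
    # row.field would look for a field literally named "field"
    row.setValue(newfield, code(row.getValue(field)))
    prows.updateRow(row)

# Release the cursor's lock on the shapefile
del prows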
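The if/elif chain can also be written as a dictionary lookup. This is only a sketch of an alternative, not part of the original script: the names REPLACEMENTS and transliterate are hypothetical, and it assumes the input is already a unicode string.

```python
# -*- coding: utf-8 -*-
# Hypothetical alternative to code(); names are illustrative only.
REPLACEMENTS = {
    u'\xc4': u'AE',  # Ä
    u'\xe4': u'ae',  # ä
    u'\xc5': u'AA',  # Å
    u'\xe5': u'aa',  # å
    u'\xd6': u'OE',  # Ö
    u'\xf6': u'oe',  # ö
}


def transliterate(text):
    """Return text with each mapped character replaced by its digraph."""
    # Characters without an entry in REPLACEMENTS are kept unchanged.
    return u''.join(REPLACEMENTS.get(ch, ch) for ch in text)
```

With the values from the question, transliterate(u'DDR\xc4') gives u'DDRAE'. Inside the cursor loop it would be called on the value read with row.getValue(), in place of code().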
