Saturday 17 September 2016

Which character encoding is used by the DBF file in shapefiles?


Which character encoding is used by the DBF file in shapefiles? It seems to be handled differently depending on the program and the local encoding settings of the machine. Which encoding is 'right', i.e. specified for the format?




Answer



The original DBF standard specifies ISO8859-1, and only ISO8859-1. So a Shapefile that truly conforms to the standard should be encoded in ISO8859-1. Of course, this (very old) restriction is not really practical nowadays.


ArcGIS, Geopublisher, AtlasStyler and GeoServer have started to extend the standard so that the encoding can be declared. For ArcGIS, for example, just create a .cpg file (with the same basename as the other files of the Shapefile) and fill it with the name of the encoding.


For example, create a myshape.cpg with a text editor, insert the 5 characters "UTF-8" and save it. If you then open the Shapefile in ArcGIS, it reads the text contents of the DBF in that charset.
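
The same step can be scripted instead of done in a text editor. The following is a minimal sketch in Python; the function name write_cpg is hypothetical and not part of any GIS library. It only writes the sidecar file and does not convert the contents of the DBF itself.

```python
from pathlib import Path


def write_cpg(shapefile_path, encoding="UTF-8"):
    """Write a .cpg sidecar next to the shapefile (hypothetical helper)."""
    # Same basename as the other files of the shapefile, with a .cpg extension.
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    # The file holds nothing but the encoding name, e.g. the 5 characters "UTF-8".
    cpg_path.write_text(encoding, encoding="ascii")
    return cpg_path
```

Calling write_cpg("myshape.shp") would create myshape.cpg in the same directory. The DBF must actually be stored in the declared encoding, otherwise the text will be read incorrectly.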


GeoServer: GeoServer WFS can export any WFS layer as a zipped Shapefile. The zip then contains a .cst file, which serves exactly the same purpose as the .cpg file.
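
A program that reads shapefiles can look for either sidecar file before decoding the DBF. The sketch below is an illustration, not the logic of ArcGIS or GeoServer: the function name sidecar_encoding is hypothetical, only lowercase .cpg and .cst extensions are checked, and the fallback to ISO-8859-1 follows the statement above about the original standard. It also assumes the sidecar contains a name that Python's codec registry recognises, which may not hold for every name that GIS software writes.

```python
import codecs
from pathlib import Path


def sidecar_encoding(shapefile_path, default="ISO-8859-1"):
    """Return the encoding named in a .cpg or .cst sidecar, else the default."""
    base = Path(shapefile_path)
    for suffix in (".cpg", ".cst"):
        sidecar = base.with_suffix(suffix)
        if not sidecar.is_file():
            continue
        # The sidecar is expected to hold just the encoding name.
        name = sidecar.read_text(encoding="ascii", errors="ignore").strip()
        if not name:
            continue
        try:
            codecs.lookup(name)  # raises LookupError for names Python does not know
        except LookupError:
            continue
        return name
    return default
```

The returned name can then be passed to whatever DBF reader is in use, or to bytes.decode() when decoding field values by hand.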


Attention: All this only applies to the data, not the column names. You should really only use ASCII in the column names of a DBF if you want the file to be openable with other programs.
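
To check whether a DBF follows this advice, the column names can be read straight from the file header. The sketch below assumes the common dBASE III style layout: a 32-byte file header followed by 32-byte field descriptors, each starting with an 11-byte, NUL-padded field name, and terminated by the byte 0x0D. Both function names are hypothetical.

```python
def dbf_field_names(dbf_bytes):
    """Return the raw field names (as bytes) from a DBF file's header."""
    names = []
    offset = 32  # field descriptors start right after the 32-byte file header
    while offset < len(dbf_bytes) and dbf_bytes[offset] != 0x0D:
        # The first 11 bytes of each descriptor hold the NUL-padded field name.
        raw = dbf_bytes[offset:offset + 11].split(b"\x00", 1)[0]
        names.append(raw)
        offset += 32  # each field descriptor is 32 bytes long
    return names


def non_ascii_field_names(dbf_bytes):
    """Return the field names that contain bytes outside the ASCII range."""
    return [name for name in dbf_field_names(dbf_bytes)
            if any(byte > 0x7F for byte in name)]
```

Typical usage would be to read the file with open("myshape.dbf", "rb").read() and pass the bytes to non_ascii_field_names; an empty list means all column names are plain ASCII.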


Hint: To change the encoding of a DBF, open it with OpenOffice Calc, choose "Save As...", click "Filter options" in the bottom left and press Save. You can then choose the encoding to convert the text contents into.

