Sunday, 31 March 2019

ogr2ogr - Converting shapefile from Shift_JIS to UTF-8 when the usual methods fail


Long story short, I'm trying to import this ESRI shapefile of Japan into CartoDB. (Sorry, no direct link: to download, click on the orange ファイルのダウンロード button, check 同意する to agree to the T&C, then click on the green 全国市区町村界データのダウンロード button.)



Problem is, the DBF in the file is encoded as Shift_JIS, and CartoDB only likes UTF-8. I've tried the following unsuccessfully:


1) ogr2ogr


ogr2ogr --config SHAPE_ENCODING Shift_JIS japan_ver72_utf8.shp japan_ver72.shp

No-op: SJIS in, SJIS out.


ogr2ogr --config SHAPE_ENCODING UTF-8 japan_ver72_utf8 japan_ver72.shp

Makes ogr2ogr think the input is UTF-8, meaning I get garbage out.


2) QGIS


Load the shapefile into QGIS as ShiftJIS. But while the shapes load fine, QGIS dumps a whole bunch of this on load:



ERROR 1: fread(48623) failed on DBF file.

And inspecting the attribute table just shows a bunch of nulls, so there's no point trying to save as UTF-8.


3) OpenOffice Calc


Load the DBF into OpenOffice, re-export as UTF-8. But OO throws an error when parsing the DBF and refuses to import the file at all.


4) iconv


Run iconv directly on the DBF:


iconv -f Shift_JIS -t UTF-8 japan_ver72_sjis.dbf >japan_ver72.dbf

This "works", in the sense that the Japanese within is correctly recoded as UTF-8, but it destroys the DBF in the process.
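
A likely reason for the breakage: a DBF stores every value in a fixed-width slot behind a binary header, and iconv treats the whole file as plain text. A small Python sketch of the width problem, using a made-up sample value rather than anything from the dataset:

```python
# Sample value chosen for illustration only; it is not read from the shapefile.
name = "東京都"

sjis = name.encode("shift_jis")  # two bytes per kanji
utf8 = name.encode("utf-8")      # three bytes per kanji

print(len(sjis), len(utf8))

# A field sized for the Shift_JIS bytes is too narrow for the UTF-8 bytes,
# so every later field and record shifts out of position after a blind recode.
field_width = len(sjis)
overflow = len(utf8) - field_width
print(overflow)
```

So the text comes out fine, but the record offsets declared in the header no longer match the data.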



Ideas?



Answer



If you only need to do the job once and don't need a script, one simple way is to convert the data with OpenJUMP.


Activate the character set selection from the menu Customize - Options




Open your dataset as Shift-JIS




Save the data back with Save as... and select the UTF-8 charset
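
If scripting does turn out to be needed, the same conversion can be sketched in plain Python by recoding only the character fields and then rewriting the field widths in the header. This is a minimal sketch, not a tested tool: the function name `recode_dbf` is made up here, it assumes a plain dBase III layout (32-byte header, 32-byte field descriptors, no memo fields), it leaves field names untouched, and it assumes the source bytes decode cleanly as cp932.

```python
import struct


def recode_dbf(data: bytes, src: str = "cp932", dst: str = "utf-8") -> bytes:
    """Return a copy of a dBase III .dbf with its character fields re-encoded."""
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Field descriptors are 32 bytes each and end with a 0x0D terminator.
    fields = []  # (descriptor, type letter, width in bytes)
    pos = 32
    while data[pos] != 0x0D:
        desc = bytearray(data[pos:pos + 32])
        fields.append((desc, chr(desc[11]), desc[16]))
        pos += 32

    # Pass 1: split records into values, recode 'C' fields, track new widths.
    widths = [width for _, _, width in fields]
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        row, offset = [rec[:1]], 1  # first byte is the deletion flag
        for j, (_, ftype, width) in enumerate(fields):
            raw = rec[offset:offset + width]
            offset += width
            if ftype == "C":
                raw = raw.rstrip(b" \x00").decode(src).encode(dst)
                widths[j] = max(widths[j], len(raw))
            row.append(raw)
        rows.append(row)

    if max(widths, default=0) > 254:
        raise ValueError("a recoded value exceeds the 254-byte field limit")

    # Pass 2: write a header with the new widths, then the padded records.
    out = bytearray(data[:32])
    new_header_len = 32 + 32 * len(fields) + 1
    struct.pack_into("<IHH", out, 4, n_records, new_header_len, 1 + sum(widths))
    out[29] = 0  # assumption: clear the language driver ID so it no longer claims a code page
    for (desc, _, _), width in zip(fields, widths):
        desc[16] = width
        out += desc
    out += b"\x0d"
    for row in rows:
        out += row[0]
        for raw, width in zip(row[1:], widths):
            out += raw.ljust(width, b" ")
    out += b"\x1a"  # end-of-file marker
    return bytes(out)
```

To use it, read the original .dbf as bytes, pass them through `recode_dbf`, and write the result next to copies of the .shp and .shx. Adding a one-line .cpg file containing UTF-8 is a common way to tell readers about the new encoding, though how a given importer treats it is worth checking.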



