Long story short, I'm trying to import this ESRI shapefile of Japan into CartoDB. (Sorry, no direct link: to download, click on the orange ファイルのダウンロード ("Download file") button, check 同意する ("I agree") to agree to the T&C, then click on the green 全国市区町村界データのダウンロード ("Download nationwide municipal boundary data") button.)
Problem is, the DBF in the file is encoded as Shift_JIS, and CartoDB only likes UTF-8. I've tried the following unsuccessfully:
1) ogr2ogr
ogr2ogr --config SHAPE_ENCODING Shift_JIS japan_ver72_utf8.shp japan_ver72.shp
No-op: SJIS in, SJIS out.
ogr2ogr --config SHAPE_ENCODING UTF-8 japan_ver72_utf8 japan_ver72.shp
Makes ogr2ogr think the input is UTF-8, meaning I get garbage out.
2) QGIS
Load the shapefile into QGIS as ShiftJIS. But while the shapes load fine, QGIS dumps a whole bunch of this on load:
ERROR 1: fread(48623) failed on DBF file.
And inspecting the attribute table just shows a bunch of nulls, so there's no point trying to save as UTF-8.
3) OpenOffice Calc
Load the DBF into OpenOffice as SJIS, re-export as UTF-8. But OO throws an error when parsing the DBF and refuses to import the file at all.
4) iconv
Run iconv directly on the DBF:
iconv -f Shift_JIS -t UTF-8 japan_ver72_sjis.dbf >japan_ver72.dbf
This "works", in the sense that the Japanese within is correctly recoded as UTF-8, but it destroys the DBF in the process.
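A likely reason, sketched in Python below: a DBF stores each field at a fixed byte width declared in its header, and the same Japanese text takes a different number of bytes in Shift_JIS and in UTF-8. Running iconv over the whole file changes the lengths of the values without updating those declared widths (and may also alter binary header bytes). The sample string and the 6-byte field width are illustrative assumptions, not values taken from the actual file.

```python
# Illustrative value only; not read from the real DBF.
name = "東京都"

sjis_bytes = name.encode("shift_jis")
utf8_bytes = name.encode("utf-8")
print(len(sjis_bytes), len(utf8_bytes))  # 6 9

# Assume the header declares this field as 6 bytes wide (enough for Shift_JIS).
field_width = len(sjis_bytes)

# After a blind recode, a reader still slices 6 bytes per field, so the value
# is cut short and every following field starts at the wrong offset.
truncated = utf8_bytes[:field_width]
print(truncated.decode("utf-8"))  # 東京
```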
Ideas?
Answer
If you only need to do this once and don't want to script it, one simple way is to convert the data with OpenJUMP.
Activate the character set selection from the menu Customize - Options
Open your dataset as Shift-JIS
Save the data back with Save as... and select the UTF-8 charset
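If scripting does turn out to be necessary, the same conversion could be sketched roughly as follows in plain Python. This is not what OpenJUMP does internally. It is a minimal sketch that assumes a simple dBase III style layout (32-byte header, 32-byte field descriptors ending in 0x0D, fixed-width records), assumes the text is decodable with Python's cp932 codec, and leaves field names and any per-field offset bytes in the descriptors untouched. The function name recode_dbf is hypothetical. Readers may also need to be told separately that the result is UTF-8, for example through a .cpg sidecar file.

```python
import struct


def recode_dbf(data: bytes, src: str = "cp932", dst: str = "utf-8") -> bytes:
    """Return a copy of a dBase III style DBF with character fields recoded."""
    # Header bytes 4-11: record count, header length, record length.
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # Field descriptors are 32 bytes each and end at a 0x0D terminator.
    fields = []  # (descriptor bytes, field type, declared width)
    pos = 32
    while data[pos] != 0x0D:
        desc = data[pos:pos + 32]
        fields.append((desc, chr(desc[11]), desc[16]))
        pos += 32

    # Recode character ("C") fields and track the widest value per field.
    widths = [width for _, _, width in fields]
    rows = []
    for i in range(n_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        values, offset = [], 1  # byte 0 of each record is the deletion flag
        for j, (_, ftype, width) in enumerate(fields):
            raw = rec[offset:offset + width]
            offset += width
            if ftype == "C":
                raw = raw.decode(src).rstrip(" \x00").encode(dst)
                widths[j] = max(widths[j], len(raw))
            values.append(raw)
        rows.append((rec[:1], values))

    if max(widths, default=0) > 254:
        raise ValueError("recoded text exceeds the 254-byte field limit")

    # Rebuild the header with the new record length and field widths.
    out = bytearray(data[:32])
    struct.pack_into("<H", out, 10, 1 + sum(widths))
    for (desc, _, _), width in zip(fields, widths):
        new_desc = bytearray(desc)
        new_desc[16] = width
        out += new_desc
    out += data[pos:header_len]  # terminator plus any trailing header bytes

    # Write records, padding each value to its (possibly wider) field.
    for flag, values in rows:
        out += flag
        for value, width in zip(values, widths):
            out += value.ljust(width, b" ")
    out += b"\x1a"  # end-of-file marker
    return bytes(out)
```

Usage would be along the lines of reading japan_ver72.dbf in binary mode, passing the bytes to recode_dbf, and writing the result to a new .dbf next to a copy of the .shp and .shx files. Widening the fields is the design choice here: UTF-8 needs more bytes than Shift_JIS for the same Japanese text, so keeping the original widths would truncate values.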