Saturday, 16 July 2016

mysql - How did QGIS fix a shapefile's character encoding, and can I do this on the command line with ogr2ogr?


Solved: I'm not sure why, but it works fine if I specify --config SHAPE_ENCODING "ISO-8859-1" in my ogr2ogr command. For whatever reason, this allowed ogr2ogr to go from the original TigerLine shapefile, as-is, directly into a MySQL database. My final ogr2ogr instruction, in full:


ogr2ogr --config SHAPE_ENCODING "ISO-8859-1" -f "MySql" MySql:"usa_basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857



Background: I was finally able to use ogr2ogr to import the TigerLine 2012 USA Counties shapefile into MySQL, but only after using QGIS to save the shapefile with a new encoding value (UTF-8).


I'm wondering: What can I do differently, right inside the commandline with ogr2ogr, to avoid having to involved QGIS in the workflow? (Nothing against QGIS, it's just that I want to be able to automate this..)





Here's what I did before involving QGIS:


1) Changed the MySQL max_allowed_packet size as is warned might be necessary on the OGR MySQL driver page.


In MySQL Workbench, I set the new parameter value like this:


SET GLOBAL max_allowed_packet=1073741824;

2) Next, I tried to import it into MySQL with ogr2ogr like this:


ogr2ogr -f "MySql" MySql:"basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857

But that bombs with an error that spills outside of the buffer, it ends as shown below, note the appearance of the special chars, which are lost.


[....hugely....truncated....console....output....]

322122,-7366573.8710948359 2062754.2910927597))',1) , '72', '045', '01804502', '
72045', 'Comer�o', 'Comer�o Municipio', '13', 'H1', 'G4020', '490', '41980', 'A'
, 73557129, 319735, '+18.2250402', '-066.2202984')

It's worth noting that after hitting this brick wall and Googling around, I tried the tips suggested by Mateusz Łoskot in this discussion; to restate the salient portion:



*Windows Command Prompt can work with UTF-8 characters if you change codepage to UTF-8:


0) Open new prompt (cmd.exe)


1) Change font to Lucida Concole


3) chcp 65001*




But this didn't overcome the obstacle.


Finally, wondering if QGIS could help, I used QGIS to export the shapefile, specifying UTF-8 as the character encoding, then the ogr2ogr command I provided above worked. Bingo. Now I have "Comerío Municipio" in my database..


..but what gives?


So my question: Is QGIS doing something I can add to my ogr2ogr instruction that automatically resolves the special characters issues I encountered so I can automate this task in the future and not have to manually involve QGIS?



Answer



Quoting from the GDAL documentation for the Esri Shapefile driver:



An attempt is made to read the LDID/codepage setting from the .dbf file and use it to translate string fields to UTF-8 on read, and back when writing. LDID "87 / 0x57" is treated as ISO8859_1 which may not be appropriate. The SHAPE_ENCODING configuration option may be used to override the encoding interpretation of the shapefile with any encoding supported by CPLRecode or to "" to avoid any recoding. (Recoding support is new for GDAL/OGR 1.9.0)




Depending on which GDAL/OGR version you are using, ogr2ogr may be trying to translate your data to UTF-8 or not doing nothing at all.


So you would do either:


ogr2ogr --config SHAPE_ENCODING "UTF-8" -f "MySql" MySql:"basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857

or


ogr2ogr --config SHAPE_ENCODING "" -f "MySql" MySql:"basemap,host=127.0.0.1,user=myUser,password=myPass,port=3306" -lco engine=MYISAM "C:/path/to/data/tl_2012_us_county.shp" -nln county -nlt "geometry" -s_srs EPSG:4269 -t_srs EPSG:3857

Finally, check that your MySQL database is using UTF-8 and not Latin-1.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...