Tuesday 24 May 2016

gdal - Convert huge XYZ CSV to GeoTIFF


I have a huge amount of data in the form of CSV containing UTM coordinates as X and Y and an elevation value as Z information. I need to convert these data into a DEM as GeoTIFF for further analysis. In this case, a huge amount means 16 m. lines, with one point in X, Y and Z per line. The points are equally distributed, therefore an interpolation is not needed; each point just needs to be converted into a raster cell.


The original data came without separator, with fixed column widths. I already figured out how to convert the file syntax to use a separator instead of fixed widths and eliminate all space characters, using the stream text editor sed. From here on, normally my workflow would be to import the data into ArcGIS by creating a feature class from the X, Y and Z data and as a second step, converting the point shapefile into a GeoTIFF, using the Point to Raster tool. However, the file I currently have is way too big for this process.


Instead of the above described workflow, I was looking for an efficient alternative and discovered GDAL. However, in gdal_translate, the closest supported format I can find in the supported filetype list, is ASCII grid but no comma separated XYZ. Another difficulty is, that I have UTM coordinates, while most examples seem to use decimal degree coordinates. However, I need to stay within the UTM system (or at least, my output GeoTIFF needs to be in a UTM coordinate system).


So I am looking for a way to convert the CSV XYZ into a GeoTIFF, using GDAL, but so far wasn't able to find examples dealing with this exact problem. I would be very happy for some hints or even code examples.



Answer



You can do this using GDAL, it directly supports XYZ format. It doesn't matter if your coordinates are UTM, gdal_translate will output in the same coordinate system.


So to convert to GeoTIFF is as simple as:


gdal_translate test.xyz test.tif


Look at the GeoTIFF doc for output options (such as compression) and the gdal_translate doc for more usage info. In particular, you should specify what the coordinate system is with the -a_srs parameter.



-a_srs srs_def:


Override the projection for the output file. The srs_def may be any of the usual GDAL/OGR forms, complete WKT, PROJ.4, EPSG:n or a file containing the WKT.



gdal_translate -a_srs EPSG:12345 test.xyz test.tif

Comma/space separated and fixed column widths, with and without a header row are supported.




The supported column separators are space, comma, semicolon and tabulations.



$ head -n 2 test_space.xyz 
x y z
146.360047076550984 -39.0631214488636616 0.627969205379486084

$ gdalinfo test_space.xyz
Driver: XYZ/ASCII Gridded XYZ
Files: test_space.xyz
Size is 84, 66

Coordinate System is `'
Origin = (146.359922066953317,-39.062997159090934)
Pixel Size = (0.000250019195332,-0.000248579545455)
Corner Coordinates:
Upper Left ( 146.3599221, -39.0629972)
Lower Left ( 146.3599221, -39.0794034)
Upper Right ( 146.3809237, -39.0629972)
Lower Right ( 146.3809237, -39.0794034)
Center ( 146.3704229, -39.0712003)
Band 1 Block=84x1 Type=Float32, ColorInterp=Undefined

Min=0.336 Max=0.721

$ head -n 2 test_commas.xyz
x, y, z
146.360047076550984, -39.0631214488636616, 0.627969205379486084

$ gdalinfo test_commas.xyz
Driver: XYZ/ASCII Gridded XYZ
etc...


$ head -n 2 test_formatted.xyz
x y z
146.3600471 -39.06312145 0.627969205

$ gdalinfo test_formatted.xyz
Driver: XYZ/ASCII Gridded XYZ
etc...

The only gotchas I'm aware of are:




  1. The opening of a big dataset can be slow as the driver must scan the whole file to determine the dataset size and spatial resolution; and


  2. The file has to be sorted correctly (by Y, then X).



    Cells with same Y coordinates must be placed on consecutive lines. For a same Y coordinate value, the lines in the dataset must be organized by increasing X values. The value of the Y coordinate can increase or decrease however.



    $ head -n 5 test.csv
    x,y,z
    146.3707979,-39.07778764,0.491866767
    146.3787985,-39.07157315,0.614820838

    146.3637974,-39.07132457,0.555555582
    146.3630473,-39.07579901,0.481217861

    $ gdalinfo test.csv
    ERROR 1: Ungridded dataset: At line 3, too many stepY values
    gdalinfo failed - unable to open 'test.csv'.

    $ tail -n +2 test.csv| sort -n -t ',' -k2 -k1 > test_sorted.xyz

    $ head -n 5 test_sorted.xyz

    146.3600471,-39.07927912,0.606096148
    146.3602971,-39.07927912,0.603663027
    146.3605471,-39.07927912,0.603663027
    146.3607971,-39.07927912,0.589507282
    146.3610472,-39.07927912,0.581049323

    $ gdalinfo test_sorted.xyz
    Driver: XYZ/ASCII Gridded XYZ
    etc...



No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...