Monday 3 July 2017

Reading KML file into R?


I'm working with huge .kml files (up to 10 Gb) and need an efficient way to read them into R. Until now I've been converting them to shapefiles via QGIS and then back into R with readShapePoly and readOGR (the latter, by the way, is ~1000 faster than the former). I'd ideally like to cut-out the QGIS intermediary stage as it is cumbersome and slow.


How to read .kml files in directly?


I see this can be also be done with readOGR. Unfortunately, I cannot see how to implement the worked example (after lengthy preparation of .kml file: xx <- readOGR(paste(td, "cities.kml", sep="/"), "cities")). It seems that "cities" here is the name of the spatial objects.


Roger Bivand admits that "How one discovers this name is not obvious, since the KML driver in OGR needs it to access the file. One possibility is:


system(paste("ogrinfo", paste(td, "cities.kml", sep="/")), intern=TRUE)


"


But this does not work for me either. Here's a test .kml file to try it on. With it in my working directory, readOGR("x.kml", "id") generates this error message:


Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv) : 
Cannot open layer .

And system(paste("ogrinfo", "x.kml"), intern=TRUE) generates:


[1] "Had to open data source read-only."   "INFO: Open of `x.kml'"               
[3] " using driver `KML' successful." "1: x (3D Polygon)"


, which I simply don't understand.


Would getKMLcoordinates {maptools} be a valid alternative?


I've also tried this:


tkml <- getKMLcoordinates(kmlfile="x.kml", ignoreAltitude=T)
head(tkml[[1]])
tkml <- SpatialPolygons(tkml,
proj4string=CRS("+init=epsg:3857"))

The coordinates are generated correctly, but my attempt to convert them back into a polygon object failed with the following message:


Error in SpatialPolygons(tkml, proj4string = CRS("+init=epsg:3857")) : 

cannot get a slot ("area") from an object of type "double"

Answer



To read a KML with the OGR driver, you give it the file name and the layer name.


Roger's comment is that the layer name is hidden in the KML file, and unless you know how the KML was created you can't infer the layer name from the KML file name.


Looking at your example KML, I can see:




x



Which is telling me the layer name is x, not id, and so:


> foo = readOGR("/tmp/x.kml", "x")
OGR data source with driver: KML
Source: "/tmp/x.kml", layer: "x"
with 1 features and 2 fields
Feature type: wkbPolygon with 2 dimensions

works nicely.


Now, you can try and get the name by parsing the KML as XML using an R XML parser, or you can maybe try reading it in R as a text file until you find the name tag.


The other approach is to run the command-line ogrinfo program which spits out the layer names of a KML file:



$ ogrinfo /tmp/x.kml 
Had to open data source read-only.
INFO: Open of `/tmp/x.kml'
using driver `KML' successful.
1: x (Polygon)

here showing there is a polygon layer called x.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...