Thursday 22 January 2015

Extracting non-spatial data from imported KML file using R?


I have a KML file which was created using Google's My Maps.


The original file can be downloaded here: Google My Maps



Using R, I can import this using the "readOGR" function of the rgdal library This brings the KML file in as a SpatialPointsDataFrame (SPDF) - which i am calling asf52


![RStudio Data Pane


In this SPDF, the spatial data is contained under @coords and is readily extracted into a dataframe using code like


df  <- data.frame(asf52@coords[,1:2])

However, I am struggling to come up with a way to neatly extract the the non-spatial data - contained under @data$Description - and turn it into a dataframe with a column for each variable.



Answer



You don't need to call data.frame() around the extract - the @data slot already is a data.frame. Just do


  df <- asf52@data


to pull out a copy. That said, you may be better served by using the newer sf library for this task:


library(sf)
ob_kml <- file.path(getwd(), 'Outbreaks 56 (OIE).kml')

There is more than one layer in your KML - list them with e.g.


st_layers(ob_kml)

Use read_sf() with the layers argument to choose your point data specifically and read it in. read_sf() defaults to stringsAsFactors = FALSE which may be preferable.


asf_c <- read_sf(ob_kml, layer = 'ASF in China.xlsx')


To get a plain dataframe, just drop the geometry as follows:


asf_c_df <- st_set_geometry(asf_c, NULL)

EDIT: I see your secondary issue now; it looks like neither sf nor sp look at the tags that hold the attribute data you want (open the KML in Notepad++ if you want to see what I mean). QGIS does detect and import them as separate attribute columns, so @Jella's advice is sound. I'm not sure if the issue here lies with sf/sp or GDAL, but it may be worth raising an issue of the sf github page.


In the meantime, your instinct to go with tidyr functions is sound, its just a little tricky to get a clean separation. The following looks pretty good:


asf_c_df <- st_set_geometry(asf_c, NULL) %>%
# remove duplicate
tags
dplyr::mutate(Description = gsub('

', '
', Description)) %>%
# split on
tidyr::separate(., col = Description,

into = c('Date', 'Province', 'City', 'County', 'Location',
'Total_herd_size', 'Affected_animals', 'Deaths',
'Culled', 'Latitude', 'Longitude', 'Source'),
sep = '
') %>%
# ditch the key: part of key: value
dplyr::mutate_all(., funs(gsub('^.*: ', '', .))) %>%
# data type fixes
dplyr::mutate_at(vars(7:10), as.integer) %>%
dplyr::mutate_at(vars(11,12), as.numeric) %>%
# bonus points: proper dates. First, fix September, then cast to Date datatype

dplyr::mutate(Date = gsub('Sept', 'Sep', Date),
Date = as.POSIXct(Date, format = '%b %d, %Y')) %>%
# double bonus! proper NA for missing data
dplyr::mutate_if(is.character, funs(ifelse(. == '', NA, .)))

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...