Extracting non-spatial data from imported KML file using R?

Thursday, 22 January 2015

Extracting non-spatial data from imported KML file using R?

I have a KML file which was created using Google's My Maps.

The original file can be downloaded here: Google My Maps

Using R, I can import this using the "readOGR" function of the rgdal library This brings the KML file in as a SpatialPointsDataFrame (SPDF) - which i am calling asf52

In this SPDF, the spatial data is contained under @coords and is readily extracted into a dataframe using code like

df  <- data.frame(asf52@coords[,1:2])

However, I am struggling to come up with a way to neatly extract the the non-spatial data - contained under @data$Description - and turn it into a dataframe with a column for each variable.

Answer

You don't need to call data.frame() around the extract - the @data slot already is a data.frame. Just do

  df <- asf52@data

to pull out a copy. That said, you may be better served by using the newer sf library for this task:

library(sf)
ob_kml <- file.path(getwd(), 'Outbreaks 56 (OIE).kml')

There is more than one layer in your KML - list them with e.g.

st_layers(ob_kml)

Use read_sf() with the layers argument to choose your point data specifically and read it in. read_sf() defaults to stringsAsFactors = FALSE which may be preferable.

asf_c <- read_sf(ob_kml, layer = 'ASF in China.xlsx')

To get a plain dataframe, just drop the geometry as follows:

asf_c_df <- st_set_geometry(asf_c, NULL)

EDIT: I see your secondary issue now; it looks like neither sf nor sp look at the tags that hold the attribute data you want (open the KML in Notepad++ if you want to see what I mean). QGIS does detect and import them as separate attribute columns, so @Jella's advice is sound. I'm not sure if the issue here lies with sf/sp or GDAL, but it may be worth raising an issue of the sf github page.

In the meantime, your instinct to go with tidyr functions is sound, its just a little tricky to get a clean separation. The following looks pretty good:

asf_c_df <- st_set_geometry(asf_c, NULL) %>%
  # remove duplicate 
 tags
  dplyr::mutate(Description = gsub('

', '
', Description)) %>%
  # split on 
  tidyr::separate(., col = Description, 

                  into = c('Date', 'Province', 'City', 'County', 'Location', 
                           'Total_herd_size', 'Affected_animals', 'Deaths',
                           'Culled', 'Latitude', 'Longitude', 'Source'), 
                  sep = '
') %>%
  # ditch the key: part of key: value
  dplyr::mutate_all(., funs(gsub('^.*: ', '', .))) %>%
  # data type fixes 
  dplyr::mutate_at(vars(7:10), as.integer) %>%
  dplyr::mutate_at(vars(11,12), as.numeric) %>%
  # bonus points: proper dates. First, fix September, then cast to Date datatype

  dplyr::mutate(Date = gsub('Sept', 'Sep', Date),
                Date = as.POSIXct(Date, format = '%b %d, %Y')) %>%
  # double bonus! proper NA for missing data
  dplyr::mutate_if(is.character, funs(ifelse(. == '', NA, .)))

Blog

Thursday, 22 January 2015

Extracting non-spatial data from imported KML file using R?

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?