I have a KML file which was created using Google's My Maps.
The original file can be downloaded here: Google My Maps
Using R, I can import this using the "readOGR" function of the rgdal library This brings the KML file in as a SpatialPointsDataFrame (SPDF) - which i am calling asf52
In this SPDF, the spatial data is contained under @coords and is readily extracted into a dataframe using code like
df <- data.frame(asf52@coords[,1:2])
However, I am struggling to come up with a way to neatly extract the the non-spatial data - contained under @data$Description - and turn it into a dataframe with a column for each variable.
You don't need to call data.frame()
around the extract - the @data
slot already is a data.frame. Just do
df <- asf52@data
to pull out a copy. That said, you may be better served by using the newer sf
library for this task:
library(sf)
ob_kml <- file.path(getwd(), 'Outbreaks 56 (OIE).kml')
There is more than one layer in your KML - list them with e.g.
st_layers(ob_kml)
Use read_sf()
with the layers
argument to choose your point data specifically and read it in. read_sf()
defaults to stringsAsFactors = FALSE which may be preferable.
asf_c <- read_sf(ob_kml, layer = 'ASF in China.xlsx')
To get a plain dataframe, just drop the geometry as follows:
asf_c_df <- st_set_geometry(asf_c, NULL)
EDIT: I see your secondary issue now; it looks like neither sf
nor sp
look at the
tags that hold the attribute data you want (open the KML in Notepad++ if you want to see what I mean). QGIS does detect and import them as separate attribute columns, so @Jella's advice is sound. I'm not sure if the issue here lies with sf
/sp
or GDAL, but it may be worth raising an issue of the sf
github page.
In the meantime, your instinct to go with tidyr
functions is sound, its just a little tricky to get a clean separation. The following looks pretty good:
asf_c_df <- st_set_geometry(asf_c, NULL) %>%
# remove duplicate
tags
dplyr::mutate(Description = gsub('
', '
', Description)) %>%
# split on
tidyr::separate(., col = Description,
into = c('Date', 'Province', 'City', 'County', 'Location',
'Total_herd_size', 'Affected_animals', 'Deaths',
'Culled', 'Latitude', 'Longitude', 'Source'),
sep = '
') %>%
# ditch the key: part of key: value
dplyr::mutate_all(., funs(gsub('^.*: ', '', .))) %>%
# data type fixes
dplyr::mutate_at(vars(7:10), as.integer) %>%
dplyr::mutate_at(vars(11,12), as.numeric) %>%
# bonus points: proper dates. First, fix September, then cast to Date datatype
dplyr::mutate(Date = gsub('Sept', 'Sep', Date),
Date = as.POSIXct(Date, format = '%b %d, %Y')) %>%
# double bonus! proper NA for missing data
dplyr::mutate_if(is.character, funs(ifelse(. == '', NA, .)))