Wednesday 26 July 2017

Joining CSV to GeoJSON in Leaflet?


My goal:



  • Store the attributes of a GeoJSON layer in a CSV file

  • Allow a non-GIS user to edit a cloud-hosted CSV file which then updates a web map

  • Have the web map be a static app with no GIS/PostGIS/SQL etc. server involved


My method:



  • Load a CSV (from, say, Dropbox) and a GeoJSON stored on the web server

  • Join the CSV to the GeoJSON using JavaScript

  • Style the polygons using the field added from the CSV

This works, but it is too slow...


The CSV loads in about 1 second and the GeoJSON tiles are created almost instantly, but the join using eachLayer() takes between 7 and 10 seconds on a desktop machine with a fast internet connection. On a phone, join performance is awful: maybe 30-50 seconds.


Is there a faster/better way to join the CSV to the GeoJSON?


I could be going about this the wrong way altogether.


I am using Leaflet 0.7.7. The GeoJSON and CSV each contain 4500 features/rows.


Using performance.now() after Papa.parse, after eachLayer(), and after the tile script gives me this:


csv loaded in 889.39

csv joined in 7406.235000000001
geojson tiles created in 7523.535000000002

And without the join:


csv loaded in 918.0150000000002
geojson tiles created in 1030.2800000000002

Here is the code (wellZoneParcelsData is the omnivore layer and wellZoneParcels is the L.geoJson layer):


wellZoneParcelsnData.on('ready', function () {
    Papa.parse('assets/data/wellston_zone_codes.csv', {
        download: true,
        header: true,
        complete: function (results) {
            var now = performance.now();
            csvloaded = now - start;
            console.log('csv loaded in ' + csvloaded);
            wellZoneParcels.eachLayer(function (layer) {
                var pid = layer.feature.properties.PARCELID;
                // filter rescans the whole CSV for every feature
                var csvData = results.data.filter(function (data) {
                    return data.PARCELID == pid;
                });
                if (csvData.length > 0) { // guard against parcels with no CSV row
                    layer.feature.properties.zonecode = csvData[0].zonecode;
                }
            }); // end eachLayer function
            now = performance.now();
            csvjoined = now - start;
            console.log('csv joined in ' + csvjoined);
        } // end complete function
    }); // end Papa.parse
});

Answer



Calling the Array.prototype.filter method once per feature is really slow, because it rescans the entire CSV for every feature. In cases where you can rely on an array being predictably formatted (as is presumably the case with CSV data returned from Papa Parse), you are better off just using a for loop to iterate over it yourself. Here is a function that takes the GeoJSON feature properties, the Papa Parse data, and a join field name as inputs, then joins the data much more quickly than the array filter method:


function featureJoinByProperty(fProps, dTable, joinKey) {
    var keyVal = fProps[joinKey];
    for (var i = 0; i < dTable.length; i++) {
        if (dTable[i][joinKey] === keyVal) {
            var match = dTable[i];
            for (var key in match) {    // 'var' keeps key out of the global scope
                if (!(key in fProps)) { // don't overwrite existing properties
                    fProps[key] = match[key];
                }
            }
            break; // stop scanning once the matching row is found
        }
    }
}

To perform the join in your example, you would just use:


wellZoneParcels.eachLayer(function (layer) {
    featureJoinByProperty(layer.feature.properties, results.data, "PARCELID");
});
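To cover the styling step from the question's goals, the joined field can then drive the polygon styles. This is a sketch, not part of the original answer: the zone codes, colors, and the `zoneColor`/`styleByZone` names are made-up examples.

```javascript
// Hypothetical zonecode-to-color mapping; substitute your actual zone codes.
function zoneColor(zonecode) {
    var colors = { R1: '#2b83ba', C1: '#fdae61', I1: '#d7191c' };
    return colors[zonecode] || '#cccccc'; // grey fallback for unknown codes
}

// Styles a single layer from its joined zonecode property
function styleByZone(layer) {
    layer.setStyle({
        fillColor: zoneColor(layer.feature.properties.zonecode),
        fillOpacity: 0.6,
        weight: 1
    });
}
```

After the join completes, apply it with wellZoneParcels.eachLayer(styleByZone);.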

Here is a fiddle of this working with some sample data hosted on Dropbox:



http://fiddle.jshell.net/nathansnider/2qrwecvo/


This parcel data is not very well cleaned (I think there are a lot of duplicate polygons), but it performs the join (with ~5300 features and 8 fields in the CSV) about 5 times faster than the array filter method.
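A further speedup worth trying (not part of the original answer): build a lookup object keyed by the join field once, so each feature's match is a constant-time property access instead of a linear scan of the CSV. The `buildLookup` and `joinFromLookup` names here are my own:

```javascript
// Build a lookup table keyed by the join field, in one pass over the CSV rows.
function buildLookup(dTable, joinKey) {
    var lookup = {};
    for (var i = 0; i < dTable.length; i++) {
        lookup[dTable[i][joinKey]] = dTable[i];
    }
    return lookup;
}

// Join one feature's properties from the prebuilt lookup table.
function joinFromLookup(fProps, lookup, joinKey) {
    var match = lookup[fProps[joinKey]];
    if (!match) return; // no CSV row for this feature
    for (var key in match) {
        if (!(key in fProps)) { // don't overwrite existing properties
            fProps[key] = match[key];
        }
    }
}
```

With this, you build the lookup once from results.data and call joinFromLookup(layer.feature.properties, lookup, 'PARCELID') inside eachLayer, which is O(features + rows) overall instead of O(features × rows).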

