Saturday, 11 January 2020

google earth engine - Applying a reducer over a very large feature


The problem: How can we work around memory limits to apply a reducer over a large feature? EE's debugging page suggests either setting tileScale to reduce the size of the tiles used for server-side parallelization (if using arrays), or setting bestEffort:true with a regular feature. Neither option works in the following example.


A reproducible example: I want to use a reducer with a very large feature. For example, I want to sum the values of the white pixels that fall inside the Canada polygon in the image below. Note that this polygon appears to be broken into 6 EE 'tiles'. We will also try the task with a less complex polygon, the East Siberian taiga ecoregion.


[Image: map of the Canada polygon with the masked loss pixels shown in white]


Let's work with existing EE assets:


var countries = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")
var hansen = ee.Image("UMD/hansen/global_forest_change_2017_v1_5")

var CAN=countries.filter(ee.Filter.eq('country_na','Canada'))

The pixels are extracted from a multiband image ("hansen") at 30 m resolution. Let's select one band of interest and then subset it to make the raster data as small as possible. (A solution that depends on this subset is not preferred: it is specific to this dataset and does not offer a general solution to the problem.) Finally, let's convert pixel values to area, as per this tutorial.


var lossImage2016=hansen.select(['lossyear']).eq(16)
var areaImage = lossImage2016.multiply(ee.Image.pixelArea())
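
As a rough illustration of what this conversion does, here is a toy version in plain JavaScript with no Earth Engine calls. The arrays and their values are made up for the example; the real per-pixel areas come from ee.Image.pixelArea() and are not a constant 900.

```javascript
// Toy illustration in plain JavaScript (no Earth Engine calls).
// lossMask stands in for lossImage2016: 1 where loss occurred in 2016, else 0.
var lossMask = [0, 1, 1, 0, 1];
// Assumed per-pixel areas in square meters (hypothetical constant values).
var pixelAreas = [900, 900, 900, 900, 900];

// Analogue of multiply(): a pixel keeps its area only where the mask is 1.
var areaValues = lossMask.map(function (flag, i) {
  return flag * pixelAreas[i];
});

// Analogue of a sum reducer: total loss area in square meters.
var totalLossArea = areaValues.reduce(function (a, b) { return a + b; }, 0);
```

Pixels with a value of 0 contribute nothing to the total, so masking them later should not change the sum.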

Running a reducer on 'areaImage' throws a memory limit error, so let's see if we can reduce the size of the raster data even further by masking 0 value pixels (this also produces the above map):


var areaImageMasked=areaImage.updateMask(areaImage)
Map.addLayer(CAN,{},'Canada') //make the map
Map.addLayer(areaImageMasked,{},'areaImageMasked')


Let's try to sum this (sparse) raster layer. Calling "print(CAN)" shows CAN is a feature collection. You can call reduceRegion() with "geometry:CAN.geometry()" to force an aggregation of all the features, but the resulting polygon is too big, so this throws a 'user memory limit exceeded' error. Instead, you can call reduceRegions() with "collection:CAN":


var stats = areaImageMasked.reduceRegions({
  reducer: ee.Reducer.sum(), //the EE documentation uses both 'sum' and ee.Reducer.sum()
  collection: CAN,
  scale: 30, //the scale (pixel size) should always be specified, as per guides/reducers_reduce_region
  //tileScale: 16 //valid tileScale values are 1 to 16
});
//reduceRegions() returns a FeatureCollection; each feature gains a 'sum' property
print('area of loss (square meters): ', stats.first().get('sum'))


This throws a 'computation timed out' error. The same result is obtained when setting "tileScale:16" (the max value) to reduce the size of each tile used for server-side parallelization.


Let's try the same task with the East Siberian taiga polygon. This is a multi-polygon with 9147 vertices:


var ecoregions = ee.FeatureCollection("RESOLVE/ECOREGIONS/2017")
var singleRegion=ecoregions.filter(ee.Filter.eq('ECO_NAME','East Siberian taiga'))
var stats = areaImageMasked.reduceRegion({
reducer: ee.Reducer.sum(),
geometry: singleRegion.geometry(), //the geometry parameter is documented as a Geometry, not a feature collection
scale:30,
maxPixels:1e12 //There appear to be 9176816981 to count!!
});


This also throws a 'computation timed out' error. Even when clipping the input Image to the extent of the singleRegion polygon, there appear to remain ~9 billion pixels to count! This seems implausible.
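
A quick back-of-envelope check of that figure, in plain JavaScript with no Earth Engine calls. It assumes each pixel is a nominal 30 m x 30 m square, which is only an approximation for this dataset.

```javascript
// Back-of-envelope check (plain JavaScript, no Earth Engine calls).
// pixelCount is the figure quoted above; scaleMeters is the reducer scale.
var pixelCount = 9176816981;
var scaleMeters = 30;

// Nominal ground area of one pixel in square meters (assumed square pixels).
var pixelAreaM2 = scaleMeters * scaleMeters;

// Footprint implied by the pixel count, in square kilometers.
var impliedAreaKm2 = pixelCount * pixelAreaM2 / 1e6;
```

This works out to roughly 8.26 million square kilometers, so whether the count is plausible depends on whether it covers the polygon itself or a larger extent such as its bounding box.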


I have identified the following sub-questions which may be helpful in finding an answer:



  • Can this problem be overcome by exporting? (I think "no" - this is a memory issue, not a computation time issue).

  • When tileScale is maxed out (at 16x, so here there are 96 tiles), how can we get EE to take it easy and run the task in even smaller digestible pieces?

  • Does changing the scale change the output of a reducer? If so, how do we quantify this?

  • Is it better to set scale (here 30 meters, the pixel size of the 'hansen' data) or bestEffort? This source says: "GEE will run your computations at the resolution of your current map view in the code editor unless you tell it otherwise. Whenever possible, explicitly set the scale arguments to force GEE to work in a scale that makes sense for your imagery/analysis".
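
On the second sub-question, one possible way to get "smaller digestible pieces" is to split the region's bounding box into a grid of cells and reduce each cell separately. The sketch below is plain JavaScript only; splitBounds is a hypothetical helper, not part of the EE API, and the bounds are illustrative rather than the real extent of either polygon. It also assumes the box does not cross the antimeridian.

```javascript
// Hypothetical helper: split a bounding box [west, south, east, north]
// into an nx-by-ny grid of smaller boxes in the same format.
function splitBounds(bounds, nx, ny) {
  var west = bounds[0], south = bounds[1], east = bounds[2], north = bounds[3];
  var dx = (east - west) / nx; // cell width in degrees
  var dy = (north - south) / ny; // cell height in degrees
  var cells = [];
  for (var i = 0; i < nx; i++) {
    for (var j = 0; j < ny; j++) {
      cells.push([
        west + i * dx,
        south + j * dy,
        west + (i + 1) * dx,
        south + (j + 1) * dy
      ]);
    }
  }
  return cells;
}

// Illustrative bounds only (not the real extent of any polygon above).
var cells = splitBounds([-140, 40, -50, 85], 9, 9);
```

Each cell could then be wrapped with ee.Geometry.Rectangle, intersected with the polygon, and reduced on its own, with the partial sums added up afterwards. Whether this avoids the timeout here is untested.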



