Wednesday 30 March 2016

r - Filtering (lasfilter on) huge LiDAR dataset


My dataset is about 8.3 GB once loaded into the R environment with the lidR package. Points whose return number is greater than their number of returns need to be removed.


With a small dataset, I could just use:


l <- lasfilter(las, ReturnNumber <= NumberOfReturns)

but that is not possible on my PC with the full dataset.



What is the right way to filter (to use 'lasfilter' on) huge LiDAR data? I am not sure which one to choose from the list of filter expressions for the 'filter' argument of the 'readLAS' function. I modified a sample dataset from the lidR package so that it has points whose return number is greater than their number of returns:


library(lidR)
LASfile <- system.file("extdata", "Megaplot.laz", package="lidR")
las = readLAS(LASfile)

# make rn > nr
las@data$ReturnNumber[4] <- 3L
las@data$ReturnNumber[8] <- 3L

writeLAS(las, "/TEMP/rnGTnr.laz")


# using "-keep_last", guessing from its source code (and it's wrong)
las2 <- readLAS("/TEMP/rnGTnr.laz", filter = "-keep_last")

# > Warning message:
# > Invalid data: 2 points with a 'return number' greater than the 'number of returns'.

So, the problem persists.


Can it be done with a LASCatalog object (by dividing the data into chunks)? I don't know how to filter using a LASCatalog object. There is a filter option for LASCatalog, but again I don't know which one is appropriate for my case.



Answer




There is no streaming equivalent of ReturnNumber <= NumberOfReturns. I can see some options:



  1. I'm pretty sure that the warning comes from points that have NumberOfReturns = 0. Thus I would try filter = "-drop_number_of_returns 0".

  2. Go to the GitHub repo of the rlas package and open an issue with a feature request. It would not be hard to add such a filter.

  3. Apply lasfilter at the R level on small chunks with catalog_apply, which gives access to the core engine of the package. The side effect is that your file will be split into chunks, but that is not actually a problem.
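For option 1, the filter string is passed straight to readLAS so that the points are discarded at read time, before they ever occupy R memory. A minimal sketch, reusing the file path from the question (whether "-drop_number_of_returns 0" actually removes the offending points depends on whether the guess above about NumberOfReturns = 0 is correct for your file):

```r
library(lidR)

# Streamed filter: points with NumberOfReturns = 0 are dropped by the
# underlying C++ reader, so the full 8.3 GB never has to fit in RAM.
las2 <- readLAS("/TEMP/rnGTnr.laz", filter = "-drop_number_of_returns 0")

# If I remember correctly, the full list of supported filter strings
# can be printed with:
# readLAS(filter = "-help")
```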




filter_invalid_returns = function(chunk)
{
  las = readLAS(chunk)

  # A chunk on the edge of the catalog may contain no points
  if (is.empty(las)) return(NULL)

  las <- lasfilter(las, ReturnNumber <= NumberOfReturns)
  return(las)
}

LASfile <- system.file("extdata", "Megaplot.laz", package="lidR")
project <- catalog(LASfile)

opt_chunk_buffer(project) <- 0

opt_chunk_size(project) <- 120 # small because this is a dummy example.
opt_output_files(project) <- paste0(tempfile(), "_{ID}")

output <- catalog_apply(project, filter_invalid_returns)
new_proj = catalog(unlist(output))
plot(new_proj)

But the real question is: why do you have a file that does not respect the LAS specification, and is it really pertinent to remove these points? That is a non-technical question that nobody can answer for you.


Edit: Actually you did not mention whether your 8 GB come from a single file or from several. If they come from several files you may prefer to use:


opt_chunk_buffer(project) <- 0

opt_chunk_size(project) <- 0
opt_output_files(project) <- paste0(tempdir(), "/{ORIGINALFILENAME}")

