I have a 5.8 m resolution satellite image (LISS IV from IRS-P6, bands 4, 3 and 2) of a densely forested region in the Eastern Himalayas (altitude 700 m to 3200 m). My objective is to predict canopy cover, that is, to separate canopy from non-canopy regions.
The R package 'randomForest' appears to be a popular and workable method; now I want to understand more thoroughly how it works.
In the question "Performing Random-Forest Classification of 10cm Imagery for species-distribution in R (no point-shapes)?", the example training data has an ndvi column and a class column.
Will including NDVI and terrain datasets (elevation, aspect, slope) improve the predictive ability of the algorithm?
Answer
In the example you are referencing, NDVI is included as a predictor variable along with all of the band values. The response variable is the class (vegetation type). In your case, you could simply have a binary response (cover, or non-cover).
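As a rough illustration of that table layout, here is a minimal base R sketch. The reflectance values and labels are made up, the column names (b2, b3, b4, ndvi, class) are only illustrative, and it assumes the usual LISS-IV assignment of band 3 to red and band 4 to near-infrared.

```r
# Hypothetical reflectance values for five pixels (made up for illustration).
# Assumption: LISS-IV band 2 = green, band 3 = red, band 4 = near-infrared.
green <- c(0.08, 0.09, 0.18, 0.22, 0.07)
red   <- c(0.05, 0.06, 0.20, 0.25, 0.04)
nir   <- c(0.45, 0.50, 0.25, 0.28, 0.40)

# NDVI = (NIR - red) / (NIR + red)
ndvi <- (nir - red) / (nir + red)

# Binary response stored as a factor, so a classifier treats it as two classes
# rather than as a number to regress on.
training <- data.frame(
  b2 = green,
  b3 = red,
  b4 = nir,
  ndvi = ndvi,
  class = factor(
    c("canopy", "canopy", "non_canopy", "non_canopy", "canopy"),
    levels = c("non_canopy", "canopy")
  )
)

print(training)
```

Keeping the response as a factor matters because randomForest fits a regression forest when the response is numeric.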
Random forest is a valuable machine learning algorithm because it accepts almost any type of predictor, including both continuous and discrete datasets. It makes no assumptions about the distribution of the predictors, so skewed or non-normal data do not need to be transformed first.
For your type of land cover classification, it would be advisable to include predictor variables such as slope, aspect, elevation, compound topographic index (CTI), texture, and a variety of vegetation indices. You can also include landform data such as soil type, horizontal distance to water, and vertical distance to water.
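One way to check whether terrain layers help on your own data is to fit one forest with spectral predictors only and another with terrain added, then compare their out-of-bag (OOB) error. The sketch below assumes the randomForest package is installed. All pixel values are simulated and the labelling rule is invented, so the numbers it prints say nothing about real LISS-IV imagery.

```r
library(randomForest)

set.seed(42)
n <- 400

# Synthetic pixels: every value is simulated, not real LISS-IV or DEM data.
pixels <- data.frame(
  b2 = runif(n, 0.03, 0.25),
  b3 = runif(n, 0.03, 0.30),
  b4 = runif(n, 0.20, 0.55),
  elevation = runif(n, 700, 3200),  # metres, the range given in the question
  slope = runif(n, 0, 60),          # degrees
  aspect = runif(n, 0, 360)         # degrees
)
pixels$ndvi <- (pixels$b4 - pixels$b3) / (pixels$b4 + pixels$b3)

# Invented labelling rule, used only so the example has some signal to find.
pixels$class <- factor(
  ifelse(pixels$ndvi > 0.45 & pixels$elevation < 2800, "canopy", "non_canopy")
)

# Forest 1: spectral predictors only. Forest 2: spectral plus terrain.
spectral <- randomForest(class ~ b2 + b3 + b4 + ndvi, data = pixels, ntree = 300)
full <- randomForest(class ~ ., data = pixels, ntree = 300, importance = TRUE)

# Final out-of-bag error rate of each forest (last row of err.rate).
oob_spectral <- spectral$err.rate[spectral$ntree, "OOB"]
oob_full <- full$err.rate[full$ntree, "OOB"]
print(c(spectral = oob_spectral, full = oob_full))

# Per-variable importance for the full model.
print(importance(full))
```

With real data, the training table would hold values extracted from the image and terrain rasters at labelled locations. The same comparison then shows whether elevation, slope and aspect lower the OOB error for your scene.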
There is an interesting competition at Kaggle that highlights how to classify forest cover using only landform variables--I highly recommend reading through the forums, since there are lots of sample scripts and links to literature on the subject. Here is the link: