The R code for extracting LiDAR predictor variables and comparing model types

14 files in this archive

  • Greenbelt21-main/
  • Greenbelt21-main/Biomass_function.R
  • Greenbelt21-main/Creating_Canopy_Height_Models.Rmd
  • Greenbelt21-main/Creating_Canopy_Height_Models.html
  • Greenbelt21-main/Creating_Plot_Polygons.Rmd
  • Greenbelt21-main/Creating_Plot_Polygons.html
  • Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.html
  • Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.txt
  • Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.Rmd
  • Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.html
  • Greenbelt21-main/Model_Comparison_Final.Rmd
  • Greenbelt21-main/Model_Comparison_Final.html
  • Greenbelt21-main/README.md
  • Greenbelt21-main/SoilModelTutorial.py
Variables
EPA_R5_Load_Reduction_Calculator
  • Label:
  • Definition: Text file and HTML file containing the script for a custom EPA Region 5 Load Reduction Calculator website.
  • Type: Nominal
  • Missing values: None specified
Creating_Plot_Polygons
  • Label:
  • Definition: R script that creates plot polygons from the southwest corner of a plot.
  • Type: Nominal
  • Missing values: None specified
Extracting_Lidar_Predictor_Variables
  • Label:
  • Definition: R script that extracts the pixel distribution of a plot from a canopy height model (CHM) file, then calculates various height percentiles and the crown geometric volume of the plot.
  • Type: Nominal
  • Missing values: None specified
Model_Comparison_Final
  • Label:
  • Definition: Using the training data created by Extracting_Lidar_Predictor_Variables, multiple linear regression, log-linear, random forest, and support vector regression models were compared via 10-fold cross-validation to determine the most effective predictive model.
  • Type: Nominal
  • Missing values: None specified
Biomass_Function
  • Label:
  • Definition: A function that converts a canopy height model (CHM) into a biomass estimation raster with a 10 meter by 10 meter cell size using a log-linear model.
  • Type: Nominal
  • Missing values: None specified
SoilModelTutorial
  • Label:
  • Definition: Python script for estimating belowground carbon storage (soil organic carbon) for existing Greenbelt properties.
  • Type: Nominal
  • Missing values: None specified
Methods: 

Choosing appropriate predictor variables from LiDAR-derived forest measurements that are correlated with aboveground forest biomass is necessary to create a parsimonious model for biomass estimation (Gleason & Im, 2012). In this study, several LiDAR-derived predictor variables were explored that were assumed to be correlated with biomass. Predictor variables were obtained from Gleason and Im and expanded upon, resulting in a list of predictor variables that included the maximum and minimum pixel height values within each plot, as well as the LiDAR-derived height percentiles (10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th) for each plot.
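As an illustration of the predictor set described above, the following is a minimal sketch in Python using only the standard library. It is not the archive's R code. It assumes the CHM pixel heights falling inside one plot have already been extracted into a flat list; the function names and dictionary keys are hypothetical, and the linear-interpolation percentile rule is an assumption.

```python
import math


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of a pre-sorted list."""
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def plot_predictors(chm_heights):
    """Return max, min, and the 10th..90th height percentiles for one plot.

    chm_heights: iterable of CHM pixel heights inside the plot (assumed input).
    """
    values = sorted(chm_heights)
    if not values:
        raise ValueError("plot contains no CHM pixels")
    predictors = {"max_height": values[-1], "min_height": values[0]}
    for p in range(10, 100, 10):
        predictors[f"p{p}"] = percentile(values, p)
    return predictors
```

Calling `plot_predictors` once per plot yields one row of the training table, with eleven predictor columns.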
We considered four models: Ordinary Least Squares Regression (OLS), Power Law (PL), Random Forest (RF), and Support Vector Regression (SVR). For the parametric models, the optimal model was the combination of predictor variables that yielded the lowest Akaike information criterion (AIC) value. For the PL model, only height percentiles above the 40th percentile were considered, to avoid log-transforming height percentiles that were equal to zero. Because AIC values are not applicable to non-parametric machine learning models, predictor variables for RF and SVR were selected using a RF classifier, which has been found in other studies to provide valuable insight into the discriminative ability of individual predictor variables (Archer & Kimes, 2008). Ranked by importance in descending order, the predictor variables were Maximum Height, Minimum Height, 90th percentile, 10th percentile, 20th percentile, 50th percentile, 30th percentile, 80th percentile, 40th percentile, 60th percentile, and lastly the 70th percentile. Predictor variables were then added to the model in order of ranked importance until adding a further predictor variable resulted in poorer model performance (lower cross-validated R2 or higher Root Mean Squared Error (RMSE)).
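The ranked-addition procedure for RF and SVR can be sketched as a simple loop. This is a hedged Python illustration, not the archive's R implementation. `cv_score` is a hypothetical caller-supplied function that returns a cross-validated error (for example RMSE, where lower is better) for a given list of predictors, and stopping on the first worse score is an assumption about how "poorer performance" was applied.

```python
def forward_select(ranked_predictors, cv_score):
    """Add predictors in ranked order until the cross-validated error worsens.

    ranked_predictors: names ordered from most to least important.
    cv_score: hypothetical function mapping a list of names to an error
              value where lower is better (e.g. cross-validated RMSE).
    """
    selected = []
    best = float("inf")
    for name in ranked_predictors:
        candidate = selected + [name]
        score = cv_score(candidate)
        if score > best:
            # Adding this predictor made the model worse, so stop here.
            break
        selected, best = candidate, score
    return selected, best
```

A score based on R2 would work the same way with the comparison reversed, or by passing the negated R2 as the error.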
To measure the accuracy of each model type, a 10-fold cross-validation scheme was used due to the limited available training data. R2 values and RMSE were calculated for each model type along with the standard error for each model metric. Adjusted R2 and p-values were also calculated on the entire dataset for each model.
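The cross-validation summary described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the archive's R code: it uses a single-predictor least-squares line as a stand-in for the four model types, computes R2 on each held-out fold as 1 - SS_res / SS_tot (one common definition), and reports the standard error as the sample standard deviation across folds divided by the square root of the number of folds. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def summarize(values):
    """Mean and standard error of a per-fold metric."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


def cross_validate(xs, ys, k=10, seed=0):
    """k-fold cross-validation returning (mean, standard error) for RMSE and R2."""
    if len(xs) < 2 * k:
        raise ValueError("need at least two observations per fold")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    rmses, r2s = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in held_out]
        pred = [slope * xs[i] + intercept for i in held_out]
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        mean_obs = statistics.fmean(obs)
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        rmses.append(math.sqrt(ss_res / len(obs)))
        r2s.append(1 - ss_res / ss_tot)
    return {"rmse": summarize(rmses), "r2": summarize(r2s)}
```

With small folds, the per-fold R2 can be unstable, which is one reason to report its standard error alongside the mean.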