The R code for extracting LiDAR predictor variables and comparing model types

Greenbelt21-main.zip

14 files in this archive

Greenbelt21-main/
Greenbelt21-main/Biomass_function.R
Greenbelt21-main/Creating_Canopy_Height_Models.Rmd
Greenbelt21-main/Creating_Canopy_Height_Models.html
Greenbelt21-main/Creating_Plot_Polygons.Rmd
Greenbelt21-main/Creating_Plot_Polygons.html
Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.html
Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.txt
Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.Rmd
Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.html
Greenbelt21-main/Model_Comparison_Final.Rmd
Greenbelt21-main/Model_Comparison_Final.html
Greenbelt21-main/README.md
Greenbelt21-main/SoilModelTutorial.py

Variables

EPA_R5_Load_Reduction_Calculator

Label:
Definition: Text file and HTML file containing the script for a custom EPA Region 5 Load Reduction Calculator website.
Type: Nominal
Missing values: None specified

Creating_Plot_Polygons

Label:
Definition: R script that creates plot polygons from the southwest corner of a plot.
Type: Nominal
Missing values: None specified
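
As a rough illustration of building a plot polygon from its southwest corner, the sketch below assumes a square, north-aligned plot in a projected coordinate system with units of meters. The function names, the 20 m side length, and the coordinates are hypothetical and are not taken from Creating_Plot_Polygons.

```python
def square_plot_polygon(sw_x, sw_y, side):
    """Return a closed ring of (x, y) vertices, counter-clockwise from the SW corner."""
    return [
        (sw_x, sw_y),                 # southwest
        (sw_x + side, sw_y),          # southeast
        (sw_x + side, sw_y + side),   # northeast
        (sw_x, sw_y + side),          # northwest
        (sw_x, sw_y),                 # repeat the first vertex to close the ring
    ]


def ring_area(ring):
    """Area of a closed ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


# Hypothetical southwest corner (easting, northing) and side length in meters.
ring = square_plot_polygon(500000.0, 4600000.0, 20.0)
area = ring_area(ring)
```

The area check is only a sanity test that the ring is closed and has the intended footprint.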

Extracting_Lidar_Predictor_Variables

Label:
Definition: R script that extracts the pixel distribution of a plot from a canopy height model (CHM) file, then calculates various height percentiles and the crown geometric volume of the plot.
Type: Nominal
Missing values: None specified

Model_Comparison_Final

Label:
Definition: Using the training data created by Extracting_Lidar_Predictor_Variables, compares multiple linear regression, log-linear, random forest, and support vector regression models under 10-fold cross-validation to determine the most effective predictive model.
Type: Nominal
Missing values: None specified

Biomass_Function

Label:
Definition: A function that converts a canopy height model (CHM) into a biomass estimation raster with a 10 meter by 10 meter cell size using a log-linear model.
Type: Nominal
Missing values: None specified
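
To show what a log-linear biomass model can look like, the sketch below fits biomass = a * height^b by least squares on log-transformed values and then predicts from a single height value. It assumes one height predictor and applies no back-transformation bias correction. The function names, the synthetic data, and the rule of returning zero for non-positive heights are assumptions made for illustration, not the contents of Biomass_function.R.

```python
import math


def fit_power_law(heights, biomass):
    """Fit biomass = a * height**b via a straight-line fit in log-log space."""
    lx = [math.log(h) for h in heights]
    ly = [math.log(v) for v in biomass]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


def predict_biomass(a, b, height):
    """Back-transform the log-linear model to the original biomass scale."""
    if height <= 0:
        return 0.0  # assumption: the log is undefined here, so report no biomass
    return a * height ** b


# Synthetic, noise-free data generated from a = 3.0 and b = 1.5 (hypothetical values).
heights = [2.0, 4.0, 8.0, 16.0]
biomass = [3.0 * h ** 1.5 for h in heights]
a, b = fit_power_law(heights, biomass)
```

Applying `predict_biomass` to each cell of an aggregated height raster would produce a per-cell estimate. The coefficients here come from made-up data and carry no physical meaning.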

SoilModelTutorial

Label:
Definition: Python script for estimating belowground carbon storage (soil organic carbon) for existing Greenbelt properties.
Type: Nominal
Missing values: None specified

Methods:

Creating a parsimonious model for biomass estimation requires choosing LiDAR-derived forest measurements that are correlated with aboveground forest biomass (Gleason & Im, 2012). In this study, we explored several LiDAR-derived predictor variables that were assumed to be correlated with biomass. Predictor variables were taken from Gleason and Im (2012) and expanded upon, resulting in a list that included the maximum and minimum pixel height values within each plot, as well as the LiDAR-derived height percentiles (10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th) for each plot.
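
The predictor list above can be sketched as a small function that takes the CHM pixel heights falling inside one plot and returns the maximum, the minimum, and the nine height percentiles. This is an illustration only: the function names and sample heights are hypothetical, and the linear-interpolation percentile rule (the default `type = 7` in R's `quantile`) is an assumption about how the percentiles were computed.

```python
import math


def percentile(sorted_vals, p):
    """Percentile by linear interpolation between the two nearest order statistics."""
    if not sorted_vals:
        raise ValueError("no pixel values")
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    frac = rank - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def plot_predictors(pixel_heights):
    """Maximum, minimum, and 10th to 90th height percentiles for one plot."""
    vals = sorted(pixel_heights)
    preds = {"max": vals[-1], "min": vals[0]}
    for p in range(10, 100, 10):
        preds[f"p{p}"] = percentile(vals, p)
    return preds


# Hypothetical CHM pixel heights (meters) within a single plot.
heights = [0.0, 0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
predictors = plot_predictors(heights)
```

In this made-up plot the 10th percentile is zero because of the bare-ground pixels, which is the kind of value that cannot be log-transformed.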
We considered four models: Ordinary Least Squares Regression (OLS), Power Law (PL), Random Forest (RF), and Support Vector Regression (SVR). For the parametric models, the optimal model was the combination of predictor variables that yielded the lowest Akaike information criterion (AIC) value. For the PL model, only height percentiles above the 40th percentile were considered, to avoid log-transforming height percentiles that were equal to zero. Because AIC values are not applicable to non-parametric machine learning models, predictor variables for RF and SVR were selected using an RF classifier, which other studies have found to provide valuable insight into the discriminative ability of individual predictor variables (Archer & Kimes, 2008). Ranked by importance in descending order, the predictor variables were Maximum Height, Minimum Height, 90th percentile, 10th percentile, 20th percentile, 50th percentile, 30th percentile, 80th percentile, 40th percentile, 60th percentile, and lastly the 70th percentile. Predictor variables were then added to the model in order of ranked importance until adding another predictor variable resulted in poorer model performance (lower cross-validated R2 and higher Root Mean Squared Error (RMSE)). All resulting predictor variables are shown in Table 2.4.
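
The ranked-addition step for RF and SVR can be sketched as a greedy loop. The sketch below assumes a single "higher is better" score such as cross-validated R2. The names `forward_select` and `toy_cv_score`, and the score table, are hypothetical stand-ins. A real scorer would refit the model on each candidate subset.

```python
def forward_select(ranked, cv_score):
    """Add predictors in ranked order; stop at the first addition that lowers the score."""
    selected = []
    best = float("-inf")
    for name in ranked:
        score = cv_score(selected + [name])
        if score < best:
            break  # this addition made the model worse, so keep the previous set
        selected.append(name)
        best = score
    return selected, best


# First five predictors in the ranked order reported above.
ranked = ["max", "min", "p90", "p10", "p20"]

# Made-up cross-validated R2 values keyed by subset size (illustration only).
toy_r2 = {1: 0.50, 2: 0.58, 3: 0.61, 4: 0.59, 5: 0.65}


def toy_cv_score(subset):
    """Stand-in scorer; a real one would run cross-validation on `subset`."""
    return toy_r2[len(subset)]


selected, best = forward_select(ranked, toy_cv_score)
```

With these made-up scores the loop stops after three predictors, even though a fifth would have scored higher. A stop-at-first-decline rule is greedy by design and does not look ahead.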
To measure the accuracy of each model type, a 10-fold cross-validation scheme was used because of the limited training data available. R2 and RMSE were calculated for each model type, along with the standard error of each metric. Adjusted R2 and p-values were also calculated on the entire dataset for each model.
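
A minimal sketch of the cross-validation scheme follows, using a one-predictor least-squares line as a stand-in for the four model types. It assumes the standard error of each metric is the standard deviation of the per-fold values divided by the square root of the number of folds, which the text above does not specify. All names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def k_fold_cv(x, y, k=10, seed=0):
    """Return (mean, standard error) of per-fold R2 and RMSE."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal held-out sets
    r2s, rmses = [], []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in fold]
        pred = [b0 + b1 * x[i] for i in fold]
        sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
        mean_obs = statistics.fmean(obs)
        sst = sum((o - mean_obs) ** 2 for o in obs)
        r2s.append(1.0 - sse / sst)
        rmses.append(math.sqrt(sse / len(obs)))

    def summary(values):
        return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))

    return {"r2": summary(r2s), "rmse": summary(rmses)}


# Synthetic data: 40 plots, a linear signal plus Gaussian noise (illustration only).
rng = random.Random(1)
x = [float(i) for i in range(40)]
y = [5.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
result = k_fold_cv(x, y)
```

Each fold needs at least two held-out plots with differing observed values. Otherwise the per-fold R2 is undefined, which is one practical constraint of using 10 folds on a small training set.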

Field	Value
mimetype	application/zip
filesize	1.02 MB
resource type	file upload
timestamp	Jun 29, 2021
