*14* files in this archive

- Greenbelt21-main/
- Greenbelt21-main/Biomass_function.R
- Greenbelt21-main/Creating_Canopy_Height_Models.Rmd
- Greenbelt21-main/Creating_Canopy_Height_Models.html
- Greenbelt21-main/Creating_Plot_Polygons.Rmd
- Greenbelt21-main/Creating_Plot_Polygons.html
- Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.html
- Greenbelt21-main/EPA_R5_Load_Reduction_Calculator.txt
- Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.Rmd
- Greenbelt21-main/Extracting_Lidar_Predtictor_Variables.html
- Greenbelt21-main/Model_Comparison_Final.Rmd
- Greenbelt21-main/Model_Comparison_Final.html
- Greenbelt21-main/README.md
- Greenbelt21-main/SoilModelTutorial.py

Building a parsimonious model for biomass estimation requires choosing LiDAR-derived forest measurements that are correlated with aboveground forest biomass (Gleason & Im, 2012). In this study, several LiDAR-derived predictor variables that were assumed to be correlated with biomass were explored. The predictor variables used by Gleason and Im (2012) were taken as a starting point and expanded upon, resulting in a list that included the maximum and minimum pixel height values within each plot, as well as the LiDAR-derived height percentiles (10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th) for each plot.
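As a rough illustration of the predictor list above, the following minimal sketch computes the maximum, minimum, and 10th through 90th height percentiles for one plot. It assumes the canopy height pixels for a plot are already available as a plain list of numbers. The function names, the dictionary keys, and the linear-interpolation percentile rule are assumptions made for this example, not details confirmed by the study, and the heights are made-up values.

```python
import math


def percentile(sorted_values, p):
    """Percentile (p in 0..100) of a pre-sorted list, by linear interpolation."""
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    frac = rank - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def plot_predictors(heights):
    """Return the height-based predictor variables for a single plot."""
    if not heights:
        raise ValueError("plot contains no height values")
    values = sorted(heights)
    predictors = {"max_height": values[-1], "min_height": values[0]}
    for p in range(10, 100, 10):  # 10th, 20th, ..., 90th percentile
        predictors[f"p{p}"] = percentile(values, p)
    return predictors


# Hypothetical pixel heights (metres) for one plot; not data from the study.
example_plot = [0.0, 0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0, 10.5, 12.0, 15.0]
predictors = plot_predictors(example_plot)
```

In this toy plot the 10th percentile is zero because some pixels are bare ground, which is the situation in which a log transform of a low percentile becomes undefined.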

We considered four models: Ordinary Least Squares Regression (OLS), Power Law (PL), Random Forest (RF), and Support Vector Regression (SVR). For the parametric models (OLS and PL), the optimal model was the combination of predictor variables that yielded the lowest Akaike information criterion (AIC) value. For the PL model, only height percentiles above the 40th percentile were considered, to avoid log-transforming height percentiles that were equal to zero. Because AIC is not applicable to non-parametric machine learning models, predictor variables for RF and SVR were selected using an RF classifier, which other studies have found to provide valuable insight into the discriminative ability of individual predictor variables (Archer & Kimes, 2008). Ranked in descending order of importance, the predictor variables were: maximum height, minimum height, 90th percentile, 10th percentile, 20th percentile, 50th percentile, 30th percentile, 80th percentile, 40th percentile, 60th percentile, and lastly the 70th percentile. Predictor variables were then added to the model in order of ranked importance until adding another variable resulted in poorer model performance (a lower cross-validated R² and a higher Root Mean Squared Error (RMSE)).
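The two selection rules described above can be sketched as follows. This is a minimal illustration, not the study's code. `aic_least_squares` uses the common least-squares form of AIC with the additive constant dropped, which does not change the ranking of models fitted to the same data. `forward_select` adds predictors in ranked order and stops at the first one that lowers the score. Both function names are hypothetical, the single "higher is better" score is a simplifying assumption (the text uses both R² and RMSE), and the scores in the example are placeholders rather than results.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant.

    rss: residual sum of squares, n: number of plots,
    k: number of fitted coefficients (a counting convention assumed here).
    """
    return n * math.log(rss / n) + 2 * k


def forward_select(ranked_predictors, score):
    """Add predictors in ranked order until the score gets worse.

    `score` maps a list of predictor names to a number where higher is
    better, for example a cross-validated R2.
    """
    selected = []
    best = float("-inf")
    for name in ranked_predictors:
        candidate = selected + [name]
        current = score(candidate)
        if current < best:  # performance got poorer: stop adding
            break
        selected, best = candidate, current
    return selected, best


# Importance ranking as reported in the text (the key names are assumptions).
RANKED = ["max_height", "min_height", "p90", "p10", "p20", "p50",
          "p30", "p80", "p40", "p60", "p70"]

# Placeholder scores keyed by model size; NOT results from the study.
toy_cv_r2 = {1: 0.40, 2: 0.55, 3: 0.61, 4: 0.58}
selected, best = forward_select(
    RANKED, lambda names: toy_cv_r2.get(len(names), 0.0)
)
```

With these placeholder scores the loop keeps the first three predictors and stops when the fourth lowers the score. In practice `score` would refit the RF or SVR model and return its cross-validated metric.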

To measure the accuracy of each model type, a 10-fold cross-validation scheme was used because of the limited training data available. R² and RMSE were calculated for each model type, along with the standard error of each metric. Adjusted R² and p-values were also calculated for each model on the entire dataset.
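The cross-validation scheme above can be sketched with the standard library alone. This is a minimal example under stated assumptions: the model is a single-predictor least-squares line standing in for the study's models, the standard error is taken as the standard deviation of the per-fold metric divided by the square root of the number of folds, and the data are synthetic. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=10, seed=0):
    """Return (mean, standard error) of per-fold R2 and RMSE."""
    r2s, rmses = [], []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        obs = [ys[i] for i in test_idx]
        pred = [slope * xs[i] + intercept for i in test_idx]
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        mean_obs = statistics.fmean(obs)
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        r2s.append(1 - ss_res / ss_tot)
        rmses.append(math.sqrt(ss_res / len(obs)))

    def summarize(values):
        se = statistics.stdev(values) / math.sqrt(len(values))
        return statistics.fmean(values), se

    return {"r2": summarize(r2s), "rmse": summarize(rmses)}


# Synthetic data: 40 "plots" with a noisy linear height-biomass relation.
rng = random.Random(1)
heights = [float(i) for i in range(40)]
biomass = [2.0 * h + 1.0 + rng.gauss(0, 1) for h in heights]
result = cross_validate(heights, biomass, k=10)
```

`result["r2"]` and `result["rmse"]` each hold a `(mean, standard error)` pair. Because every fold here contains only four plots, the per-fold R² can vary considerably, which is one reason to report the standard error alongside the mean.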