Model Fitting and Validation. The model described in Equation 1 is the linear mixed effects (LME) model used to fit predictors of PM2.5 to ground observations from both the SENAMHI and ▇▇▇▇▇▇▇▇ sites. AOD was allowed to have daily random effects. Relative humidity and planetary boundary layer height were expected to vary, but not significantly, within a month, so their random effects were set at the monthly level. Overall, the regression R2 for the LME model was 0.63, and the cross-validation (CV) R2 and RMSE were 0.58 and 7.08 µg/m3, respectively. Figure 21 shows a density plot of the correlations between predicted and measured PM2.5 values from the cross-validation of the LME model. Table 7 shows the beta coefficient, standard error, degrees of freedom, t-value, and p-value for each parameter. All predictors except wind U-component, temperature, NDVI, and relative humidity were highly significant. Wind U-component and temperature were parameters from the WRF-CHEM simulation and were not highly correlated with the ground observations from Weather Underground, so the lack of significance of these parameters was expected. NDVI is the Normalized Difference Vegetation Index, which characterizes the vegetative canopy of a particular area on a scale from -1 (no vegetative canopy) to 1 (full vegetative canopy). Because the air monitors are centrally placed in urban environments, where NDVI stays constant over time, NDVI was not expected to be a significant predictor of PM2.5 in the LME model.

A random forest model was also used to fit predictors of PM2.5 to ground observations from both the SENAMHI and ▇▇▇▇▇▇▇▇ sites. The random forest model was specified with a nodesize of 6, a maxnode of 2048, an mtry of 6, and an ntree of 1000. The “out of bag” R2 from the random forest model using the entire dataset was 0.73 with an RMSE of 5.61 µg/m3, and the cross-validation (CV) R2 and RMSE were 0.73 and 5.66 µg/m3, respectively.
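The cross-validated R2 and RMSE reported for both models can be illustrated with a minimal sketch. It is not the authors' code: the number of folds (10), the one-predictor least-squares line standing in for the LME or random forest model, the synthetic data, and all function and variable names are assumptions made for illustration. R2 is computed here as 1 - SS_res/SS_tot on pooled out-of-fold predictions; the source does not state which definition or pooling scheme was used.

```python
import math
import random


def r2_and_rmse(observed, predicted):
    """Return (R2, RMSE) for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_cv(xs, ys, k=10, seed=0):
    """Predict each point from a model fitted without its fold, then score."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return r2_and_rmse(ys, preds)


# Synthetic data (assumption): one AOD-like predictor and a noisy PM2.5-like response.
rng = random.Random(42)
aod = [rng.uniform(0.0, 1.0) for _ in range(200)]
pm25 = [10.0 + 30.0 * x + rng.gauss(0.0, 5.0) for x in aod]

cv_r2, cv_rmse = k_fold_cv(aod, pm25, k=10)
print(f"CV R2 = {cv_r2:.2f}, CV RMSE = {cv_rmse:.2f}")
```

Because every prediction comes from a model that never saw that observation, the CV statistics are usually somewhat worse than the in-sample ones, as with the LME model's 0.58 versus 0.63.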
Figure 22 shows a density plot of the correlations between predicted and measured PM2.5 values from the cross-validation of the random forest model. Table 8 shows the name of each predictor along with the importance, or percent increase in MSE. Although random forest is a “black-box” machine learning method, the importance output is a measure of parameter predictive power based on a permutation test [26]. Under the null hypothesis in a random forest model, each predictor variable is not important; the permutation test rearranges the values of that variable to detect any degr...
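
The permutation idea behind the importance measure can be sketched as follows. This is a simplified illustration, not the procedure used in the source: it scores a hypothetical, already-fitted stand-in model on a fixed synthetic dataset rather than using a random forest's out-of-bag samples, and the function names, repeat count, and data are assumptions.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of the model's predictions over all rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, col, n_repeats=20, seed=0):
    """Average percent increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)  # breaks the link between this predictor and the target
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(model, permuted, targets) - baseline)
    return 100.0 * (sum(increases) / n_repeats) / baseline


# Synthetic data (assumption): predictor 0 drives the response, predictor 1 does not.
rng = random.Random(1)
rows = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(300)]
targets = [10.0 + 30.0 * a + rng.gauss(0.0, 2.0) for a, _ in rows]


def model(row):
    # Hypothetical fitted model: uses predictor 0 and ignores predictor 1.
    return 10.0 + 30.0 * row[0]


imp_used = permutation_importance(model, rows, targets, col=0)
imp_ignored = permutation_importance(model, rows, targets, col=1)
print(f"predictor 0: {imp_used:.1f}% increase in MSE")
print(f"predictor 1: {imp_ignored:.1f}% increase in MSE")
```

Shuffling a predictor the model relies on inflates the error, while shuffling one it ignores leaves the error unchanged, which is the contrast the importance ranking is built on.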
