Cross Validation and Predictions Clause Samples
Cross Validation and Predictions. A 10-fold cross-validation (CV) process was carried out on both the LME and random forest models in the same manner to validate the prediction results from both models. The model-fitting dataset, consisting of 8,491 ground observations, was randomly divided into 10 segments (subsets), each containing approximately 10% of the data. Nine of the segments were used as a training dataset to fit the model, and the remaining segment was used as a testing dataset to make predictions. This process was repeated 10 times, each time holding out a different segment as the testing dataset so that no segment was tested more than once. After the 10th repetition, the predictions from all testing datasets were combined into one dataset, whose size equals the original number of ground observations. The correlation between these predictions and the original ground observations was then computed to produce a CV R². After cross-validation, daily datasets containing the variables used in the LME and random forest models, except for the ground measurements, were created to make predictions. Predictions were made with the predict function in the statistical software R, using both models on the same daily datasets. Once predictions were made, the daily files were aggregated to the monthly and yearly levels for mapping in ArcGIS.
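
The 10-fold procedure described above can be illustrated with a minimal sketch. It is written in Python rather than R, uses synthetic data, and substitutes a one-variable least-squares line for the LME and random forest models, so it shows only the fold mechanics and the pooled CV R², not the original analysis. All function names (k_fold_indices, fit_line, pearson_r, cv_r2) and the synthetic data are assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split into k non-overlapping folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the real models)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def cv_r2(xs, ys, k=10, seed=0):
    """Return (CV R^2, pooled out-of-fold predictions) for k-fold CV."""
    n = len(xs)
    preds = [None] * n
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Fit on the other k-1 folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    # Every observation now has exactly one out-of-fold prediction.
    return pearson_r(preds, ys) ** 2, preds


# Synthetic example data (hypothetical, not the study's observations).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [2.0 + 1.5 * x + rng.gauss(0, 1.0) for x in xs]
r2, preds = cv_r2(xs, ys)
```

Because each observation is held out exactly once, the pooled prediction list has the same length as the input, which matches the statement that the combined testing predictions equal the original number of ground observations. With 8,491 observations the folds cannot be exactly equal; in this sketch one fold has 850 members and the other nine have 849.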
