Full XXXX. The main training hyper-parameters are shown in Tab. 3 and largely follow [55]. Most hyper-parameters are shared across training strategies; the exception is the number of training epochs (#Epoch) on MNLI. With the standard CE loss and example reweighting, the model is trained for 3 epochs; with XxX and confidence regularization, it is trained for 5 epochs.
Appears in 3 contracts
Samples: openreview.net, openreview.net, openreview.net
Full XXXX. The main training hyper-parameters are shown in Tab. 3 and largely follow [5525]. Most hyper-parameters are shared across training strategies; the exception is the number of training epochs (#Epoch) on MNLI. With the standard CE loss and example reweighting, the model is trained for 3 epochs; with XxX and confidence regularization, it is trained for 5 epochs.
Appears in 1 contract
Samples: proceedings.neurips.cc
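The MNLI epoch settings described above could be written down as a small lookup table. This is only a sketch: the dictionary name, the key names, and the helper `mnli_epochs` are hypothetical labels chosen for illustration, and the redacted method name is kept as `XxX` exactly as it appears in the text. The remaining hyper-parameters from Tab. 3 are not reproduced here because the text does not state them.

```python
# Epoch counts on MNLI per training strategy, as stated in the text.
# Key names are hypothetical labels for this sketch, not the authors' identifiers.
MNLI_EPOCHS = {
    "ce": 3,                          # standard cross-entropy (CE) loss
    "example_reweighting": 3,
    "XxX": 5,                         # method name is redacted in the source
    "confidence_regularization": 5,
}


def mnli_epochs(strategy: str) -> int:
    """Return the number of MNLI training epochs for a given strategy."""
    try:
        return MNLI_EPOCHS[strategy]
    except KeyError:
        raise ValueError(f"unknown training strategy: {strategy!r}") from None
```

Calling `mnli_epochs("example_reweighting")` returns the 3-epoch setting, while an unrecognized strategy name raises a `ValueError` rather than silently falling back to a default.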