Full XXXX. The main training hyper-parameters are shown in Tab. 3 and largely follow [55]. Most hyper-parameters are shared across training strategies; the exception is the number of training epochs (#Epoch) on MNLI. With the standard CE loss and with example reweighting, the model is trained for 3 epochs. With XxX and with confidence regularization, it is trained for 5 epochs.
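The per-strategy epoch setting described above could be written down as a small lookup, sketched here in Python. Only the epoch counts (3 and 5) come from the text; the dictionary name, the strategy keys, and the helper `num_epochs` are hypothetical, and the key "XxX" simply mirrors the name that is elided in the source.

```python
# Hypothetical config sketch: only the epoch counts are taken from the text.
EPOCHS_ON_MNLI = {
    "ce": 3,                         # standard CE loss
    "example_reweighting": 3,
    "XxX": 5,                        # strategy name is elided in the source
    "confidence_regularization": 5,
}


def num_epochs(strategy: str) -> int:
    """Return the number of MNLI training epochs for a training strategy."""
    try:
        return EPOCHS_ON_MNLI[strategy]
    except KeyError:
        # Fail loudly instead of silently falling back to a default.
        raise ValueError(f"unknown training strategy: {strategy!r}") from None
```

All other hyper-parameters would be shared across strategies, so in this sketch only the epoch count is keyed by strategy.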