PLM Backbone. We mainly experiment with the XXXX-base-uncased model [6]. It has roughly 110M parameters in total, and 84M parameters in the Transformer layers. As described in Section 3.1, we derive the subnetworks from the Transformer layers and report sparsity levels relative to the 84M parameters. To generalize our conclusions to other PLMs, we also consider two variants of the XXXX family, namely XxXXXXx-base and XXXX-large, the results of which can be found in Appendix C.5.
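Since sparsity is reported relative to the Transformer-layer parameters only (84M of the ~110M total, excluding embeddings), the bookkeeping can be sketched as follows. This is an illustrative sketch with hypothetical parameter names and toy counts, not the paper's actual model or code.

```python
# Hypothetical sketch: compute sparsity over Transformer-layer parameters
# only, excluding embeddings from the denominator. Names and counts are
# illustrative, not taken from the actual PLM.

def transformer_sparsity(param_counts, pruned_counts, prefix="encoder."):
    """Fraction of pruned weights among parameters whose name starts with
    `prefix` (i.e. the Transformer layers); embeddings are excluded."""
    total = sum(n for name, n in param_counts.items() if name.startswith(prefix))
    pruned = sum(n for name, n in pruned_counts.items() if name.startswith(prefix))
    return pruned / total

# Toy counts: the embedding entry does not affect the reported sparsity.
params = {"embeddings.word": 100, "encoder.layer0.attn": 400, "encoder.layer0.ffn": 600}
pruned = {"embeddings.word": 0, "encoder.layer0.attn": 200, "encoder.layer0.ffn": 300}
print(transformer_sparsity(params, pruned))  # 500 / 1000 = 0.5
```

Measuring relative to the 84M Transformer parameters rather than the full 110M keeps sparsity levels comparable across backbones whose embedding sizes differ.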