Conclusions and Limitations. In this paper, we investigate whether sparsity and robustness to dataset bias can be achieved simultaneously for PLM subnetworks. Through extensive experiments, we demonstrate that XXXX indeed contains sparse and robust subnetworks (SRNets) across a variety of NLU tasks and training and pruning setups. We further use the OOD information to show that there exist sparse and almost unbiased XXXX subnetworks. Finally, we present analyses and solutions to refine the SRNet searching process in terms of subnetwork performance and searching efficiency. The limitations of this work are twofold. First, we focus on XXXX-like PLMs and NLU tasks, while dataset biases are also common in other scenarios; for example, gender and racial biases exist in dialogue generation systems [7] and PLMs [17]. In future work, we would like to extend our exploration to other types of PLMs and NLP tasks (see Appendix E.2 for a discussion). Second, as discussed in Section 5.1, our analysis of "the timing to start searching SRNets" mainly serves as a proof-of-concept, and actually reducing the training cost requires predicting the exact timing.