Mask Training. As described in Section 3.2.2 of the main paper, we realize mask training via binarization in the forward pass and gradient estimation in the backward pass. Following [42, 32], we adopt a magnitude-based strategy to initialize the real-valued masks. Specifically, we consider two variants. The first one (hard variant) identifies the weights in matrix $W$ with the smallest magnitudes, sets the corresponding elements in $\hat{m}$ to zero, and sets the remaining elements to a fixed value:

$$\hat{m}_{i,j} = \begin{cases} 0 & \text{if } W_{i,j} \in \mathrm{Min}_s(\mathrm{abs}(W)) \\ \alpha \times \phi & \text{otherwise} \end{cases} \tag{1}$$
where $\mathrm{Min}_s(\mathrm{abs}(W))$ extracts the weights with the lowest absolute values, according to the sparsity level
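
The hard variant in Equation (1) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `init_hard_mask`, the use of nested lists for the weight matrix, the rounding of the number of pruned weights, and the tie-breaking rule are all assumptions made for illustration, and `alpha` and `phi` are treated as given scalars because this excerpt does not define them.

```python
def init_hard_mask(W, sparsity, alpha, phi):
    """Magnitude-based initialization of a real-valued mask (hard variant).

    W        : weight matrix as a list of equal-length rows (assumed layout)
    sparsity : fraction of weights to zero out, in [0, 1]
    alpha, phi : scalars from Equation (1); their meaning is not defined
                 in this excerpt, so they are simply passed through
    """
    n_cols = len(W[0])
    magnitudes = [abs(w) for row in W for w in row]

    # Number of weights treated as "smallest magnitude" (assumed: floor).
    k = int(sparsity * len(magnitudes))

    # Flat indices of the k smallest magnitudes. sorted() is stable, so
    # ties are broken by position; this tie rule is an assumption.
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    pruned = set(order[:k])

    # Pruned positions get 0, all remaining positions get alpha * phi.
    return [
        [0.0 if (r * n_cols + c) in pruned else alpha * phi
         for c in range(n_cols)]
        for r in range(len(W))
    ]


W = [[0.5, -0.1],
     [2.0, -3.0]]
mask = init_hard_mask(W, sparsity=0.5, alpha=2.0, phi=0.01)
print(mask)  # [[0.0, 0.0], [0.02, 0.02]]
```

In this toy example the two entries with the smallest absolute values (0.5 and -0.1) receive a mask value of zero, while the other two receive the fixed value `alpha * phi`.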