Pruning Filters for Efficient ConvNets, Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf, 2017, International Conference on Learning Representations (ICLR). DOI: 10.48550/arXiv.1608.08710 - Introduces structured pruning by removing entire filters from convolutional neural networks, ranking them by the L1 norm of their weights, and establishes a key approach for hardware-efficient model compression; a filter-pruning sketch follows the list.
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, Ivan Titov, 2019, ACL (Association for Computational Linguistics). DOI: 10.48550/arXiv.1905.09418 - Investigates the redundancy and importance of attention heads in Transformer models, showing that a small set of specialized heads does most of the work and the rest can be pruned via learned gates with little quality loss; a head-masking sketch follows the list.
Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks, Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste, 2021, Journal of Machine Learning Research, Vol. 22. DOI: 10.48550/arXiv.2102.00554 - A comprehensive survey of sparsity in deep learning, covering unstructured and structured pruning techniques and their application to modern neural network architectures; a magnitude-pruning sketch follows the list.
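To make the first entry concrete: Li et al. rank each layer's filters by the sum of their absolute weights and remove the lowest-ranked ones. Below is a minimal sketch assuming PyTorch; the function name prune_conv_filters and the keep_ratio parameter are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    """Keep the filters with the largest L1 weight norms (Li et al., ICLR 2017)."""
    with torch.no_grad():
        # weight shape: (out_channels, in_channels, kH, kW); one L1 norm per filter
        l1_norms = conv.weight.abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep_idx = torch.argsort(l1_norms, descending=True)[:n_keep]
        keep_idx = keep_idx.sort().values  # preserve the original filter order

        # Build a physically smaller layer and copy the surviving filters into it
        pruned = nn.Conv2d(
            conv.in_channels, n_keep, conv.kernel_size,
            stride=conv.stride, padding=conv.padding,
            bias=conv.bias is not None,
        )
        pruned.weight.copy_(conv.weight[keep_idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep_idx])
    return pruned

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
print(prune_conv_filters(conv, keep_ratio=0.5).weight.shape)  # torch.Size([32, 3, 3, 3])
```

In a full network, the following layer's input channels (and any BatchNorm parameters) must shrink to match the removed filters, which the paper addresses; this sketch handles a single layer only.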
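For the second entry: Voita et al. prune heads by learning stochastic, L0-regularized gates. As a simplified deterministic stand-in (assuming PyTorch, an attention tensor laid out as (batch, heads, seq, head_dim), and the hypothetical helper mask_attention_heads), the sketch below zeroes out pruned heads before the output projection.

```python
import torch

def mask_attention_heads(attn: torch.Tensor, head_mask: torch.Tensor) -> torch.Tensor:
    """Apply a per-head gate: 1.0 keeps a head, 0.0 prunes it.
    attn:      (batch, num_heads, seq_len, head_dim)
    head_mask: (num_heads,)
    """
    return attn * head_mask.view(1, -1, 1, 1)

# Example: prune heads 3 and 7 of an 8-head attention layer
attn = torch.randn(2, 8, 16, 64)
mask = torch.ones(8)
mask[[3, 7]] = 0.0
per_head = mask_attention_heads(attn, mask).abs().sum(dim=(0, 2, 3))
print(per_head)  # zeros at positions 3 and 7
```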
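For the survey: the simplest technique it covers is unstructured magnitude pruning, which zeroes individual small-magnitude weights. A minimal sketch, assuming PyTorch (the helper magnitude_prune is illustrative):

```python
import torch

def magnitude_prune(w: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero the fraction `sparsity` of entries with the smallest |w|."""
    k = int(w.numel() * sparsity)
    if k == 0:
        return w.clone()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = w.abs().flatten().kthvalue(k).values
    return w * (w.abs() > threshold).to(w.dtype)

w = torch.randn(256, 256)
w_sparse = magnitude_prune(w, sparsity=0.9)
print((w_sparse == 0).float().mean())  # ~0.9
```

Unlike the structured methods of the first two entries, the resulting sparsity pattern is irregular, so it yields speedups only with sparse-aware kernels or hardware, a trade-off the survey discusses.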