Attention Is All You Need, Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, 2017 (Advances in Neural Information Processing Systems), DOI: 10.48550/arXiv.1706.03762 - The original paper introducing the Transformer architecture, which includes the detailed design of the Position-wise Feed-Forward Network.
Dive into Deep Learning, Aston Zhang, Zack C. Lipton, Mu Li, Alex J. Smola, 2024 (Cambridge University Press) - A comprehensive open-source textbook offering explanations and code examples for deep learning models, with a dedicated section on the Transformer's Position-wise Feed-Forward Network.