A Structural Probe for Finding Syntax in Word Representations, John Hewitt and Christopher D. Manning, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT). DOI: 10.18653/v1/N19-1419 - A foundational paper introducing structural probing, a technique to analyze what syntactic information is encoded in contextual word representations like BERT's hidden states.
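The core idea of the structural probe is to learn a linear map under which squared L2 distances between transformed word vectors approximate distances in the parse tree. Below is a minimal PyTorch sketch of that idea; the hidden size, probe rank, and training details are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class StructuralProbe(nn.Module):
    """Distance-probe sketch: a learned linear map B under which squared
    L2 distances between word vectors are trained to match parse-tree
    distances. hidden_dim and probe_rank are illustrative assumptions."""

    def __init__(self, hidden_dim=768, probe_rank=128):
        super().__init__()
        self.B = nn.Parameter(torch.randn(hidden_dim, probe_rank) * 0.01)

    def forward(self, h):
        # h: (seq_len, hidden_dim) contextual vectors for one sentence
        t = h @ self.B                          # project into probe space
        diff = t.unsqueeze(1) - t.unsqueeze(0)  # all pairwise differences
        return (diff ** 2).sum(dim=-1)          # (seq_len, seq_len) distances

probe = StructuralProbe()
h = torch.randn(10, 768)  # stand-in for one sentence's BERT hidden states
pred = probe(h)           # predicted squared distances between word pairs
# Training would minimize |pred[i, j] - tree_distance(i, j)| over a
# parsed corpus; gold tree distances are not shown here.
```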
BERT Rediscovers the Classical NLP Pipeline, Ian Tenney, Dipanjan Das, and Ellie Pavlick, 2019. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL). DOI: 10.18653/v1/P19-1443 - This work systematically investigates the types of linguistic knowledge, from morphological to semantic, that emerge at different layers of BERT's representations through extensive probing experiments.
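One analysis tool in this line of work is a learned scalar mix over layers: softmax-normalized weights blend all hidden layers, and after training a probe for a task, the weight distribution indicates which layers that task draws on. A hedged sketch of such a scalar mix (layer count and tensor shapes are assumptions):

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Sketch of a learned scalar mix over layers: softmax-normalized
    weights blend all hidden layers, and the trained weights indicate
    which layers a probing task relies on."""

    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(()))

    def forward(self, layers):
        # layers: list of (batch, seq_len, hidden) tensors, one per layer
        w = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(wi * layer for wi, layer in zip(w, layers))

mix = ScalarMix(num_layers=13)  # 12 transformer layers + embeddings (assumed)
layers = [torch.randn(2, 8, 768) for _ in range(13)]
mixed = mix(layers)             # (2, 8, 768) task-specific blend
# After training a probe on `mixed`, softmax(mix.weights) shows where the
# task's information concentrates across layers.
```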
Linguistic Knowledge and Transferability of Contextual Representations, Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith, 2019. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). DOI: 10.18653/v1/N19-1112 - A comprehensive analysis of the linguistic information encoded at different layers of various contextual embeddings, including BERT, and of its transferability to downstream tasks, using probing techniques.
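A common way to compare layers in this spirit is to fit one simple linear classifier per layer on frozen features and compare scores. Here is a sketch with toy stand-in data; the feature shapes, layer count, and choice of logistic regression are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def layer_wise_probe(features_per_layer, labels):
    """Fit one linear probe per layer on frozen features and return
    cross-validated accuracy per layer, so layers can be compared for
    how much task-relevant information they expose."""
    scores = []
    for layer_feats in features_per_layer:
        clf = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(clf, layer_feats, labels, cv=3).mean())
    return scores

# Toy stand-in data: 13 "layers" of random features and random labels.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(200, 32)) for _ in range(13)]
labels = rng.integers(0, 5, size=200)
print(layer_wise_probe(feats, labels))  # chance-level accuracy on noise
```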
Transformers Documentation - AutoModel, Hugging Face team, 2024 (Hugging Face) - Official documentation for the Hugging Face Transformers library, detailing how to load pre-trained models and configure outputs (like hidden states), which is fundamental for implementing probing experiments.
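A minimal example of the pattern these docs describe: loading a pre-trained model via AutoModel with output_hidden_states=True and reading back one hidden-state tensor per layer. The checkpoint name bert-base-uncased is just an illustrative choice.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "bert-base-uncased" is an illustrative checkpoint; any encoder works.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("The probe inspects every layer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple with one tensor per layer plus the
# embedding output, each of shape (batch, seq_len, hidden_size).
for i, h in enumerate(outputs.hidden_states):
    print(i, tuple(h.shape))
```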