Knowing Knowledge: Epistemological Study of Knowledge in Transformers
1. Introduction
2. Subject and Methods
- gnoseological nature of human knowledge;
- origin and encoding of human knowledge in Artificial Neural Networks;
- specificity of artificial knowledge based on symbols and numbers;
- relationship between Human Neural Networks and Artificial Neural Networks.
3. Obtained Results
3.1. Innateness
3.2. No-Innateness
3.3. The Truth Lies in the Middle
4. Experiments
4.2. Transformer
- encoder-decoder attention layers: the queries come from the previous decoder layer, while the keys and values come from the encoder output. This allows each position in the decoder to attend to all positions of the input sequence.
- self-attention layers in the encoder: the keys, values, and queries all come from the output of the previous encoder layer. Each position in the encoder can attend to every position in the previous encoder layer.
- self-attention layers in the decoder: as in the encoder, all queries, keys, and values come from the previous layer. Each position in the decoder can attend to every position up to and including itself. Scores for future positions are set to -Inf before the softmax, so they receive zero attention weight. This is known as masked self-attention.
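The three uses of attention listed above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the experiments: it assumes a single head, omits the learned query, key, and value projections, and uses random toy states. The names `attention`, `encoder_states`, and `decoder_states` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf map to exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(queries, keys, values, causal=False):
    # Scaled dot-product attention for one head.
    # Assumption: learned Q/K/V projections are omitted for brevity.
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    if causal:
        # Mask future positions (strict upper triangle) with -inf.
        n_q, n_k = scores.shape
        future = np.triu(np.ones((n_q, n_k), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # toy input: 5 positions, width 8
decoder_states = rng.normal(size=(3, 8))  # toy output: 3 positions, width 8

# Encoder self-attention: Q, K, V all come from the previous encoder layer.
_, enc_self = attention(encoder_states, encoder_states, encoder_states)

# Decoder masked self-attention: each position sees itself and earlier ones.
_, dec_self = attention(decoder_states, decoder_states, decoder_states, causal=True)

# Encoder-decoder attention: queries from the decoder, keys/values from the encoder.
_, cross = attention(decoder_states, encoder_states, encoder_states)
```

In this sketch, `dec_self` is lower triangular because the masked scores contribute zero weight after the softmax, while `enc_self` and `cross` assign a nonzero weight to every input position.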
4.3. Experimental Set-Up
4.4. Discussion
5. Conclusions
“The distance that makes objects smaller to the eye enlarges them at the thought.”
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Model | CoLA | STS-B | QQP | WNLI |
BERT | 67.5( | |||
KERMIT | ||||
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Ranaldi, L.; Pucci, G. Knowing Knowledge: Epistemological Study of Knowledge in Transformers. Appl. Sci. 2023, 13, 677.