A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models

Abstract

Inductive reasoning is fundamental to both human and artificial intelligence. The inductive reasoning abilities of current Large Language Models (LLMs) are evaluated in this research. We argue that only considering induction of rules is too narrow and unrealistic, since inductive reasoning is usually mixed with other abilities, like rules application, results/rules validation, and updated information integration. We probed the LLMs with a set of designed symbolic tasks and found that even state-of-the-art (SotA) LLMs fail significantly, showing the inability of LLMs to perform these intuitively simple tasks. Furthermore, we found that perfect accuracy in a small-size problem does not guarantee the same accuracy in a larger-size version of the same problem, provoking the question of how we can assess the LLMs’ actual problem-solving capabilities. We also argue that Chain-of-Thought prompts help the LLMs by decomposing the problem-solving process, but the LLMs still learn limitedly. Furthermore, we reveal that few-shot examples assist LLM generalization in out-of-domain (OOD) cases, albeit limited. The LLM starts to fail when the problem deviates from the provided few-shot examples.

Anthology ID: 2024.findings-eacl.22
Volume: Findings of the Association for Computational Linguistics: EACL 2024
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 323–339
URL: https://aclanthology.org/2024.findings-eacl.22
Cite (ACL): Chen Bowen, Rune Sætre, and Yusuke Miyao. 2024. A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models. In Findings of the Association for Computational Linguistics: EACL 2024, pages 323–339, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): A Comprehensive Evaluation of Inductive Reasoning Capabilities and Problem Solving in Large Language Models (Bowen et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-eacl.22.pdf
Software: 2024.findings-eacl.22.software.zip
Video: https://aclanthology.org/2024.findings-eacl.22.mp4
