Artificial intelligence programs have achieved many successes in recent years - Photo: REUTERS
We cannot observe the entire process by which large language models (LLMs) turn input data into output results.
To make their work easier to understand, scientists have borrowed everyday terms like “reasoning” to describe how these programs operate. They also say that the programs can “think,” “reason,” and “understand” the way humans do.
Exaggerating the capabilities of AI
Over the past two years, many AI executives have exaggerated simple technical achievements with hyperbolic language, according to a September 6 report on ZDNET.
In September 2024, OpenAI announced that its o1 reasoning model "uses a chain of thought when solving problems, similar to the way humans think for a long time when faced with difficult questions."
However, many AI scientists disagree, arguing that these systems do not possess human-like intelligence.
A study posted on the arXiv preprint server by a group of researchers at Arizona State University (USA) put AI's reasoning ability to the test with a simple experiment.
The results showed that chain-of-thought inference is a "brittle mirage": not a genuine logical mechanism, but a sophisticated form of pattern matching.
Chain of thought (CoT) lets an AI model not only produce a final answer but also lay out each step of its reasoning, as in OpenAI's o1 or DeepSeek-R1.
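As a rough illustration of the difference between asking for a bare answer and asking for a chain of thought, consider the sketch below. The prompts, the arithmetic example, and the "Let's think step by step" phrasing are illustrative assumptions, not material from the study or from any specific model's documentation.

```python
# A minimal sketch (not from the study) contrasting a direct prompt with a
# chain-of-thought-style prompt. The model call itself is left abstract.

DIRECT_PROMPT = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A:"
)

COT_PROMPT = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step.\n"
    "1. 12 pens is 12 / 3 = 4 groups of 3 pens.\n"
    "2. Each group costs $2, so 4 * 2 = $8.\n"
    "The answer is $8.\n\n"
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step.\n"
)

# A CoT-style prompt asks the model to write out intermediate steps before
# the final answer; the article's critique is that those steps may be
# pattern-matched text rather than genuine reasoning.
print(COT_PROMPT)
```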
Illustration of OpenAI's GPT-2 language model - Photo: ECHOCRAFTAI
Testing what AI actually does
Large-scale analyses show that LLMs tend to rely on semantics and surface cues rather than on a logical reasoning process, the researchers say.
"LLM constructs superficial logical chains based on learned input associations, often failing on tasks that deviate from conventional reasoning methods or familiar patterns," the team explains.
To test the hypothesis that LLMs merely match patterns rather than genuinely infer, the team trained GPT-2, an open-source model released by OpenAI in 2019.
The model was first trained on very simple tasks over the 26 English letters, such as cyclically shifting a word's letters, for example turning "APPLE" into "EAPPL". The team then changed the task and asked GPT-2 to handle the new, unseen variant.
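To make that setup concrete, here is a minimal sketch of how such a letter-shifting task could be generated and how a "changed" variant differs from the training distribution. The function names, word lengths, and shift amounts are illustrative assumptions, not the authors' actual code or data.

```python
import random
import string

def cyclic_shift(word: str, k: int = 1) -> str:
    """Cyclically shift a word's letters to the right by k positions,
    e.g. cyclic_shift("APPLE", 1) -> "EAPPL"."""
    k %= len(word)
    return word[-k:] + word[:-k]

def random_word(length: int = 5) -> str:
    """Random word built from the 26 uppercase English letters."""
    return "".join(random.choices(string.ascii_uppercase, k=length))

# Training distribution: words of one fixed length, shifted by 1.
train_examples = [(w, cyclic_shift(w, 1)) for w in (random_word(5) for _ in range(3))]

# "Changed" task: the same idea, but with a shift amount and word length the
# model never saw during training -- the out-of-distribution probe described
# in the article.
test_examples = [(w, cyclic_shift(w, 2)) for w in (random_word(7) for _ in range(3))]

print("train:", train_examples)
print("test :", test_examples)
print(cyclic_shift("APPLE", 1))  # -> "EAPPL"
```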
The results showed that GPT-2 could not accurately solve tasks that were absent from its training data, even when it produced chain-of-thought steps.
Instead, the model tried to apply the most similar task it had learned, so its "inferences" sounded plausible while the results were often wrong.
The group concluded that users should not over-rely on or blindly trust an LLM's answers, as the models can produce "nonsense that sounds very convincing."
They also stressed the need to understand the true nature of AI, avoid hype, and stop claiming that AI can reason like humans.
Source: https://tuoitre.vn/nghien-cuu-moi-ai-khong-suy-luan-nhu-con-nguoi-20250907152120294.htm