WisPaper

Do large language models exhibit theory of mind capabilities?

Large language models can mimic theory of mind on many tests but lack genuine understanding, showing brittle reasoning and key failures.

Direct answer

Large language models (LLMs) can perform impressively on many theory of mind tests, sometimes matching or exceeding human accuracy, but they do not possess genuine theory of mind. For example, GPT-4 matched human performance on false beliefs and indirect requests but struggled with detecting faux pas, while LLaMA2's apparent superiority on that test was actually a bias toward attributing ignorance [1]. The models' reasoning is brittle: minimal changes to a scenario caused answer consistency to drop 18–34% [2], and they lack the developmental and embodied experience that underpins human social understanding [6].

7 sources cited

This article was generated with WisPaper-powered search and paper analysis.

How well do LLMs actually perform on theory of mind tests?

On many standard theory of mind tasks, the best LLMs perform at or above human levels. In a comprehensive battery comparing GPT-4, LLaMA2, and 1,907 humans, GPT-4 matched or exceeded humans on identifying indirect requests, false beliefs, and misdirection [1]. GPT-4o also performed comparably to humans on the Strange Stories paradigm, even in the most challenging conditions [5]. These results show that LLMs can produce correct answers on tests that require reasoning about others' mental states.

However, performance is uneven and sometimes deceptive. The same study found that GPT-4 specifically struggled with detecting faux pas, while LLaMA2 appeared to outperform humans on that test — but follow-up analysis showed this was an artifact of a bias toward attributing ignorance, not genuine understanding [1]. GPT-4's poor faux pas performance stemmed from a hyperconservative approach, refusing to commit to conclusions that humans found self-evident [7]. This pattern reveals that high accuracy on some tasks can mask fundamentally different underlying processes.
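The point about accuracy masking a response bias can be illustrated with a toy calculation. The sketch below is not the study's method: the answer keys, the "yes"/"no" labels, and the `always_no` responder are hypothetical, chosen only to show how a constant answer can score perfectly when one answer dominates the key and fall to chance when the key is balanced.

```python
def accuracy(predictions, answer_key):
    """Fraction of items where the prediction equals the keyed answer."""
    if len(predictions) != len(answer_key) or not answer_key:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(p == k for p, k in zip(predictions, answer_key))
    return correct / len(answer_key)


def always_no(_item):
    """Hypothetical biased responder: attributes ignorance on every item."""
    return "no"


# Hypothetical answer keys, not taken from any cited study.
skewed_key = ["no"] * 10          # every item has the same correct answer
balanced_key = ["no", "yes"] * 5  # control set with both answers equally likely

skewed_score = accuracy([always_no(i) for i in range(10)], skewed_key)
balanced_score = accuracy([always_no(i) for i in range(10)], balanced_key)

print(f"skewed key:   {skewed_score:.0%}")
print(f"balanced key: {balanced_score:.0%}")
```

A large gap between the two scores is one simple signal that a high score reflects a response bias rather than item-by-item reasoning.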

Why is LLM theory of mind different from human theory of mind?

The core difference is that LLMs lack the developmental, embodied, and cognitive mechanisms that give rise to genuine theory of mind in humans. A systematic review concluded that LLMs produce an 'illusion of understanding' because they have no real-world experience, no developmental trajectory, and no multimodal sensory input — all of which are crucial for human social cognition [6]. Without embodiment in an action-oriented environment, their mentalistic inference is qualitatively different from human cognition [7].

The brittleness of LLM reasoning is measurable. When researchers applied minimal adversarial transformations to theory of mind scenarios, all tested LLMs showed answer consistency drops of 18–34% [2]. The models' reasoning is not robust: it can be disrupted by small changes that would not fool a human. Furthermore, earlier and smaller models were strongly affected by the number of inferential cues and vulnerable to distracting information, whereas GPT-4o showed high robustness [5]. This variability across models and conditions underscores that LLMs are not reliably deploying a stable, human-like reasoning ability.
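As a rough illustration of what a drop in answer consistency means, the sketch below compares a model's answers on original scenarios with its answers on minimally transformed versions. The exact metric used in [2] is not described here, so the definition (share of scenarios with an unchanged answer, measured against a rerun baseline) and all sample answers are assumptions for illustration only.

```python
def answer_consistency(answers_a, answers_b):
    """Share of scenarios where the two answer lists agree."""
    if len(answers_a) != len(answers_b) or not answers_a:
        raise ValueError("inputs must be non-empty and the same length")
    same = sum(a == b for a, b in zip(answers_a, answers_b))
    return same / len(answers_a)


def consistency_drop(original, rerun, transformed):
    """Baseline consistency (original vs. rerun on unchanged scenarios)
    minus consistency under transformation (original vs. transformed)."""
    baseline = answer_consistency(original, rerun)
    perturbed = answer_consistency(original, transformed)
    return baseline - perturbed


# Hypothetical answers to five false-belief questions.
original = ["kitchen", "box", "no", "yes", "drawer"]
rerun = ["kitchen", "box", "no", "yes", "drawer"]           # same scenarios again
transformed = ["kitchen", "basket", "no", "yes", "drawer"]  # minimally altered scenarios

drop = consistency_drop(original, rerun, transformed)
print(f"consistency drop: {drop:.0%}")
```

Measuring against a rerun baseline separates the effect of the transformation from ordinary run-to-run variation in the model's answers.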

What does this mean for trusting LLMs in social roles?

Users already attribute mental states to LLMs, but these attributions affect trust in nuanced ways. In a study of 410 participants, attributing intelligence (reasoning, planning) to an LLM strongly predicted how much people trusted its advice, while attributing consciousness or emotions actually predicted less trust [4]. This suggests users have sophisticated intuitions: they trust LLMs for cognitive tasks but are wary of attributing subjective experience to them.

For practical applications like social skills training, LLMs show promise but require caution. GPT-4o matched human experts in evaluating theory of mind tasks in a gamified environment for autistic users, with no statistically significant differences in accuracy [3]. However, the same study noted that LLMs' 'black box' nature raises concerns about explainability and transparency, especially when used by vulnerable populations. The evidence overall suggests LLMs can be useful tools for social reasoning tasks, but their outputs should not be mistaken for genuine understanding — and their brittleness means they can fail unpredictably.

Sources used in this answer

1

Testing theory of mind in large language models and humans

GPT-4 matched or exceeded humans on false beliefs, indirect requests, and misdirection but struggled with faux pas; LLaMA2's apparent superiority on faux pas was a bias toward attributing ignorance [1].

2

Functional Theory of Mind Evaluation in Large Language Models: A Behavioral and Causal Stability Framework

LLMs showed 18–34% drops in answer consistency under minimal scenario transformations, and later transformer layers (65–80) encoded perspective-taking with measurable causal effects [2].

3

Large language models for autism: evaluating theory of mind tasks in a gamified environment

GPT-4o matched human experts in evaluating theory of mind tasks in a gamified environment for autistic users, with no statistically significant differences [3].

4

The influence of mental state attributions on trust in large language models

Attributions of intelligence to an LLM strongly predicted trust, while attributions of consciousness predicted less trust, in a study of 410 participants [4].

5

Do Large Language Models Possess a Theory of Mind? A Comparative Evaluation Using the Strange Stories Paradigm

GPT-4o performed comparably to humans on the Strange Stories paradigm even in challenging conditions, while smaller models were vulnerable to distracting information [5].

6

Artificial Intelligence and the Illusion of Understanding: A Systematic Review of Theory of Mind and Large Language Models

LLMs produce an 'illusion of understanding' because they lack developmental, embodied, and multimodal mechanisms essential for genuine theory of mind [6].

7

Testing Theory of Mind in GPT Models and Humans

GPT models showed human-level performance on false beliefs and misdirection but were impaired at faux pas due to hyperconservatism in drawing conclusions [7].