Does artificial intelligence understand word impressions like humans do?

University of Osaka researchers quantitatively compare how humans and machines conceptualize words

Osaka, Japan – By now, it’s no secret that large language models (LLMs) are experts at mimicking natural language. Trained on vast troves of data, these models have proven themselves capable of generating text so convincing that it regularly appears humanlike to readers. But is there any difference between how we think about a word and how an LLM does?
In an article published this month in Behavior Research Methods, a team of researchers from The University of Osaka studied the “impressions” LLMs form of words along several different dimensions and compared them with how humans conceptualize words.

The experiment was straightforward: the research team took 695 words acquired early in life by children and asked different LLMs to rate each one in terms of attributes such as concreteness, socialness, and arousal. They then quantitatively compared these model-based ratings with human norms from prior studies.
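To picture what such a comparison looks like in practice, here is a rough, purely illustrative sketch in Python (not the authors' code): it takes hypothetical LLM concreteness ratings and hypothetical human norms for a handful of words and computes a rank correlation between them. The word list, values, and 1-to-7 scale below are invented for the example; the actual study covered 695 words and 21 features.

# Illustrative only: compare hypothetical LLM concreteness ratings with
# hypothetical human norms for the same words via Spearman rank correlation.
from scipy.stats import spearmanr

human_norms = {"dog": 6.8, "ball": 6.5, "happy": 2.6, "think": 1.9, "under": 2.4}
llm_ratings = {"dog": 6.9, "ball": 6.7, "happy": 3.1, "think": 2.2, "under": 1.3}

words = sorted(human_norms)  # align the two rating sets by word
rho, p = spearmanr([human_norms[w] for w in words],
                   [llm_ratings[w] for w in words])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f}) across {len(words)} words")

A high correlation in this kind of comparison is what "strong agreement" between human and LLM ratings means in the findings described below.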

“We observe relatively strong correlations between the two groups for some attributes, such as concreteness, imageability, and body-object interaction,” says Hiromichi Hagihara, lead author of the study. “These findings suggest that LLMs, despite not interacting directly with the world, may hold certain forms of embedded knowledge because humans implicitly encode information about the world into language.”

However, although some attributes showed strong correlations in the ratings, humans and LLMs did not always agree. “As an example, evaluations of iconicity, which is the degree to which a word’s form and meaning are similar, diverge substantially between humans and LLMs,” explains Kazuki Miyazawa, a co-author of the study. “For many attributes, the discrepancy is greatest for words like prepositions and conjunctions. Even for attributes like concreteness, where there is high overall agreement, human ratings vary widely across these words, whereas LLMs tend to assign consistently low ratings. This suggests that LLMs do not perceive these kinds of words the way we do.”

Taking these results one step further, the team examined how well human- and LLM-based attribute ratings predicted the age of acquisition of these words. They found that while some attributes exhibited similar predictive patterns across humans and LLMs, systematic biases also emerged, depending on the model.

“In some cases, LLM-based ratings tend to overestimate how strongly certain word features are related to how early words are learned, compared with human ratings,” says Hagihara. “For example, concreteness is negatively correlated with age of acquisition in human data, which means children tend to learn more concrete words earlier. While LLM-based ratings capture this general trend, some models exaggerate the strength of these relationships for particular features.”
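To make this concrete, the sketch below (again illustrative Python with invented numbers, not the study's data or method) regresses age of acquisition on concreteness ratings twice, once using human norms and once using LLM ratings, and compares the fitted slopes. A markedly steeper negative LLM-based slope would correspond to the kind of exaggerated relationship Hagihara describes.

# Illustrative only: compare how strongly concreteness predicts age of
# acquisition (AoA) when ratings come from humans versus from an LLM.
import numpy as np

aoa         = np.array([18, 20, 30, 36, 40], dtype=float)  # AoA in months (invented)
human_concr = np.array([6.8, 6.5, 4.0, 2.6, 1.9])          # human norms (invented)
llm_concr   = np.array([6.9, 6.7, 3.5, 2.0, 1.2])          # LLM ratings (invented)

def slope(x, y):
    # Least-squares slope of y regressed on x.
    return np.polyfit(x, y, deg=1)[0]

print(f"human-based slope: {slope(human_concr, aoa):+.2f} months per rating point")
print(f"LLM-based slope:   {slope(llm_concr, aoa):+.2f} months per rating point")

Both slopes come out negative because, in this toy data, more concrete words are learned earlier; comparing their magnitudes across models and features is the kind of check the team performed.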

The team believes that the results of this study can be used to build LLMs that more closely approximate human cognition, or that can serve as complementary tools in psychological studies. One day, such LLMs could offer new perspectives on how humans learn and process language.
###
The article, “How well do large language models mirror human cognition of word concepts? A comparison of psychological ratings for early-acquired English words,” was published in Behavior Research Methods at https://doi.org/10.3758/s13428-025-02938-2

About The University of Osaka
The University of Osaka was founded in 1931 as one of the seven imperial universities of Japan and is now one of Japan's leading comprehensive universities with a broad disciplinary spectrum. This strength is coupled with a singular drive for innovation that extends throughout the scientific process, from fundamental research to the creation of applied technology with positive economic impacts. Its commitment to innovation has been recognized in Japan and around the world. Now, The University of Osaka is leveraging its role as a Designated National University Corporation selected by the Ministry of Education, Culture, Sports, Science and Technology to contribute to innovation for human welfare, sustainable development of society, and social transformation.
Website: https://resou.osaka-u.ac.jp/en
Title: How well do large language models mirror human cognition of word concepts? A comparison of psychological ratings for early-acquired English words
Journal: Behavior Research Methods
Authors: Hiromichi Hagihara and Kazuki Miyazawa
DOI: 10.3758/s13428-025-02938-2
Funded by: Japan Society for the Promotion of Science
Article publication date: 2 February 2026
Related links:
Developmental Cognitive Science Lab, Graduate School of Human Sciences, The University of Osaka
https://babylab.hus.osaka-u.ac.jp/en
Attached files
  • Fig. 1 Schematic overview of the study design. The authors asked large language models (LLMs) to rate approximately 700 English words that are typically acquired early in childhood according to 21 psychological features (e.g., Concreteness, Emotional Valence). By comparing these AI-generated ratings with ratings from human participants in prior studies, the authors examined how well LLMs can reproduce humanlike impressions and intuitions about these words. © Hiromichi Hagihara (original content; credit must be given to the creator)
  • Fig. 2 Example comparisons between human and LLM ratings. Each scatterplot compares ratings from humans and an LLM for a given psychological feature. Points closer to the diagonal line (from bottom left to top right) indicate stronger agreement between humans and the LLM. For Concreteness, ratings from humans and the LLM generally show high agreement. In contrast, for Iconicity (the degree to which a word’s sound resembles its meaning), the rating patterns differ substantially. Notably, even for Concreteness, which shows high overall agreement, human ratings vary widely for function words such as prepositions and conjunctions, whereas the LLM consistently assigns them low concreteness values. This highlights systematic differences in how humans and AI “perceive” certain types of words. © 2025 Hiromichi Hagihara et al., CC BY, from “How well do large language models mirror human cognition of word concepts? A comparison of psychological ratings for early-acquired English words,” Behavior Research Methods (Springer Nature)
Regions: Asia, Japan, Europe, United Kingdom, North America, United States
Keywords: Society, Psychology, Social Sciences, Humanities, Linguistics

