Artificial intelligence struggles to understand the Mainz dialect

New study from Mainz University finds that language models misinterpret words in regional varieties

How well do language models understand Meenzerisch, the dialect spoken in the German city of Mainz? A research team led by Johannes Gutenberg University Mainz (JGU) has now investigated this question for the first time. Meenzerisch still shapes regional language culture today and is known across Germany through the satirical carnival speeches of the famous Mainz Fastnacht tradition. The study's findings, recently presented at the 2026 Language Resources and Evaluation Conference (LREC) in Palma de Mallorca, show that current artificial intelligence (AI) models have great difficulty understanding the dialect correctly.

"Language varieties such as Meenzerisch are an important part of cultural identity, but are disappearing from everyday use," said Minh Duc Bui of the JGU Institute of Computer Science, who led the study together with Professor Katharina von der Wense. "Regional dialects have so far received very little attention in digital language research. Yet language technologies could help make them more visible and preserve them in the long run."

Making Meenzerisch machine-readable

The team, which also included a researcher from Marburg University, first created a new dataset for Meenzerisch. It was based on a dictionary published in 1966, which the researchers digitized. The result was a machine-readable lexicon of 2,351 dialect words and their definitions in standard German. "Until now, resources of exactly this kind were missing for Meenzerisch," stated Professor Katharina von der Wense, head of the Natural Language Processing group at JGU.

Using the digital lexicon, the researchers were then able to examine systematically how well large language models handle the dialect. They tested several open-source language models of different sizes. The models were asked first to explain the meanings of Meenzerisch words and then to generate the correct dialect words from definitions written in standard German.
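
The two tasks can be pictured as a pair of prompts built from each lexicon entry. The following is only a minimal sketch: the `LexiconEntry` class, the prompt wording, and the helper names are hypothetical and are not taken from the study, and the definition is left as a placeholder because the release does not quote one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One dictionary entry: a dialect word and its standard German definition."""
    word: str
    definition: str


def definition_prompt(entry: LexiconEntry) -> str:
    """Task 1 (hypothetical wording): ask the model to explain a dialect word."""
    return f"Explain the meaning of the Meenzerisch word '{entry.word}' in standard German."


def word_prompt(entry: LexiconEntry) -> str:
    """Task 2 (hypothetical wording): ask for the dialect word matching a definition."""
    return f"Which Meenzerisch word has this meaning: '{entry.definition}'?"


# Placeholder entry; the definition text is deliberately left unspecified.
entry = LexiconEntry(word="Rachebutzer", definition="<definition in standard German>")
print(definition_prompt(entry))
print(word_prompt(entry))
```

In the second prompt the dialect word itself is withheld, so the model has to produce it from the definition alone.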

"Our results are quite clear," said Bui. "The models we tested fail both at understanding and at producing the dialect." When asked to generate word definitions, the models achieved an average accuracy of only 4.24 percent. Even the strongest models in the study scored very poorly. In the reverse task of generating a dialect word from a definition, accuracy dropped to just 0.56 percent. Additional support, such as prompt examples or automatically derived rules, did little to improve performance. In every case, accuracy remained below 10 percent.
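
To make the percentages concrete, accuracy here means the share of test items a model gets right. The sketch below assumes a simple exact-match comparison after normalization, which is an illustrative choice only: the release does not say how answers were judged, and free-text definitions in particular would need a more lenient comparison. The function names and sample strings are made up.

```python
def normalize(text: str) -> str:
    """Trim whitespace and fold case so trivial differences do not count as errors."""
    return text.strip().casefold()


def accuracy_percent(predictions: list[str], references: list[str]) -> float:
    """Share of predictions equal to their reference after normalization, in percent."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Made-up placeholder strings, not data from the study.
references = ["wort_a", "wort_b", "wort_c", "wort_d"]
predictions = ["Wort_A ", "falsch", "falsch", "falsch"]
print(accuracy_percent(predictions, references))  # 25.0
```

On this scale, the reported 0.56 percent for dialect word generation corresponds to roughly one correct answer in every 180 items.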

Smaller language varieties risk becoming digitally invisible

"The results show very clearly that today's language models barely understand Meenzerisch," said co-author Professor Peter Herbert Kann of Marburg University, who also speaks the Mainz dialect. "That is interesting from a technical perspective, but it also shows how quickly smaller language varieties can become invisible in digital applications." According to the researchers, one reason may be that dialects are primarily spoken rather than written, leaving very little text data available for analysis.

"In the long term, we need models that can process not only standard languages, but also regional and culturally significant varieties," emphasized Bui. Language technologies could help document dialects digitally and make them more accessible. The current study, funded by the Carl Zeiss Foundation as part of the interdisciplinary JGU research project "Trading Off Non-Functional Properties of Machine Learning" (TOPML), is a first step in that direction. "Going forward, however, targeted datasets and new training approaches will be needed to support linguistic and cultural diversity in the digital sphere."


Related links:
Read more:
M. D. Bui et al., Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect, Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), May 2026, DOI: 10.63317/4foh8f7kygj8
https://lrec.elra.info/lrec2026-main-258
Attached files
  • The word “Rachebutzer” as misinterpreted by a large language model (top) and in its actual meaning in the Mainz dialect (bottom). (ill./©: Minh Duc Bui, created with the help of Claude)
Regions: Europe, Germany
Keywords: Applied science, Computing, Artificial Intelligence, Public Dialogue - applied science, Business, Universities & research
