The hidden geometry that separates complex data

Are two sets of data genuinely different, or do they only appear different because of randomness? This question, known as the two-sample testing problem, becomes notoriously difficult in modern datasets, which are often high-dimensional and complex, and whose differences can take countless subtle forms.

“Simply put, we don't know what differences to look for, the possibilities are bewildering,” says Professor Victor Panaretos at EPFL’s Institute of Mathematics.

To tackle the problem, mathematicians have developed so-called “kernel methods”, powerful tools now widely used in fields such as genomics, finance, and artificial intelligence.

In a new study, Panaretos, with mathematicians Leonardo Santoro (EPFL) and Kartik Waghmare (ETH Zurich), has found a mathematical explanation for the remarkable performance of kernel methods, which until now lacked a clear theoretical foundation. Published in PNAS, the work introduces a theorem that clarifies why these methods perform so well, potentially helping to improve their design.

“We show that these methods transform even very subtle differences between probability distributions into a form of maximal separation,” says Panaretos. “As a corollary, we also found that performance can be boosted substantially when informed by our theorem.”

The “kernel trick”
“Kernel methods transform data into a new form where differences become easier to detect,” explains Panaretos. “This is often called the ‘kernel trick’.”

The EPFL team pushed this idea further. Instead of applying the kernel trick and then comparing datasets using simple summaries like averages, they compared them through a richer mathematical geometry that captures more of their underlying structure.
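For readers who want to see the classic "transform, then compare averages" recipe in concrete terms, here is a minimal sketch in Python. It is not the richer geometry introduced in the study. It illustrates the standard baseline, commonly known as the maximum mean discrepancy, under assumptions made purely for illustration: a Gaussian kernel, a fixed bandwidth of 1.0, and a permutation test to judge significance. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared distance between the two kernel averages."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def permutation_p_value(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    """Estimate how often random relabelling gives a statistic at least as large."""
    rng = np.random.default_rng(seed)
    observed = mmd_squared(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        stat = mmd_squared(pooled[perm[:n]], pooled[perm[n:]], bandwidth)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)
```

Calling `permutation_p_value(x, y)` on two NumPy arrays with one row per observation returns an estimated p-value: a small value suggests the two samples come from different distributions. In practice the bandwidth and the kernel itself are tuning choices, and the sketch makes no attempt to optimise them.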

“The classic approach takes the data, X, and transforms it, yielding transformed data Y,” explains Panaretos. “Then you look at the structure of Y through the prism of a ‘standard geometry’, like the Euclidean geometry of the world we live in.

“But what we realized is that there is a much richer geometry one could use that clearly reveals patterns, even intricate ones, in Y. This richer geometry is more complex, but using it ultimately boils down to calculating summaries like averages, and yet much more effective.”

This change of perspective explains how even the smallest differences between datasets can be magnified until they can no longer be confused, providing a rigorous account of the empirical success of kernel methods.

The study also shows that current approaches can be improved, because they are not based on criteria designed to harness this separation effect. That insight offers guidance for designing even more powerful statistical tools.

Considering the widespread use of kernel methods and the ubiquity of the two-sample problem, the findings potentially have broad implications across science and technology. By clarifying how kernel methods distinguish patterns in complex data, the research could enhance machine learning, data science, and statistical inference in several areas.

“Beyond the technical contribution, the result can be stated in a fairly simple and striking way, highlighting how seemingly abstract features of infinite-dimensional geometry can have concrete implications for modern data science,” says Panaretos.

Other contributors
ETH Zürich Department of Mathematics

Funding
Swiss National Science Foundation

Leonardo V. Santoro, Kartik G. Waghmare, Victor M. Panaretos. Kernel Embeddings and the Separation of Measure Phenomenon. PNAS 05 June 2026. DOI: 10.1073/pnas.2522504123