Learning Prior Distribution Behind Relational Tables

02/04/2026 Frontiers Journals

Synthesizing tables, that is, creating artificial datasets that closely resemble real ones, plays a crucial role in supervised machine learning (ML) and has a wide range of practical applications. These include data augmentation, where synthetic data enlarges a training set, and the publication of synthetic tables that protect the privacy of the real data. A core challenge is: given a real table, can we generate a synthetic version such that an ML model trained on the synthetic table performs about as well on an unseen test set as a model trained on the real table?
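The criterion in the question above can be made concrete with a small sketch. It trains one model on a real table and one on a synthetic table, then compares their accuracy on the same held-out test set. The nearest-centroid classifier, the function names, and the toy data are all assumptions chosen to keep the example self-contained; they are not taken from the paper.

```python
def fit_centroids(rows):
    """rows: list of (features, label). Return label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], features)),
    )


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)


def utility_gap(real_train, synthetic_train, test):
    """Test accuracy of the real-trained model minus that of the synthetic-trained model."""
    return accuracy(fit_centroids(real_train), test) - accuracy(fit_centroids(synthetic_train), test)


# Toy one-feature tables (hypothetical values, for illustration only).
real_train = [([1.0], 0), ([2.0], 0), ([8.0], 1), ([9.0], 1)]
synthetic_train = [([1.5], 0), ([2.5], 0), ([7.5], 1), ([8.5], 1)]
bad_synthetic = [([1.5], 1), ([2.5], 1), ([7.5], 0), ([8.5], 0)]  # labels swapped
test = [([0.5], 0), ([3.0], 0), ([7.0], 1), ([10.0], 1)]

gap_good = utility_gap(real_train, synthetic_train, test)
gap_bad = utility_gap(real_train, bad_synthetic, test)
```

A gap near zero means the synthetic table is about as useful for training as the real one. A large positive gap, as with the label-swapped table here, means utility was lost.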

Most existing approaches to table synthesis, which typically employ deep generative models such as GANs (Generative Adversarial Networks) or VAEs (Variational Autoencoders), focus on learning the data distribution of a real table from sampled records. However, these methods often treat records as independent, neglecting potential correlations between them. In practice, this independence assumption is frequently violated. For example, purchase records for the same product are likely to be correlated, and failing to capture such relationships can result in synthetic tables that differ significantly from the real data. ML models trained on such synthetic tables can then perform worse than models trained on the real ones.

To address these problems, a research team led by Yaoyu ZHU published new research on 15 March 2026 in Frontiers of Computer Science, a journal co-published by Higher Education Press and Springer Nature.

Their method explicitly models record correlations by grouping data based on user-defined categorical values (e.g., records with the same (Market, Product) pair should belong to the same group). By leveraging these groups, the team applies conditional GANs to model both discrete (categorical) and continuous (numerical) values within each record, ensuring that both global (overall table) and local (within-group) data distributions are preserved in the synthetic table.
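The grouping step can be illustrated with a minimal sketch. It groups toy records by a (Market, Product) key and computes one global statistic and the matching local (within-group) statistics, the two levels a synthetic table would need to preserve. The records, the `Price` column, and the helper names are hypothetical. The sketch covers only the grouping idea, not the conditional GAN itself.

```python
from collections import defaultdict
from statistics import mean

# Toy records; column names and values are invented for illustration.
records = [
    {"Market": "North", "Product": "Tea", "Price": 4.0},
    {"Market": "North", "Product": "Tea", "Price": 4.4},
    {"Market": "North", "Product": "Coffee", "Price": 9.0},
    {"Market": "South", "Product": "Tea", "Price": 3.0},
    {"Market": "South", "Product": "Tea", "Price": 3.2},
]


def group_records(rows, key_columns):
    """Map each tuple of key-column values to the list of rows sharing it."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_columns)].append(row)
    return dict(groups)


def column_means(rows, key_columns, value_column):
    """Return the global mean and the per-group (local) means of one numeric column."""
    global_mean = mean(r[value_column] for r in rows)
    local_means = {
        key: mean(r[value_column] for r in members)
        for key, members in group_records(rows, key_columns).items()
    }
    return global_mean, local_means


global_mean, local_means = column_means(records, ("Market", "Product"), "Price")
```

Running the same function on a synthetic table and comparing the results with those of the real table gives a rough check of whether both the overall and the within-group distributions were kept. A model that treats records independently could match the global mean while missing the per-group ones.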

In addition, the team extends previous work on differentially private GANs (DPGANs), which ensured privacy only for the discriminator, by also protecting the privacy of the original data embeddings and sample frequencies. With this added layer, the synthetic data retains its usefulness while offering stronger privacy protection.
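The release does not say which mechanism the team uses to protect sample frequencies. As a general illustration of how frequencies can be privatized, the sketch below applies the standard Laplace mechanism to per-group record counts. The function names, the `epsilon` value, and the counts are assumptions, and this should not be read as the authors' method.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale  # expovariate takes the rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_group_counts(counts, epsilon, rng):
    """Add Laplace noise to each group count (standard Laplace mechanism).

    Assumes neighbouring tables differ by adding or removing one record,
    so exactly one count changes by 1 and the L1 sensitivity is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # sensitivity / epsilon
    return {key: count + laplace_noise(scale, rng) for key, count in counts.items()}


# Hypothetical group frequencies keyed by (Market, Product).
true_counts = {("North", "Tea"): 120, ("North", "Coffee"): 45, ("South", "Tea"): 80}
rng = random.Random(0)  # fixed seed only so the example is repeatable
private_counts = noisy_group_counts(true_counts, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of less accurate frequencies. In practice, noisy counts are usually clipped to be non-negative before they are used for sampling.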

Experimental results demonstrate that this approach significantly outperforms current state-of-the-art table synthesis methods for supervised ML tasks, offering both high utility and robust privacy protection.
DOI: 10.1007/s11704-025-40424-2

Regions: Asia, China
Keywords: Applied science, Computing
