Video traffic has surged in recent years, driving the need for more efficient and adaptable video communication systems. Traditional video communication systems, however, are increasingly strained by limitations in data compression, high energy consumption, and narrow service scopes. Against this backdrop, a novel concept known as “generative video communication” has emerged, offering an approach that leverages recent advances in artificial intelligence to enhance video content expression and transmission efficiency.
A recent paper published in Engineering titled “Generative Video Communications: Concepts, Key Technologies, and Future Research Trends” by Wenjun Zhang, Guo Lu, Zhiyong Chen, and Geoffrey Ye Li explores this innovative paradigm in detail. The authors propose that generative video communication integrates generative AI technologies with traditional discriminative video communication methods to address the multidimensional constraints of current systems. This integration aims to improve video communication by enabling new gains in the cognitive domain, complementing existing frameworks, and offering more efficient, adaptable, and immersive video services.
Generative video communication has two core characteristics: discriminative technology serves as the cornerstone, ensuring reliability and interpretability, while generative technology enhances expressiveness by uncovering underlying patterns in video content. This paradigm is not intended to replace traditional systems but to evolve them, maintaining reliability while incorporating generative capabilities. The authors outline three levels of generative video communication based on the ratio of generative to discriminative information: full-reference generation, semi-reference generation, and no-reference generation. Each level caters to different scenarios and network conditions, optimizing the balance between compression efficiency, visual quality, and system stability.
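As a rough illustration of how a system might choose among the three levels, the sketch below maps available bandwidth to a level. It is not taken from the paper: the function name `select_level` and both threshold values are hypothetical, chosen only to show the idea that scarcer bandwidth shifts the balance away from reference information and toward generated information.

```python
from enum import Enum


class GenerationLevel(Enum):
    """The three levels named in the paper."""
    FULL_REFERENCE = "full-reference"
    SEMI_REFERENCE = "semi-reference"
    NO_REFERENCE = "no-reference"


def select_level(
    available_kbps: float,
    semi_threshold_kbps: float = 500.0,   # hypothetical value, not from the paper
    no_ref_threshold_kbps: float = 50.0,  # hypothetical value, not from the paper
) -> GenerationLevel:
    """Pick a generation level from the available bandwidth.

    Assumption: the less bandwidth there is, the less reference
    information can be sent, so more of the video must be generated.
    """
    if available_kbps < 0:
        raise ValueError("available_kbps must be non-negative")
    if no_ref_threshold_kbps > semi_threshold_kbps:
        raise ValueError("no_ref_threshold_kbps must not exceed semi_threshold_kbps")

    if available_kbps < no_ref_threshold_kbps:
        return GenerationLevel.NO_REFERENCE
    if available_kbps < semi_threshold_kbps:
        return GenerationLevel.SEMI_REFERENCE
    return GenerationLevel.FULL_REFERENCE
```

A real system would presumably weigh more than bandwidth, such as receiver compute and the task at hand, so this single-input rule should be read only as a starting point.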
To realize this new paradigm, the paper proposes several key technical pathways. Elastic encoding, for instance, involves efficiently representing both reference and generative information through adaptive strategies such as full-reference, semi-reference, and no-reference generation. Collaborative transmission explores methods to transmit compressed bitstreams collaboratively, leveraging elastic transmission, scalable transmission, and generative reception technologies. Utility evaluation, on the other hand, focuses on assessing generated content based on consistency, rationality, and usability, using advanced frameworks such as multimodal large models.
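To make the utility evaluation idea concrete, the sketch below combines the three criteria the paper names (consistency, rationality, and usability) into one score. The weighted average, the 0-to-1 score range, and the names `UtilityScores` and `overall_utility` are assumptions made for illustration; the paper describes evaluation with frameworks such as multimodal large models and does not prescribe this formula.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UtilityScores:
    """Per-criterion scores, each assumed to lie in [0, 1]."""
    consistency: float
    rationality: float
    usability: float


def overall_utility(
    scores: UtilityScores,
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # hypothetical equal weighting
) -> float:
    """Return a weighted average of the three criterion scores."""
    values = (scores.consistency, scores.rationality, scores.usability)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each score must lie in [0, 1]")
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be three non-negative numbers with a positive sum")

    # Normalize by the weight total so the result stays in [0, 1].
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

In practice the per-criterion scores would have to come from an evaluator such as a multimodal model; this sketch covers only the aggregation step.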
The potential applications of generative video communication are vast, ranging from task-oriented and immersive communication to scenarios requiring low-bitrate transmission. For example, in extremely bandwidth-constrained environments, no-reference generation can transform visual signals into compact textual representations and back into high-quality video content. This capability makes it possible to transmit video signals even under severe transmission constraints, preserving the core communicative intent.
Despite its promising prospects, generative video communication faces several challenges. Model reliability is a key concern, as the uncertainty of generative outputs raises issues of consistency and controllability. Additionally, real-time generation and decoding impose significant computational costs, especially for resource-constrained receivers. The authors suggest that future research should focus on constructing a foundational theory of generative video communication from an information-theoretic perspective, optimizing the design of generative models for real-time processing, and developing intelligent transmission strategies to adapt to heterogeneous device capabilities and fluctuating network conditions.
As the demand for immersive and personalized video communication continues to grow, generative video communication represents a significant step forward in redefining the objectives and implementation strategies of video communication. By integrating reference sampling with generative information, this paradigm not only enhances the expressive capabilities of video content under limited bitrates but also paves the way for future innovations in 6G networks and beyond.
The paper “Generative Video Communications: Concepts, Key Technologies, and Future Research Trends” is authored by Wenjun Zhang, Guo Lu, Zhiyong Chen, and Geoffrey Ye Li. Full text of the open access paper: https://doi.org/10.1016/j.eng.2025.06.018. For more information about Engineering, visit the website at https://www.sciencedirect.com/journal/engineering.