Turning PC and Mobile Devices into AI Infrastructure, Reducing ChatGPT Costs

Until now, AI services based on Large Language Models (LLMs) have mostly relied on expensive data center GPUs, resulting in high operational costs and a significant barrier to entry for using AI technology. A research team at KAIST has developed a technology that reduces this reliance by enlisting affordable, everyday GPUs to provide AI services at a much lower cost.

On December 28th, KAIST announced that a research team led by Professor Dongsu Han from the School of Electrical Engineering developed 'SpecEdge,' a new technology that significantly lowers LLM infrastructure costs by utilizing affordable, consumer-grade GPUs widely available outside of data centers.

SpecEdge is a system where data center GPUs and "edge GPUs"—found in personal PCs or small servers—collaborate to form an LLM inference infrastructure. By applying this technology, the team successfully reduced the cost per token (the smallest unit of text generated by AI) by approximately 67.6% compared to methods using only data center GPUs.

To achieve this, the research team utilized a method called 'speculative decoding.' A small language model running on the edge GPU quickly generates a high-probability token sequence (a series of words or word fragments), which the large language model in the data center then verifies in batches. Meanwhile, the edge GPU continues generating tokens without waiting for the server's response, simultaneously increasing LLM inference speed and infrastructure efficiency.
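For illustration, here is a minimal Python sketch of that draft-then-verify loop, under stated assumptions: draft_next_token() and target_accepts() are toy stand-ins invented for this example, not SpecEdge's actual models or API. A real verifier would compare the draft and target model probabilities for each token rather than flipping a coin.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]


def draft_next_token(context):
    """Edge GPU: a small draft model cheaply proposes the next token (toy stand-in)."""
    return random.choice(VOCAB)


def target_accepts(context, token):
    """Data-center GPU: the large target model checks one drafted token.

    Toy stand-in: accept with fixed probability. A real system compares
    the draft and target model probabilities for the token.
    """
    return random.random() < 0.8


def speculative_decode(prompt, draft_len=4, max_tokens=16):
    output = list(prompt)
    while len(output) < max_tokens:
        # 1) The edge GPU drafts a short token sequence on its own,
        #    without waiting on the server.
        draft = [draft_next_token(output) for _ in range(draft_len)]

        # 2) The server verifies the whole draft in one batched pass,
        #    keeping tokens up to the first rejection.
        accepted = []
        for tok in draft:
            if not target_accepts(output + accepted, tok):
                break
            accepted.append(tok)
        output.extend(accepted)

        # 3) On a rejection, the target model emits one corrected token,
        #    so every round still makes progress (toy stand-in here).
        if len(accepted) < draft_len:
            output.append(draft_next_token(output))

    return output[:max_tokens]


print(" ".join(speculative_decode(["the"])))
```

When most drafted tokens are accepted, the expensive data-center model runs one batched verification per round instead of one forward pass per token, which is where the cost and speed gains come from.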

Compared to performing speculative decoding solely on data center GPUs, SpecEdge improved cost efficiency by 1.91 times and server throughput by 2.22 times. Notably, the technology was confirmed to work seamlessly even at standard internet speeds, meaning it can be applied to real-world services immediately, without requiring a specialized network environment.

Furthermore, the server is designed to process verification requests from multiple edge GPUs efficiently, allowing it to handle more simultaneous requests without GPU idle time. The result is an LLM serving architecture that uses data center resources more effectively.
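As a rough illustration of that batching idea, the sketch below gathers pending verification requests from several edge clients into a single pass. verify_batch() is a hypothetical placeholder for one forward pass of the target model; the announcement does not describe SpecEdge's actual scheduler, so this is only a generic pattern.

```python
import queue
import threading
import time

requests = queue.Queue()   # (client_id, draft) pairs arriving from edge GPUs
results = {}               # client_id -> number of accepted tokens


def verify_batch(batch):
    """Hypothetical stand-in: one target-model pass verifies all drafts at once."""
    return {cid: len(draft) for cid, draft in batch}  # toy: accept everything


def server_loop(total_expected=3, max_batch=8, max_wait_s=0.05):
    handled = 0
    while handled < total_expected:
        batch = [requests.get()]                 # block for the first request
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch:            # then gather briefly
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break
            try:
                batch.append(requests.get(timeout=timeout))
            except queue.Empty:
                break
        results.update(verify_batch(batch))      # one GPU pass, no per-client idle gap
        handled += len(batch)


server = threading.Thread(target=server_loop)
server.start()
for cid in range(3):                             # three edge clients submit drafts
    requests.put((cid, ["tok"] * 4))
server.join()
print(results)
```

Grouping requests this way keeps the data-center GPU busy with one large verification pass instead of many small ones, which is consistent with the throughput gain described above.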

This research presents a new possibility for distributing LLM computations—which were previously concentrated in data centers—to the edge, thereby reducing infrastructure costs and increasing accessibility. In the future, as this expands to various edge devices such as smartphones, personal computers, and Neural Processing Units (NPUs), high-quality AI services are expected to become available to a broader range of users.

Professor Dongsu Han, who led the research, stated, "Our goal is to utilize edge resources around the user, beyond the data center, as part of the LLM infrastructure. Through this, we aim to lower AI service costs and create an environment where anyone can utilize high-quality AI."

Dr. Jinwoo Park and M.S. candidate Seunggeun Cho of KAIST participated in this study. The research results were presented as a 'Spotlight' paper (top 3.2% of submissions, against a 24.52% overall acceptance rate) at NeurIPS (Neural Information Processing Systems), the world's most prestigious academic conference in the field of AI, held in San Diego from December 2nd to 7th.

  • Paper Title: SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
  • Paper Links: NeurIPS Link, arXiv Link

This research was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the project 'Development of 6G System Technology to Support AI-Native Application Services.'
