KAIST Develops an AI Semiconductor Brain Combining Transformer's Intelligence and Mamba's Efficiency

As recent Artificial Intelligence (AI) models grow ever more capable of understanding and processing long, complex sentences, the need for new semiconductor technologies that can boost computation speed and memory efficiency at the same time is increasing. Against this backdrop, a joint team of KAIST researchers and international collaborators has developed a core AI semiconductor 'brain' technology for hybrid Transformer–Mamba models, implementing it for the first time in the world in a form that performs computations directly inside the memory. The result is roughly a four-fold increase in the inference speed of Large Language Models (LLMs) and a 2.2-fold reduction in power consumption.

KAIST (President Kwang Hyung Lee) announced on October 17th that a research team led by Professor Jongse Park of the KAIST School of Computing, in collaboration with the Georgia Institute of Technology in the United States and Uppsala University in Sweden, has developed 'PIMBA,' a core technology for next-generation AI models based on the AI memory semiconductor 'Processing-in-Memory (PIM),' which acts as those models' brain.

Currently, LLMs such as ChatGPT, GPT-4, Claude, Gemini, and Llama operate on the 'Transformer' brain structure, which looks at all the words in a sentence simultaneously. As a result, as the AI model grows and the sentences it processes get longer, the computational load and memory requirements surge, making slower inference and high energy consumption major issues.

To overcome these problems with the Transformer, the recently proposed sequential-memory-based 'Mamba' structure introduced a way of processing information over time, improving efficiency. However, memory bottlenecks and power consumption limits still remain.
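As a rough illustration of that difference (a minimal sketch, not code from the study; the dimensions, the decay constant, and the simplified single-head attention below are illustrative assumptions), a Transformer decoding step must re-read a key-value cache that grows with every generated token, while a Mamba-style state-space step folds each new token into a fixed-size state:

```python
import numpy as np

d = 64           # model dimension (illustrative)
n_tokens = 1000  # length of the sequence processed so far

rng = np.random.default_rng(0)

# --- Transformer-style decoding step (simplified single-head attention) ---
# The key/value cache grows with the sequence, so per-token work and memory
# traffic grow with n_tokens.
K = rng.standard_normal((n_tokens, d))   # cached keys
V = rng.standard_normal((n_tokens, d))   # cached values
q = rng.standard_normal(d)               # query for the new token

scores = K @ q / np.sqrt(d)              # one score per cached token
weights = np.exp(scores - scores.max())
weights /= weights.sum()
attn_out = weights @ V                   # reads the whole cache every step

# --- Mamba-style decoding step (simplified diagonal state-space update) ---
# The recurrent state has a fixed size, so per-token work and memory traffic
# stay constant no matter how long the sequence gets.
state = rng.standard_normal(d)           # fixed-size recurrent state
a = np.full(d, 0.9)                      # decay factor (illustrative constant)
b = rng.standard_normal(d)               # input projection of the new token
state = a * state + b                    # constant-time, constant-memory update
ssm_out = state                          # output read from the fixed-size state

print("attention cache elements read per token:", K.size + V.size)
print("state elements read per token:          ", state.size)
```

Both variants are deliberately stripped down to a single decoding step so that the difference in how much data each one has to touch per token is easy to see.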

Professor Jongse Park's research team designed 'PIMBA,' a new semiconductor structure that performs computations directly inside the memory, in order to maximize the performance of the 'Transformer–Mamba hybrid model,' which combines the advantages of both Transformer and Mamba.

While existing GPU-based systems move data out of memory in order to perform computations, PIMBA performs calculations directly within the storage device, without moving the data. This minimizes data movement time and significantly reduces power consumption.
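To see why cutting data movement matters, the following back-of-envelope sketch (all sizes and bandwidth figures are assumptions chosen for illustration, not numbers from the PIMBA paper) compares the time a GPU-like system spends moving one weight matrix from memory against the time spent on the arithmetic itself during a single decoding step:

```python
# Rough back-of-envelope sketch of why in-memory computation helps
# memory-bound LLM decoding. All numbers below are illustrative
# assumptions, not measurements from the PIMBA work.

weight_bytes = 2 * 4096 * 4096      # one FP16 projection matrix (~34 MB, assumed size)
flops = 2 * 4096 * 4096             # multiply-adds for one matrix-vector product

gpu_bandwidth = 2e12                # ~2 TB/s HBM bandwidth (assumed)
gpu_compute = 100e12                # ~100 TFLOP/s usable throughput (assumed)

time_moving_data = weight_bytes / gpu_bandwidth
time_computing = flops / gpu_compute

print(f"time spent moving weights: {time_moving_data * 1e6:8.2f} us")
print(f"time spent on arithmetic:  {time_computing * 1e6:8.2f} us")
# Under these assumptions the transfer dominates by more than an order of
# magnitude; computing next to where the data is stored removes most of
# that transfer, which is the idea behind processing-in-memory.
```

Under these assumed numbers the memory transfer, not the arithmetic, sets the pace of each decoding step, which is why performing the computation where the data already resides can pay off.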

As a result, PIMBA showed up to a 4.1-fold improvement in processing performance and an average 2.2-fold decrease in energy consumption compared to existing GPU systems.

The research outcome is scheduled to be presented on October 20th at the '58th International Symposium on Microarchitecture (MICRO 2025),' a globally renowned computer architecture conference that will be held in Seoul. It was previously recognized for its excellence by winning the Gold Prize at the '31st Samsung Humantech Paper Award.' ※Paper Title: Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving, DOI: 10.1145/3725843.3756121

This research was supported by the Institute for Information & Communications Technology Planning & Evaluation (IITP), the AI Semiconductor Graduate School Support Project, and the ICT R&D Program of the Ministry of Science and ICT and the IITP, with assistance from the Electronics and Telecommunications Research Institute (ETRI). The EDA tools were supported by IDEC (the IC Design Education Center).

