Designing molecules is one of chemistry’s most complex challenges. From life-saving drugs to advanced materials, each compound requires a precise sequence of reactions. Planning these steps demands both technical knowledge and strategic insight, making it a task that often relies on years of experience.
Two problems plague much of modern chemistry. The first is retrosynthesis: Chemists start from a target molecule and work backward to identify simpler building blocks and viable reaction pathways. Retrosynthesis involves countless decisions, from choosing starting materials to determining when to form rings or protect sensitive functional groups. While computers can explore vast “chemical spaces,” they often struggle to capture the strategic reasoning used by human experts.
The second problem is reaction mechanisms. These describe how chemical reactions unfold step by step through movements of electrons. Mechanistic insight helps scientists predict new reactions, improve efficiency, and reduce costly trial and error. Existing computational methods can generate many possible pathways, but often lack the chemical intuition needed to identify the most plausible ones.
A study led by Philippe Schwaller at EPFL now presents a new approach that uses large language models (LLMs) as reasoning engines for chemistry. Instead of generating chemical structures directly, the models evaluate and guide traditional computational tools.
The framework, called Synthegy, combines established search algorithms with artificial intelligence capable of interpreting chemical strategies expressed in natural language.
“When making tools for chemists, the user interface matters a lot, and previous tools relied on cumbersome filters and rules,” says Andres M Bran, the first author of the Synthegy paper published in Matter. “With Synthegy, we’re giving chemists the power to just talk, allowing them to iterate much faster and navigate more complex synthetic ideas.”
Synthegy on retrosynthesis
Synthegy begins with a target molecule and a plain-language instruction from the user. For example, a chemist may request early formation of a specific ring or ask to avoid unnecessary protecting groups. Traditional retrosynthesis software then generates numerous potential synthetic routes. Each route is translated into text and analyzed by a language model.
Synthegy evaluates how well each pathway aligns with the user’s goals, assigns scores, and explains its reasoning. This process allows researchers to rank and filter candidate routes efficiently. By steering computational searches with natural-language queries, chemists can focus on strategies that best meet their objectives.
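The rank-and-filter loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the Synthegy implementation: the names `route_to_text`, `stub_evaluator`, and `rank_routes` are hypothetical, the routes are made-up step descriptions, and the evaluator is a simple keyword counter standing in for a real language-model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredRoute:
    route: list[str]   # reaction steps, in order
    score: float       # higher means better aligned with the instruction
    rationale: str     # short explanation of the score


def route_to_text(route: list[str]) -> str:
    """Render a route as numbered plain text, the form an evaluator reads."""
    return "\n".join(f"Step {i}: {step}" for i, step in enumerate(route, start=1))


def stub_evaluator(route_text: str, instruction: str) -> tuple[float, str]:
    """Hypothetical stand-in for an LLM call (the instruction is ignored here).

    It only understands one goal, avoiding protecting groups, and
    penalizes every step whose description mentions "protect".
    """
    hits = sum("protect" in line.lower() for line in route_text.splitlines())
    score = max(0.0, 10.0 - 3.0 * hits)
    return score, f"{hits} protecting-group step(s) found"


def rank_routes(
    routes: list[list[str]],
    instruction: str,
    evaluate: Callable[[str, str], tuple[float, str]],
    min_score: float = 0.0,
) -> list[ScoredRoute]:
    """Score every route, drop those below min_score, sort best first."""
    scored = []
    for route in routes:
        score, rationale = evaluate(route_to_text(route), instruction)
        if score >= min_score:
            scored.append(ScoredRoute(route, score, rationale))
    return sorted(scored, key=lambda s: s.score, reverse=True)


# Made-up candidate routes, as a retrosynthesis tool might propose them.
candidate_routes = [
    ["Boc-protect amine", "amide coupling", "deprotect amine"],
    ["amide coupling", "reductive amination"],
]
ranked = rank_routes(
    candidate_routes, "Avoid unnecessary protecting groups", stub_evaluator
)
for item in ranked:
    print(item.score, item.rationale, item.route)
```

Passing the evaluator in as a function keeps the ranking logic independent of which model does the scoring, so the stub could be swapped for a real model call without touching the rest.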
Synthegy on reaction mechanisms
Synthegy addresses reaction mechanisms in a similar way: It breaks reactions into elementary electron movements and explores multiple possibilities. The LLM assesses each step, guiding the search toward chemically plausible mechanisms. Additional information, such as reaction conditions or expert hypotheses, can also be incorporated as text.
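One way to picture a search steered by step-level plausibility scores is a uniform-cost search, where each elementary step costs the negative logarithm of its plausibility. The sketch below uses that textbook technique as a stand-in; the article does not say which search algorithm Synthegy uses. The state names, the `expand` and `plausibility` functions, and the scores are all hypothetical placeholders for what a step generator and a language model would supply.

```python
import heapq
import math
from typing import Callable, Optional


def guided_search(
    start: str,
    target: str,
    expand: Callable[[str], list[str]],
    plausibility: Callable[[str, str], float],
    max_expansions: int = 100,
) -> tuple[Optional[list[str]], float]:
    """Find the path from start to target with the highest product of
    step plausibilities (each in (0, 1]) via uniform-cost search."""
    # Heap entries: (accumulated cost, tie-breaker, path so far).
    frontier = [(0.0, 0, [start])]
    counter = 1
    visited = set()
    while frontier and max_expansions > 0:
        cost, _, path = heapq.heappop(frontier)
        state = path[-1]
        if state == target:
            return path, math.exp(-cost)  # convert cost back to a product
        if state in visited:
            continue
        visited.add(state)
        max_expansions -= 1
        for nxt in expand(state):
            step_cost = -math.log(plausibility(state, nxt))
            heapq.heappush(frontier, (cost + step_cost, counter, path + [nxt]))
            counter += 1
    return None, 0.0


# Toy data: abstract labels, not real chemistry.
TOY_STEPS = {
    "reactants": ["intermediate_A", "intermediate_B"],
    "intermediate_A": ["product"],
    "intermediate_B": ["product"],
}
TOY_SCORES = {
    ("reactants", "intermediate_A"): 0.9,
    ("reactants", "intermediate_B"): 0.2,
    ("intermediate_A", "product"): 0.8,
    ("intermediate_B", "product"): 0.9,
}

best_path, best_score = guided_search(
    "reactants",
    "product",
    expand=lambda s: TOY_STEPS.get(s, []),
    plausibility=lambda a, b: TOY_SCORES[(a, b)],
)
print(best_path, round(best_score, 2))
```

In this toy setup the search follows the branch through `intermediate_A`, because its combined plausibility (0.9 × 0.8) beats the alternative (0.2 × 0.9). Extra context such as reaction conditions would enter through the scoring function, not the search loop.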
In synthesis planning, Synthegy successfully identified routes that match complex strategic requests. In a double-blind expert study, 36 chemists provided 368 valid evaluations, and their judgments aligned with the system’s assessments 71.2% of the time on average. The framework can detect unnecessary protecting steps, assess reaction feasibility, and prioritize efficient pathways.
Synthegy shows that LLMs can analyze chemistry across multiple levels. They can interpret functional groups, assess individual reactions, and evaluate complete synthetic pathways. Larger and more advanced models demonstrate the strongest performance, while smaller models show limited capability.
The work redefines how AI can support chemistry. By positioning LLMs as evaluators rather than generators, the Synthegy approach allows chemists to express their goals in plain language and receive strategically relevant solutions. The technology could accelerate drug discovery, improve reaction design, and make advanced computational tools more accessible to researchers.
“The connection between synthesis planning and mechanisms is very exciting: we usually use mechanisms to discover new reactions that enable us to synthesize new molecules,” says Andres M Bran. “Our work is bridging that gap computationally through a unified natural language interface.”
Other contributors
- National Centre of Competence in Research Catalysis (NCCR Catalysis)
- b12 Labs
Funding
- Swiss National Science Foundation
- NCCR Catalysis
- Intel and Merck KGaA AWASES program