A comprehensive review published in
Engineering traces the rapid evolution of artificial intelligence in finance, documenting how foundation models are transforming everything from market forecasting to regulatory compliance. The survey, conducted by researchers from Tsinghua University, E Fund Management, and The Hong Kong Polytechnic University, establishes a systematic taxonomy for understanding this emerging field while identifying critical gaps that remain before widespread deployment.
The researchers categorize financial foundation models into three distinct modalities. Financial language foundation models, exemplified by BloombergGPT with its 50 billion parameters trained on 363 billion financial tokens and 345 billion general-purpose tokens, process financial reports, news, and contracts for tasks including sentiment analysis and compliance checking. Financial time-series foundation models address sequential data such as price histories and order flows, with architectures ranging from scratch-trained transformers like MarketGPT to language-model adaptations such as Time-LLM that reprogram time series into natural language prompts. The third category, financial visual-language foundation models, enables joint processing of charts, tables, and textual narratives, as demonstrated by systems like FinLLaVA and FinTral that interpret Federal Open Market Committee projection charts and candlestick diagrams.
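The "reprogramming" idea attributed to Time-LLM — rendering raw numbers as a natural-language prompt a language model can consume — can be illustrated with a minimal sketch. The function name and prompt wording below are hypothetical, not taken from the paper:

```python
def series_to_prompt(prices, horizon=3):
    """Render a raw price series as a natural-language prompt,
    loosely mimicking Time-LLM's reprogramming idea (illustrative only)."""
    # Simple derived feature: sign of the cumulative return.
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    trend = "up" if sum(returns) > 0 else "down"
    body = ", ".join(f"{p:.2f}" for p in prices)
    return (
        f"The last {len(prices)} closing prices were: {body}. "
        f"The overall trend was {trend}. "
        f"Predict the next {horizon} values."
    )

prompt = series_to_prompt([101.2, 102.5, 101.8, 103.1])
```

In the actual systems surveyed, such prompts (or learned token embeddings playing the same role) are fed to a frozen or lightly adapted language model rather than trained-from-scratch forecasters.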
The survey documents a clear progression in capability. Early BERT-style models from 2019 have given way to generative architectures and, most recently, reasoning-enhanced systems employing chain-of-thought techniques and group relative policy optimization. Training methodologies follow an established three-stage framework: pre-training on massive financial corpora, supervised fine-tuning with task-specific instruction datasets ranging from 30,000 to over 1.5 million examples, and alignment phases that incorporate regulatory constraints and factual accuracy requirements.
Dataset development mirrors this trajectory. The field has moved from small-scale English-centric collections such as the 4,840-sentence Financial PhraseBank to multilingual benchmarks including ICE-FLARE with 604,000 bilingual samples and FinMME with over 11,000 visual question-answering pairs. Recent benchmarks like FinTSB address realistic trading constraints across four market regimes including black swan events, though the researchers note most visual-language datasets remain limited to hundreds or thousands of samples.
Real-world applications are proliferating across four domains. For knowledge extraction, domain-specific models such as ICE-INTENT demonstrate superior bilingual financial understanding compared to general-purpose alternatives. Market prediction applications include value-at-risk forecasting using TimesFM and multimodal sentiment analysis for timing signals. Trading systems incorporate retrieval-augmented generation with compliance-aware checkers, while multi-agent simulations explore market microstructure and investor behavior modeling.
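For readers unfamiliar with the value-at-risk target mentioned above: VaR at confidence level alpha is the loss threshold exceeded with probability 1 - alpha. A minimal historical-simulation sketch on synthetic data (parameters and data are illustrative; the surveyed work uses TimesFM forecasts, not this toy estimator):

```python
import numpy as np

def historical_var(returns, alpha=0.95):
    """One-day historical value-at-risk: the loss quantile exceeded
    with probability (1 - alpha). Returned as a positive loss number."""
    losses = -np.asarray(returns)          # losses are negated returns
    return float(np.quantile(losses, alpha))

rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=1000)  # synthetic daily returns
var_95 = historical_var(daily_returns, alpha=0.95)
```

Foundation-model approaches replace the empirical quantile with a learned forecast of the return distribution, but the risk quantity being estimated is the same.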
The survey identifies persistent obstacles that temper current capabilities. Data scarcity constrains multimodal model development, with financial visual data requiring precise numerical interpretation that generic encoders struggle to capture. Privacy regulations including GDPR and MiFID II limit access to proprietary transaction records and trading strategies, prompting exploration of federated learning approaches such as DPFinLLM that combine low-rank adaptation with differential privacy. Algorithmic challenges include hallucination risks in high-stakes financial contexts and lookahead bias from training on future information, addressed through temporally constrained datasets like TimeMachineGPT. Infrastructure demands present formidable barriers, with BloombergGPT's training estimated at 1.3 million GPU hours on NVIDIA A100s, driving interest in collaborative systems that pair large models with lightweight specialists for latency-critical applications.
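The differential-privacy ingredient cited for DPFinLLM-style training follows the standard DP-SGD recipe: clip each per-example gradient, average, and add calibrated Gaussian noise. A plain-NumPy sketch of that single step (hyperparameter values are illustrative, and this omits the low-rank adaptation and privacy accounting a real system would include):

```python
import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD-style update ingredient: clip each per-example gradient
    to clip_norm, average, then add Gaussian noise scaled by
    noise_mult * clip_norm / batch_size (illustrative hyperparameters)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise

noisy = dp_noisy_gradient([np.ones(4) * 10.0, np.zeros(4)])
```

Clipping bounds any single record's influence on the update, which is what lets the added noise translate into a formal privacy guarantee.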
The researchers emphasize that financial foundation models are not merely technical curiosities but practical tools requiring alignment with domain-specific behavioral expectations. Compliance-focused reward models and reasoning-centric reinforcement learning represent emerging approaches to bridge capability and trustworthiness. The survey concludes by noting that while general-purpose models currently dominate exploratory applications, domain-specific foundation models demonstrate clear advantages in structured reasoning and regulatory alignment, suggesting a trajectory toward specialized systems integrated into real-world financial infrastructure.
The paper “Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges” is authored by Liyuan Chen, Shuoling Liu, Jiangpeng Yan, Xiaoyu Wang, Henglin Liu, Chuang Li, Kecheng Jiao, Jixuan Ying, Yang Veronica Liu, Qiang Yang, and Xiu Li. Full text of the open-access paper:
https://doi.org/10.1016/j.eng.2025.11.029. For more information about
Engineering, visit the website at
https://www.sciencedirect.com/journal/engineering.