This paper presents a large-scale benchmark of modern deep learning architectures, including Transformers, State-Space Models (Mamba), and xLSTM, for financial time-series prediction and position sizing. The study finds that hybrid models such as VSN+LSTM (VLSTM) and xLSTM-based variants achieve superior risk-adjusted performance (a state-of-the-art Sharpe ratio of 2.40) over traditional linear baselines and generic deep learning models across 15 years of cross-asset data.
TL;DR
A massive 15-year benchmark from the University of Oxford reveals that the "bigger is better" Transformer philosophy fails in financial markets. Instead, models with adaptive gating and structured memory, specifically VLSTM (VSN + LSTM) and the new xLSTM, dominate the leaderboard with Sharpe ratios exceeding 2.30, suggesting that denoised temporal representations are the secret sauce for navigating "noisy" market regimes.
Background: The Financial "Noise" Problem
Most deep learning benchmarks focus on datasets like weather or electricity, where the signal is loud and clear. Finance is the opposite. It is a low signal-to-noise environment where patterns are fleeting. This paper asks: can modern heavyweights like Mamba2, PatchTST, and xLSTM actually survive a real-world trading backtest across Commodities, FX, Bonds, and Equities?
Methodology: Beyond Simple Forecasting
The authors didn't just predict price; they built an end-to-end portfolio optimization pipeline.
- Input: Statistical/technical indicators + Ticker Embeddings.
- Architecture: A variety of encoders (Linear, Transformer, SSM, Recurrent).
- Loss Function: Direct optimization of the Differentiable Sharpe Ratio, forcing the model to find risk-adjusted returns rather than just minimizing Mean Squared Error.
- Risk Control: Volatility targeting to equalize contributions across different assets.
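The pipeline above can be sketched in a few lines. The paper's exact loss and volatility-targeting formulas are not reproduced here; this is a minimal numpy sketch of a Sharpe-style objective and a per-asset vol scaler (the `eps` stabilizer, the 15% target, and the un-annualized Sharpe are illustrative assumptions — in a real pipeline the same expressions would be written in an autodiff framework such as PyTorch so gradients flow back into the encoder):

```python
import numpy as np

def sharpe_loss(weights, returns, eps=1e-8):
    """Negative Sharpe ratio of portfolio returns (lower is better).

    weights: (T, N) model-output positions per asset per step
    returns: (T, N) next-period asset returns
    Minimizing this directly optimizes risk-adjusted return instead of MSE.
    """
    port = (weights * returns).sum(axis=1)       # portfolio return per step
    return -port.mean() / (port.std() + eps)     # negative (un-annualized) Sharpe

def vol_target(weights, asset_vol, target=0.15):
    """Scale each asset's position so its ex-ante volatility contribution
    matches a common target -- a simple form of volatility targeting."""
    return weights * (target / np.maximum(asset_vol, 1e-8))
```

Note that the Sharpe objective is scale-invariant in the weights, which is one reason a separate volatility-targeting step is needed to pin down position sizes.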
Figure 1: The architecture transforms raw features into weights by maximizing the Sharpe Ratio.
Key Architectures Compared
1. The Comeback of Recurrence: xLSTM & VLSTM
While Transformers are the current trend, this study finds that Recurrent Neural Networks (RNNs) are far from dead.
- xLSTM: Uses exponential gating and matrix memory to prevent the "forgetting" issue of old LSTMs.
- VLSTM: Adds a Variable Selection Network (VSN) to "denoise" the input before it even hits the LSTM.
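The core idea of the VSN stage can be sketched as learned softmax weights that softly select input features before the recurrent encoder sees them. The single-layer linear scorer below is an illustrative assumption (the published VSN design, as in the Temporal Fusion Transformer, uses gated residual networks):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def variable_selection(features, W_score, b_score):
    """Soft feature selection: score each input variable, normalize the
    scores with softmax, and reweight the features before the LSTM.
    Noisy features end up with gate weights near zero."""
    scores = features @ W_score + b_score   # (T, F) raw relevance scores
    gates = softmax(scores)                 # gate weights sum to 1 per step
    return features * gates, gates
```

Because the gates are learned jointly with the downstream Sharpe objective, features that only add noise to the trading signal are suppressed end-to-end.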
2. State-Space Models (Mamba & Mamba2)
Mamba offers linear scaling and an effectively unbounded lookback. In this benchmark, however, it showed heterogeneous behavior: it worked well in some years (e.g., 2020) but struggled to match the consistent risk-adjusted returns of gated recurrent models.
3. Transformers (PatchTST & iTransformer)
PatchTST breaks time series into "patches" to smooth noise. While it performs better than basic Transformers, it often lacks the stable "state" required to handle market regime shifts compared to VLSTM.
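The patching step itself is simple: the series is cut into (possibly overlapping) windows, each of which becomes one token for the Transformer. A minimal sketch, with patch length and stride chosen arbitrarily for illustration:

```python
import numpy as np

def make_patches(series, patch_len=16, stride=8):
    """Split a 1-D series into overlapping patches, as in PatchTST:
    each patch is embedded and treated as one attention token, which
    smooths point-level noise and shortens the token sequence."""
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(n)])
```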
Experimental Results: The Leaderboard
The results were striking. VLSTM and LPatchTST (a hybrid of LSTM and Patching) were the clear winners.
| Strategy | Sharpe Ratio (2010-2025) | CAGR | Max Drawdown |
| :--- | :--- | :--- | :--- |
| VLSTM | 2.40 | 26.3% | -22.9% |
| LPatchTST | 2.31 | 25.5% | -17.4% |
| xLSTM | 1.80 | 19.3% | -14.1% |
| Mamba2 | 0.78 | 5.8% | -26.3% |
| AR1x (Linear) | 0.77 | 8.1% | -16.7% |
Depth Insight: Transaction Cost & Efficiency
One of the most valuable parts of this paper is the Breakeven Transaction Cost analysis. A model can have a high Sharpe Ratio but trade so frequently that costs eat all profits.
- xLSTM showed the highest signal-to-trade efficiency, maintaining a larger cost buffer (breakeven cost) in liquid contracts than the other models.
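A back-of-the-envelope version of the breakeven idea: divide total gross PnL by total turnover to get the per-trade cost at which profits vanish. The paper's exact definition is not reproduced here; this numpy sketch is a simplified approximation:

```python
import numpy as np

def breakeven_cost_bps(weights, returns):
    """Approximate cost (in basis points per unit of notional traded)
    at which the strategy's gross PnL is fully consumed by trading.

    weights: (T, N) positions held each period
    returns: (T, N) per-period asset returns
    """
    gross_pnl = (weights * returns).sum()               # total gross return
    turnover = np.abs(np.diff(weights, axis=0)).sum()   # total notional traded
    if turnover == 0:
        return float("inf")                             # buy-and-hold never pays costs
    return 1e4 * gross_pnl / turnover                   # bps per unit turnover
```

A model that trades rarely but captures the same returns has a much larger breakeven buffer, which is exactly the property attributed to xLSTM here.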
Figure 2: Cumulative PnL paths. Notice the stability of the sequence-based models compared to the flat performance of linear baselines.
Critical Analysis & Takeaways
- Inductive Bias > Model Size: Financial data is too small and noisy for "foundation model" scaling to work out-of-the-box. The "statefulness" of LSTMs provides an Inductive Bias that filters noise better than global attention.
- Hybrids are King: Combining feature selection (VSN) with sequence modeling (LSTM/xLSTM) provides two layers of denoising—essential for survival in non-stationary markets.
- The "Why" behind xLSTM's Success: xLSTM's exponential gating allows it to "ignore" high-frequency noise while latching onto rare but powerful economic signals that standard sigmoidal LSTMs would saturate and forget.
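The saturation argument in the last bullet can be made concrete. A toy illustration, simplified from the xLSTM formulation (the max-based stabilizer mirrors xLSTM's m-state trick; the specific pre-activation values are arbitrary):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two gate pre-activations: a common strong signal and a rarer, stronger one.
a, b = 10.0, 14.0

# Sigmoid gating saturates: both gates are ~1.0, so the rare signal
# is indistinguishable from the common one (and gradients vanish).
sig_ratio = sigmoid(b) / sigmoid(a)

# Exponential gating, stabilized against overflow by subtracting the max
# (as in xLSTM's m-state), preserves the relative strength of the signals.
m = max(a, b)
exp_ratio = math.exp(b - m) / math.exp(a - m)   # = e^(b - a)
```

With sigmoid gates the ratio collapses to roughly 1, while the exponential gate keeps the rarer signal about e^4 ≈ 55x stronger, so it can dominate the memory update instead of being forgotten.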
Conclusion
This benchmark provides a sobering reality check for AI in finance: theoretical efficiency (like Mamba's scaling) does not always translate into empirical profitability. For practitioners, the direction of travel is toward hybridized recurrent architectures that prioritize stability and feature selection over the raw complexity of generic Transformers.
Are you still using standard LSTMs for Alpha generation? It might be time to look at VSN+xLSTM.
