Two AI models dominate Alpha Arena trading competition

Two AI models have emerged as clear leaders in the Alpha Arena live trading experiment. The frontrunners demonstrate rare qualities: disciplined position management and a willingness to take losses early.
The Alpha Arena AI trading experiment enters its decisive phase, with the finale scheduled for November 3, 2025. After early chaos that erased half the capital of several models, clear winners and losers have emerged.
The strategies of the two dominant AI models survived the Fed-driven volatility on October 29, and the correlation between the leaders' equity curves is visible to the naked eye. Meanwhile, Gemini 2.5 Pro and GPT-5 remain the competition's most notable failures.
The live trading experiment launched weeks ago with ambitious claims about AI's ability to navigate cryptocurrency markets autonomously. Reality has proven more nuanced than expected.
Leader strategy breakdown
The two current leaders – DeepSeek ChatV3.1 and QWEN3 Max – show trading patterns that are at once disciplined and aggressive. The bots can hold positions for several days in a row while making very active use of available funds.
For example, DeepSeek committed over 50% of its $17,600 account as margin for open trades, running six simultaneous positions at 10x leverage.
All of these positions are long, which explains the substantial equity drawdown (from a peak of $22,000 to $17,600) after the crypto market's disappointing reaction to Fed Chair Jerome Powell's press conference on October 29.
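For readers who want the arithmetic behind those figures, here is a minimal sketch. The dollar amounts and leverage come from the article; the formulas are standard futures math, not Alpha Arena's code, and the even split across positions is an assumption.

```python
# Margin and drawdown arithmetic for the DeepSeek example above.
# Figures come from the article; formulas are standard futures math.

equity = 17_600              # current account equity, USD
peak_equity = 22_000         # equity at its peak, USD
margin_used = equity * 0.50  # "over 50% of available funds" as margin
leverage = 10                # each position runs at 10x
positions = 6                # simultaneous open positions

notional_exposure = margin_used * leverage      # total market exposure
margin_per_position = margin_used / positions   # assumes an even split
drawdown = (peak_equity - equity) / peak_equity

print(f"Margin committed:        ${margin_used:,.0f}")          # $8,800
print(f"Notional exposure (10x): ${notional_exposure:,.0f}")    # $88,000
print(f"Margin per position:     ${margin_per_position:,.0f}")  # ~$1,467
print(f"Drawdown from peak:      {drawdown:.0%}")               # 20%
```

Roughly $88,000 of market exposure sits on a $17,600 account, so even modest market moves translate into large equity swings.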
The leaders maintain active positions across BTC, ETH, and select altcoins, with holding times ranging from a few hours to several days. The leading models appear to adapt their timeframes to market conditions rather than following rigid rules.
Their capital allocation points to professional risk management. Instead of going all-in on a single high-conviction trade, DeepSeek and QWEN3 typically spread 40-60% of available capital across three to five simultaneous positions. This diversification protected them during volatile periods.
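As a rough illustration of that allocation rule: the 40-60% deployment and 3-5 position figures come from the article, but the function below and its defaults are hypothetical, not the models' actual logic.

```python
def allocate_positions(equity: float, deployed_frac: float = 0.5,
                       n_positions: int = 4) -> float:
    """Split a fixed fraction of equity evenly across open positions.

    Hypothetical sketch of the leaders' observed behaviour: deploy
    40-60% of capital across 3-5 simultaneous positions instead of
    concentrating it in one trade.
    """
    return equity * deployed_frac / n_positions

# With DeepSeek's $17,600 equity, 50% deployed across 4 positions:
print(f"Margin per position: ${allocate_positions(17_600):,.0f}")  # $2,200
```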
Stop-loss discipline
Almost all models showed the ability to take losses. Judging by the number of trades closed with small losses, they either apply stop-losses or immediately close failing positions before drawdowns compound. Anyone who has traded knows from experience how difficult such decisions are for the human brain.
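What that discipline might look like in code, as a minimal sketch: the 2% threshold and the function itself are assumptions for illustration, since the article does not disclose any model's actual rules.

```python
# Illustrative stop-loss check. The 2% threshold is an assumption,
# not a rule disclosed by any Alpha Arena model.

def should_close(entry_price: float, current_price: float,
                 is_long: bool, stop_loss_pct: float = 0.02) -> bool:
    """Close a position once its loss exceeds a fixed percentage."""
    move = (current_price - entry_price) / entry_price
    pnl_pct = move if is_long else -move
    return pnl_pct <= -stop_loss_pct

# A long entered at $100,000 with the price now at $97,500 is down 2.5%:
print(should_close(100_000, 97_500, is_long=True))  # True -> cut the loss
```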
Curiously, leader DeepSeek has booked single losses exceeding $1,000, something even the losing models may not have. So why is the gap in results so wide?
Trading frequency points to a classic mistake: the models in drawdown (Gemini 2.5 Pro and GPT-5) have already made 84 to 100 trades, while the leaders range from 20 to 40. It is a good reminder of the old "art of doing nothing" in the market, a zen that DeepSeek and QWEN3 appear to have grasped.
Volatility test after Fed statements
The Federal Reserve press conference on October 29 sharply increased market volatility: Bitcoin dropped 4% within hours, while altcoins saw even sharper moves.
Every AI model went into drawdown after Powell's remarks, but their reactions again diverged. The laggards panicked, either chaotically closing all positions or stubbornly holding them, while the leaders reduced risk by trimming position sizes.
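One simple form such de-risking could take is sketched below; the halving factor and the function are assumptions for illustration, not observed model behaviour.

```python
# Hypothetical volatility response: scale every open position down
# instead of liquidating everything or freezing. Not the models' code.

def trim_positions(position_sizes: dict[str, float],
                   risk_factor: float = 0.5) -> dict[str, float]:
    """Cut every position's size by a fixed factor during volatility."""
    return {sym: size * risk_factor for sym, size in position_sizes.items()}

open_positions = {"BTC": 8_000.0, "ETH": 4_000.0, "SOL": 2_000.0}
print(trim_positions(open_positions))
# {'BTC': 4000.0, 'ETH': 2000.0, 'SOL': 1000.0}
```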
Buy-and-hold remains a profitable strategy
The "buy-and-hold" tracker, included as baseline comparison, remains profitable despite all events throughout the experiment. This simple strategy involves purchasing BTC at the start of the competition and passive holding without any additional activity.
The AI leaders' active trading currently beats this baseline, partially justifying the added complexity and risk. The laggards, meanwhile, have lost up to 75% of their deposits. Interpret the results as you wish.
GPT-5 and Gemini 2.5 Pro: anatomy of failure
GPT-5 and Gemini 2.5 Pro entered the competition carrying high expectations, given OpenAI's and Google's AI capabilities. Their trading performance, however, proved catastrophic, making them the competition's most notable casualties.
Trade analysis reveals poor entry timing (buying near local tops and selling at support), arbitrary position sizing (some trades used 5% of capital, others up to 40%), and the absence of any basic profitable strategy: despite relatively small individual losses, their capital bled down to critical levels in under two weeks.
The failure shows that a big-name AI brand doesn't guarantee success in financial markets. Predicting price movements requires different capabilities than processing large volumes of information or general reasoning, the tasks where these models typically excel.
AI bloggers
An entertaining feature of Alpha Arena is that each model keeps its own blog, commenting on its actions and decisions.
The struggling Gemini, for example, comes across as quite a stubborn "bear": sitting in deep drawdown, it writes that it maintains conviction in the market's bearish character while holding six simultaneous short positions in profit.
Track live results at nof1.ai