
Mystery Model wins Alpha Arena Season 1.5

Posted: 6 December 2025, 20:00 CET | Updated: 7 December 2025, 19:17 CET

The latest Alpha Arena AI trading competition moved to stock markets and crowned a new champion. Mystery Model (which turned out to be Grok 4.20) delivered 12.11% returns over two weeks, earning $4,844 across four separate contests. Crypto season champion Qwen crashed on US equities.

Mystery Model – revealed to be the new Grok 4.20 release – beat seven established competitors across four parallel trading formats, including Season 1 winner Qwen 3 Max.

The two-week tournament tested whether crypto-focused trading models could adapt to the very different structure of regulated stock markets. Most of them struggled. Mystery Model’s across-the-board win – and the losses from Qwen, GPT-5, and Gemini – highlight how unfamiliar market rules can trip up even strong models.

Season 1.5 of Alpha Arena has officially ended !

– Mystery Model (a.k.a GROK 4.20) is the winner, up 12% on avg.

– Not only did it win, it made money in all four competitions

– GPT5.1 🥈 came in 2nd, and Gemini 3 🥉 3rd

– All trades & model outputs are 100% verifiable 👇 pic.twitter.com/cQOIjDAOob
— Jay A (@jay_azhang) December 5, 2025

Stocks hit harder than crypto

Crypto markets run 24/7 with high leverage and price action driven by short-term sentiment. US equities work differently: fixed trading hours, earnings-driven volatility, deep institutional liquidity, and valuations anchored in company fundamentals.

Season 1.5 kicked off November 19, 2025. Eight AI models each got $10,000 in real capital to trade Tesla, Nvidia, Microsoft, Amazon, and the Nasdaq-100 Index.

Qwen – the crypto competition winner – dropped to sixth place on Day 1 and never recovered. Mystery Model stayed consistent across all four formats and walked away with $4,844 in profits.
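The $4,844 figure can be cross-checked against the reported 12.11% average return. The sketch below assumes each model started every one of the four contests with a separate $10,000 account; the article states the $10,000 stake but not the per-contest split, so treat that as an assumption. The function and constant names are illustrative.

```python
# Assumption: one $10,000 account per contest, four contests in total.
STARTING_CAPITAL_PER_CONTEST = 10_000  # USD (per-contest split is assumed)
NUM_CONTESTS = 4
AVERAGE_RETURN = 0.1211  # 12.11% average return reported for Mystery Model


def total_profit(capital_per_contest, num_contests, average_return):
    """Profit in USD when the average return applies to equal-sized accounts."""
    return capital_per_contest * num_contests * average_return


profit = total_profit(STARTING_CAPITAL_PER_CONTEST, NUM_CONTESTS, AVERAGE_RETURN)
print(f"Implied total profit: ${profit:,.0f}")
```

Under that assumption the implied profit lines up with the reported $4,844, which suggests the 12.11% is an average across four equal-sized accounts rather than a return on a single $10,000 stake.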

Four contests, four different skills

Season 1.5 ran four separate competitions, each testing different trading capabilities:

Contest 1: New Baseline – enhanced data infrastructure including news feeds, macro sentiment, company fundamentals, order book depth, and market microstructure. Models could add to existing positions, scaling winners or averaging down losers. Temperature set to standard creativity levels.

Contest 2: Monk Mode – radically shortened prompts (roughly 50% shorter than baseline) with optional risk management guardrails. “Do nothing” counted as a valid action, testing whether models could resist overtrading.

Contest 3: Situational Awareness – models received live updates on their current rank, competitors’ positions, and other models’ P&L figures, adding a tactical layer. The goal shifted from pure profit maximization to winning the competition, mimicking how hedge fund managers adjust tactics when competing for investor capital.

Contest 4: Max Leverage – full leverage (20x) required on every position. This stress-tested risk management, stop-loss placement, and adaptation to high-risk trading.
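To see why 20x leverage is a stress test, here is a minimal sketch of the arithmetic. It ignores fees, financing costs, and maintenance-margin rules, which in practice would close a position before the full margin is gone. The function names are illustrative and not part of Alpha Arena.

```python
LEVERAGE = 20  # leverage required on every position in the Max Leverage contest


def wipeout_move(leverage):
    """Adverse price move (as a fraction) that erases the posted margin."""
    return 1 / leverage


def equity_return(price_move, leverage):
    """Return on margin for a fractional price move, ignoring all costs."""
    return price_move * leverage


print(f"Move that wipes out margin: {wipeout_move(LEVERAGE):.1%}")
print(f"Return on margin after a -1% move: {equity_return(-0.01, LEVERAGE):.0%}")
```

In this simplified view, a 5% move against a position is enough to lose the entire margin, and even a 1% adverse move costs a fifth of it, which is why stop-loss placement matters so much in this format.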

Mystery Model dominated all four formats with an average return of 12.11%. Not eye-popping by crypto standards, but solid for equities over two weeks.

Who is Mystery Model?

The winning AI turned out to be the new Grok 4.20 release, confirmed by both the organizers and Elon Musk.

elon just confirmed the "mystery AI model" is actually grok 4.20

apparently, the model is powerful for finance.

what i love about grok 4.1 is how well it follows instructions — even with over 200k context, it handles everything in seconds without hallucinating

grok 4.20 is… pic.twitter.com/Yqvc4Kl5AX
— Ali Hasnain (@a1i_hasnain) December 5, 2025

In Season 1, Qwen and DeepSeek dominated while models from OpenAI, Google, and Anthropic struggled. Grok broke that pattern, and luck is an unlikely explanation: winning all four formats makes the new champion hard to dismiss.

AI in financial markets

Alpha Arena became the first benchmark measuring AI investing capabilities with real capital instead of simulations. The 12.11% two-week return beat most human day traders and many algorithmic strategies over the same period.

But the performance gap (+12.11% for Mystery Model versus losses of up to 70% for Qwen and DeepSeek) still shows that current AI systems don’t have strong inherent trading abilities. Architecture, training data, and solid prompt engineering will matter going forward. And those choices are still made by humans.

What’s next

Alpha Arena organizers have made clear that future seasons will keep testing AI trading capabilities across different asset classes and market conditions. The nof1 founder has already announced that the concept will expand.

Will share timing on that soon
— Jay A (@jay_azhang) December 5, 2025

What could that look like? After crypto and stocks, future iterations could explore forex, commodities, or multi-asset portfolios. Either way, seeing the limits of AI trading mapped out in real markets is worth watching.
