AI Model Arena: An In-Depth Perspective on the NoF1 Live Portfolio Trading Competition
On October 18, nof1, an AI research lab focused on financial markets, launched an unprecedented experiment: six leading AI models (GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max) each managed $10,000 in real funds on Hyperliquid, trading cryptocurrency live.
Current Rankings and Account Values
As of the evening of October 30, the latest standings are:
Compared with data from a few days earlier, the rankings have changed dramatically. DeepSeek remains in the lead, but its return has fallen sharply from 95.71% to 56.71%, with its account value dropping from $19,570 to $15,671, a loss of nearly $4,000. Qwen3’s return also declined, from 53.68% to 25.20%. Notably, Claude Sonnet 4.5 went from a slight profit to a 7% loss, while GPT-5’s loss widened to 72%, leaving it close to liquidation.
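A fall in cumulative return measured in percentage points is not the same as the percentage of the account that was lost. The short sketch below converts the two return snapshots quoted above into account values and computes the drop between them. It assumes only the $10,000 starting balance stated in the article; the function names are illustrative. Note that this is a snapshot-to-snapshot change, which may differ from a true peak-to-trough drawdown if the earlier snapshot was not the peak.

```python
START_CAPITAL = 10_000  # each model's starting balance, per the article


def account_value(return_pct: float, start: float = START_CAPITAL) -> float:
    """Account value implied by a cumulative return given in percent."""
    return start * (1 + return_pct / 100)


def decline_between(earlier_pct: float, later_pct: float) -> float:
    """Percentage drop in account value between two return snapshots."""
    earlier = account_value(earlier_pct)
    later = account_value(later_pct)
    return (earlier - later) / earlier * 100


# Return snapshots quoted in the article (earlier, later), in percent.
deepseek_drop = decline_between(95.71, 56.71)
qwen3_drop = decline_between(53.68, 25.20)

print(f"DeepSeek lost {deepseek_drop:.1f}% of its account value")
print(f"Qwen3 lost {qwen3_drop:.1f}% of its account value")
```

On these figures, DeepSeek’s 39-point fall in return corresponds to roughly a fifth of its account value, and Qwen3’s 28-point fall to a little under a fifth.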
Reading the Market Curve: The Evolution of Three Phases
Phase One (October 18-25): Uptrend, Strategy Divergence Emerges
The market was in an upward channel, with strategy differences among models beginning to show:
Phase Two (October 26-28): Accelerated Rise, Peak Formation
Phase Three (October 29-30): Market Pullback, Risk Control Revealed
Deepening Issues Revealed by the Decline
1. The Double-Edged Sword of “Trend Following”
DeepSeek’s success was based on “trend following”: 95% of the time, it went long, trusting the trend would continue. During an uptrend, this strategy yielded a maximum return of 95%. But when the trend reversed, the same approach caused a 30% loss.
This exposes a key problem: Trend-following strategies require effective take-profit and stop-loss mechanisms. If you only “let profits run” without “cutting losses,” a major reversal can wipe out most gains.
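One common way to pair “let profits run” with “cut losses” is a trailing stop. The sketch below is a minimal illustration, not the logic used by DeepSeek or nof1: the price path, the 10% trail, and the function name are all hypothetical, and it assumes the position can be closed at the observed price with no gaps, fees, or slippage.

```python
def trailing_stop_exit(prices, trail_pct):
    """Return (exit_index, exit_price) for a long position opened at prices[0].

    The stop sits trail_pct percent below the highest price seen so far,
    so it ratchets up with the trend and never moves down.
    If the stop is never hit, the position is closed at the last price.
    """
    peak = prices[0]
    for i, price in enumerate(prices):
        peak = max(peak, price)
        stop = peak * (1 - trail_pct / 100)
        if price <= stop:
            return i, price
    return len(prices) - 1, prices[-1]


# Hypothetical price path: a rally from 100 to 195, then a reversal to 135.
path = [100, 120, 150, 180, 195, 185, 170, 155, 140, 135]

idx, exit_price = trailing_stop_exit(path, trail_pct=10)
held_to_end = path[-1]

print(f"trailing stop exits at step {idx} at price {exit_price}")
print(f"holding through the reversal ends at price {held_to_end}")
```

On this made-up path, the stop closes the position at 170 while holding to the end leaves it at 135. The trade-off is that a tight trail also exits on ordinary pullbacks inside a healthy trend, so the trail width is itself a strategy decision.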
DeepSeek may have placed too much faith in “long-term holding” and underestimated market uncertainty. Its largest single profit, $7,378, came from a 60-hour ETH trade, which would have reinforced its “long-termism.” But markets are not one-way streets; trends can reverse at any time.
2. Holding Cash Is a Form of Wisdom and Protection
Qwen3 demonstrated the value of holding cash. Keeping 82.4% of its account in cash during the rising phase might look like “missing opportunities,” but during the decline it limited losses.
Qwen3’s 26% drawdown versus DeepSeek’s 32% is a difference of only 6 percentage points, but compounded over time this gap widens. More importantly, Qwen3 preserved both its principal and its psychological advantage, allowing quick re-entry once the market stabilizes. If DeepSeek keeps declining, it risks falling into a vicious cycle of “floating loss, hesitation, missed rebound.”
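One way to see why a few points of drawdown matter is to look at the gain needed to get back to the previous high. After losing a fraction d of the account, the required gain is d / (1 - d), which grows faster than d itself. The sketch below applies this standard identity to the 26% and 32% drawdowns mentioned above; the function name is illustrative.

```python
def gain_to_recover(drawdown_pct: float) -> float:
    """Percent gain needed to return to the prior peak after a drawdown."""
    d = drawdown_pct / 100
    return d / (1 - d) * 100


shallow = gain_to_recover(26)  # the smaller drawdown quoted in the article
deep = gain_to_recover(32)     # the larger drawdown quoted in the article

print(f"26% drawdown needs a {shallow:.1f}% gain to recover")
print(f"32% drawdown needs a {deep:.1f}% gain to recover")
print(f"gap in required gain: {deep - shallow:.1f} percentage points")
```

A 6-point gap in drawdown becomes a gap of roughly 12 points in the gain required to recover, and the asymmetry grows with deeper losses.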
3. The Resilience of Simple Strategies
The performance of BTC Buy & Hold is a slap in the face for all “smart” AI models. This strategy involves no technical analysis, no complex algorithms, no frequent rebalancing, yet it ranks third, outperforming half of the AI models.
This result reminds us: in trading, avoiding mistakes can be more valuable than making many correct trades. Gemini lost 66% over 193 trades, while BTC Buy & Hold made zero trades and preserved principal. Who is more successful? The answer is obvious.
4. Lack of Risk Management
Except for Qwen3, nearly all AI models exposed serious risk management flaws:
This shows that while these AI models can “understand” market data and “execute” trades, their core risk management capabilities are still immature.
Limitations of the Experiment: Sober Reflection Beyond the Data
After reviewing the data and analysis, it’s tempting to focus on DeepSeek’s 56% yield or Gemini’s 66% loss. But before drawing conclusions, we must acknowledge the systemic limitations of this experiment—these may be more important than the results themselves.
1. The Time Frame Is Too Short: 12 Days Cannot Reveal the Truth
This experiment lasted only 12 days, from October 18 to 30. What does 12 days mean in the crypto market? Likely just a fragment of a full bull-bear cycle.
The observed “rise, peak, pullback” sequence forms a complete mini-cycle, but the outcome could still be luck. Had the experiment started at a market top, or run into a sudden 30% crash like the “519 event” (the May 19, 2021 sell-off), the rankings could have been completely reversed.
DeepSeek’s 56% return may depend heavily on this short-term market behavior. Being long about 95% of the time excels in a bullish trend, but in a sideways or bear market the same strategy would be eaten away by fees and repeated stop-losses.
Similarly, Qwen3’s 82% cash position is an advantage in sideways or falling markets, but in a bull run like 2021 it would underperform badly. If BTC rose from $10,000 to $100,000 while 80% of the account sat in cash, the account would capture only about 20% of the rise.
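The cash-drag arithmetic can be made concrete with a small sketch. It assumes a hypothetical $10,000 account, a static 20% BTC allocation that is never rebalanced, and cash that earns nothing; the function name is illustrative.

```python
def final_account_value(start, invested_fraction, price_start, price_end):
    """Final value of an account holding a fixed slice of an asset plus idle cash."""
    invested = start * invested_fraction
    cash = start - invested
    return cash + invested * (price_end / price_start)


start = 10_000  # hypothetical account size
fully_invested = final_account_value(start, 1.0, 10_000, 100_000)
mostly_cash = final_account_value(start, 0.2, 10_000, 100_000)

full_return = (fully_invested / start - 1) * 100
partial_return = (mostly_cash / start - 1) * 100

print(f"100% invested: {full_return:.0f}% return")
print(f"20% invested:  {partial_return:.0f}% return")
```

Under these assumptions the fully invested account gains 900% while the 80%-cash account gains 180%, which is one fifth of the move.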
12 days of data are insufficient to validate any long-term strategy.
2. Same Prompt, Different AI: Bound by the Same Data
All six AI models received identical market data and trading instructions. It’s like six fund managers analyzing the same research report—what’s being tested isn’t their research ability but their execution discipline.
In real trading, alpha comes from asymmetric information. Top quant funds have exclusive on-chain tracking, whale transfer insights, and off-chain large order flow data to anticipate institutional moves.
But in this experiment, all AI models saw the same information. It’s more an “execution competition” than a “strategy innovation” contest.
We cannot determine, from this setup, who would win if DeepSeek had exclusive on-chain data or Gemini had proprietary Twitter sentiment analysis.
3. Capital Scale Distortion: The $10,000 Fairy Tale
Each AI only managed $10,000. This is a tiny amount on Hyperliquid—you can enter and exit freely, slippage is negligible, liquidity impact is nonexistent, and large orders can be split without concern.
But in real quantitative trading, managing $10 million versus $10,000 is a different universe.
This experiment tests “small capital flexibility,” not “scalable strategy robustness.”
4. Market Environment Luck: No True Hell Encountered
During the experiment, market volatility was moderate. We did not see:
None of the AI risk controls has been tested under the kind of extreme stress that real crypto traders face. How would DeepSeek’s stop-loss work during a “limit down” scenario? We don’t know. Would Qwen3’s quick close work if the exchange crashes? Uncertain.
Luck plays a significant role in this 12-day experiment.
5. The Randomness of a Single Experiment: No Second Season for Validation
This is a one-off test; there’s no “second season” to verify strategy stability. We cannot answer:
The current results are more like six people rolling dice, with DeepSeek rolling the highest. But that doesn’t mean its dice are better—just luckier.
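The dice analogy can be checked with a small Monte Carlo sketch. It simulates six traders with no edge at all over 12 days and looks at how good the winner appears anyway. Everything here is an assumption for illustration: the 5% daily volatility, the normally distributed returns, the trial count, and the function name are not taken from the experiment.

```python
import random


def simulate_best_of_six(trials=2000, traders=6, days=12,
                         daily_vol=0.05, seed=42):
    """Simulate zero-edge traders and report how good the winner looks.

    daily_vol is an assumed daily return volatility, not a measured value.
    Returns (average winner return, average return across all traders),
    both as fractions of starting capital.
    """
    rng = random.Random(seed)
    best_total = 0.0
    mean_total = 0.0
    for _ in range(trials):
        finals = []
        for _ in range(traders):
            value = 1.0
            for _ in range(days):
                # Zero-mean daily return: no skill, only noise.
                value *= 1 + rng.gauss(0.0, daily_vol)
            finals.append(value - 1)
        best_total += max(finals)
        mean_total += sum(finals) / traders
    return best_total / trials, mean_total / trials


avg_best, avg_all = simulate_best_of_six()
print(f"average winner return: {avg_best:+.1%}")
print(f"average trader return: {avg_all:+.1%}")
```

Even though every simulated trader has an expected return of zero, the best of six typically shows a clearly positive result, which is why a single 12-day ranking says little about which strategy is actually better.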
So, How Should We View These Rankings?
After considering these limitations, you might ask: does this experiment have any meaning?
Yes, but not in terms of “who is the champion.” Its true value is showing us:
But if you see DeepSeek in first place and decide to entrust it with your funds or copy its approach, you’re making a mistake.
A 12-day champion is not necessarily a 12-month champion; managing $10,000 well does not mean managing $1 million well; and winning in the current market does not guarantee future success.
Investing has no simple answers. This experiment provides valuable data, but the limitations behind the data may be more worth pondering than the data itself.
The data in this report was compiled and edited by WolfDAO. For questions or updates, please contact us.
Written by: Riffi / WolfDAO (X: @10xWolfdao)