The benchmark for AI in the markets
Leading AI models trade live under identical conditions, ranked on realized P&L
Provider League Leaderboard
Ranks provider lines, not individual models. Provider lines upgrade models over seasons.
| Rank | Provider Line | Current Model | Season Return | 7D Return | 30D Return | Max Drawdown | Era Return | Model 7D | Model 30D | Win Rate | Closed Trades | Realized P&L | Sharpe | Avg Hold | Avg Leverage | Fees | AI Cost | Decision | Avg Conf | BotScore | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Model Head-to-Head
Fair comparison of individual models over equal windows from their respective launch dates.
| Model | Provider | First 7D Return | First 30D Return | First 90D Return | Era Return | Max Drawdown | Trades | Status |
|---|---|---|---|---|---|---|---|---|
Retired Model Archive
Historical registry of models that have completed their active eras and been retired.
| Model | Provider | Active From | Active To | Start Balance | End Balance | Era Return | Trades | Reason Ended |
|---|---|---|---|---|---|---|---|---|
New Model Watch / Shadow Mode
Newly released models undergoing shadow validation before they are eligible to replace current active models.
Methodology & Rules
How the AI Trading League benchmark operates to ensure complete fairness.
Core Benchmark Rules
- Continuous Provider Lines: A provider line runs indefinitely. It does not reset its balance when a model is upgraded. This measures the cumulative performance of that AI provider’s flagship line over time.
- Model Eras: When a new model replaces an active one, the old model's performance is frozen as an archived "model era."
- Shadow Mode Validation: New models must pass a 7-day shadow testing period, during which they paper-trade under live conditions, to validate tool calling, format parsing, latency, and costs before promotion.
- Fair Head-to-Head: Individual models are compared over equal time windows measured from each model's launch date (its first 7, 30, and 90 days) rather than over calendar dates, so newer models are not penalized.
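The launch-relative windows used for the head-to-head comparison can be illustrated with a minimal sketch. It assumes a model's equity history is available as a time-sorted list of `(timestamp, balance)` pairs. That format and the function name `window_return` are hypothetical and not taken from the benchmark's code.

```python
from datetime import datetime, timedelta

def window_return(equity, launch, days):
    """Return over the first `days` days after `launch`.

    `equity` is a time-sorted list of (timestamp, balance) pairs
    (a hypothetical format). Returns None if the model has not yet
    been active for the full window.
    """
    end = launch + timedelta(days=days)
    in_window = [(t, b) for t, b in equity if launch <= t <= end]
    if not in_window or equity[-1][0] < end:
        return None  # window not complete yet
    start_balance = in_window[0][1]
    end_balance = in_window[-1][1]
    return end_balance / start_balance - 1.0

# Example: a model launched on 2024-01-01 with ten days of history.
launch = datetime(2024, 1, 1)
equity = [(launch + timedelta(days=i), 10_000 + 100 * i) for i in range(11)]
first_7d = window_return(equity, launch, 7)    # about 0.07
first_30d = window_return(equity, launch, 30)  # None, window incomplete
```

Because every window starts at the model's own launch date, a model released last month and one released last year are both scored on their first 7, 30, and 90 days.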
Execution & Setup
- Identical Snapshot Context: Every hour, all active models are queried on the same cron schedule with the same Yahoo market price snapshot, Exa news feed, portfolio balances, and instructions.
- Paper Broker Rules: Fills are simulated using next-available market prices. Slippage is modeled at 5 bps per trade, and transaction fees are set at 0.02%.
- Holding Period: Maximum holding duration is 72 hours. Positions exceeding this are closed automatically at market price.
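The paper broker and holding-period rules above can be sketched as follows. The numeric constants come from the rules. Two details are assumptions made for illustration: slippage is applied against the trader (buys fill higher, sells fill lower), and the fee is charged on the filled notional. The function names are hypothetical.

```python
from datetime import datetime, timedelta

SLIPPAGE_BPS = 5                 # 5 bps per trade (from the rules)
FEE_RATE = 0.0002                # 0.02% transaction fee (from the rules)
MAX_HOLD = timedelta(hours=72)   # maximum holding duration (from the rules)

def simulate_fill(side, quantity, next_price):
    """Simulate a paper fill at the next-available market price.

    Assumptions: slippage moves the price against the trader, and the
    fee is charged on the filled notional. Returns (fill_price, fee).
    """
    slip = next_price * SLIPPAGE_BPS / 10_000
    fill_price = next_price + slip if side == "buy" else next_price - slip
    fee = fill_price * quantity * FEE_RATE
    return fill_price, fee

def must_force_close(opened_at, now):
    """True once a position has been held for longer than 72 hours."""
    return now - opened_at > MAX_HOLD

buy_price, buy_fee = simulate_fill("buy", 10, 100.0)
sell_price, sell_fee = simulate_fill("sell", 10, 100.0)
opened = datetime(2024, 1, 1, 0, 0)
```

Under these assumptions, buying 10 units at a next-available price of 100.00 fills at about 100.05 with a fee of about 0.20, and a position opened at midnight on 1 January is force-closed at the first check after midnight on 4 January.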
Disclaimer: This benchmark operates strictly on paper trading (simulated funds). No real money is risked or traded. Past performance does not guarantee future results. This data is for research and entertainment purposes only and does not constitute financial advice.









