#24
X AI · Release: 2026-03-12 · x-ai/grok-4.20-beta::medium
Flaky tests
2
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
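The definition above can be sketched in a few lines of Python. This is only an illustration, assuming each test's run history is available as a list of pass/fail booleans. The function name `find_flaky_tests`, the test names, and the sample outcomes are hypothetical and are not taken from the benchmark's actual data or tooling.

```python
from typing import Dict, List


def find_flaky_tests(runs: Dict[str, List[bool]]) -> List[str]:
    """Return the names of tests with at least one pass and one fail."""
    return sorted(
        name
        for name, outcomes in runs.items()
        # any() is True if some run passed; not all() is True if some run failed.
        if any(outcomes) and not all(outcomes)
    )


# Hypothetical run history (True = pass, False = fail); not real benchmark data.
example_runs = {
    "test_a": [True, True, True],     # always passes: stable
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: stable (but wrong)
    "test_d": [False, True],          # mixed outcomes: flaky
}

flaky = find_flaky_tests(example_runs)
print(flaky)
```

Under this reading, a test that fails on every run is counted as incorrect rather than flaky, because its outcome does not vary between runs.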
Charts: Avg Score vs Total Cost, Response Time (avg), Avg Score vs Response Time (avg), Total Output Tokens, Avg Score vs Total Output Tokens
Quick Compare
Grok 4.20 Beta (medium) vs MiMo-V2-Flash (medium)
Grok 4.20 Beta (medium) vs Gemini 3 Flash Preview (none)
Grok 4.20 Beta (medium) vs Seed-2.0-Mini (medium)
Grok 4.20 Beta (medium) vs GPT-5.3 Chat (none)
Grok 4.20 Beta (medium) vs Qwen3.5-Flash (medium)
Grok 4.20 Beta (medium) vs Gemini 3 Flash Preview (medium)
Grok 4.20 Beta (medium) vs Gemini 3.1 Pro Preview (medium)
Grok 4.20 Beta (medium) vs Step 3.5 Flash (medium, free available)
Category Breakdown
| Category | Avg Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 7.0 | 7.2 | |
| Combined | 10.0 | 10.0 | |
| Data parsing and extraction | 9.9 | 10.0 | |
| Domain specific | 4.0 | 10.0 | |
| General Intelligence | 10.0 | 10.0 | |
| Instructions following | 9.0 | 10.0 | |
| Puzzle Solving | 7.0 | 7.2 | |
| Tool Calling | 10.0 | 10.0 | |