AI BENCHY Compare
Inception: Mercury 2 vs xAI: Grok 4.20
Last updated: 2026-04-02
| Metric | Mercury 2 (medium) | Grok 4.20 (none) |
|---|---|---|
| Score | 6.3 | 5.4 |
| Rank | #51 | #69 |
| Consistency | 8.5 | 9.5 |
| Tests Correct | ||
| Attempt pass rate | 51.0% | 31.4% |
| Flaky tests | 3 | 1 |
| Total Runs | 51 | 51 |
| Cost per result | 0.634 | 1.809 |
| Total Cost | $0.045 | $0.091 |
| Input Price | $0.250 / 1M tokens | $2.000 / 1M tokens |
| Output Price | $0.750 / 1M tokens | $6.000 / 1M tokens |
| Output Tokens | 3,723 | 1,655 |
| Reasoning Tokens | 46,120 | 0 |
| Response Time (avg) | 2.25s | 1.11s |
| Response Time (max) | 14.63s | 6.04s |
| Response Time (total) | 35.99s | 18.80s |
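The cost rows can be cross-checked with simple arithmetic. The sketch below copies figures from the table and estimates how much of each model's reported total cost went to generated tokens. It rests on two assumptions the page does not state: reasoning tokens are billed at the output-token rate, and "Attempt pass rate" is passing runs divided by Total Runs. The helper names `output_cost` and `passed_attempts` are hypothetical.

```python
# Figures copied from the comparison table above.
MODELS = {
    "Mercury 2 (medium)": {
        "output_price_per_m": 0.750,  # USD per 1M output tokens
        "output_tokens": 3_723,
        "reasoning_tokens": 46_120,
        "total_cost": 0.045,          # USD, as reported
        "pass_rate": 0.510,
        "runs": 51,
    },
    "Grok 4.20 (none)": {
        "output_price_per_m": 6.000,
        "output_tokens": 1_655,
        "reasoning_tokens": 0,
        "total_cost": 0.091,
        "pass_rate": 0.314,
        "runs": 51,
    },
}


def output_cost(m: dict) -> float:
    """Estimated USD spent on generated tokens.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same per-token rate as regular output tokens.
    """
    billed_tokens = m["output_tokens"] + m["reasoning_tokens"]
    return billed_tokens * m["output_price_per_m"] / 1_000_000


def passed_attempts(m: dict) -> int:
    """Approximate passing runs, assuming pass rate = passed / total runs."""
    return round(m["pass_rate"] * m["runs"])


for name, m in MODELS.items():
    est = output_cost(m)
    share = est / m["total_cost"]
    print(f"{name}: output-side cost ~${est:.4f} ({share:.0%} of total), "
          f"~{passed_attempts(m)}/{m['runs']} runs passed")
```

Under these assumptions, roughly four-fifths of Mercury 2's reported cost comes from generated tokens (mostly its 46,120 reasoning tokens), while for Grok 4.20 generated tokens account for only about a ninth, the remainder presumably being input tokens, whose counts the table does not report.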
Charts (not reproduced): Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens; Category Breakdown.