AI BENCHY Compare

xAI: Grok Build 0.1 vs Xiaomi: MiMo-V2.5

Last updated at: 2026-05-21

| Metric | Grok Build 0.1 (medium, release: 2026-05-21) | MiMo-V2.5 (medium, release: 2026-04-22) |
|---|---|---|
| Score | 7.8 | 7.8 |
| Rank | #41 | #37 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.9 | 8.6 |
| Tests Correct | | |
| Attempt pass rate | 71.9% | 75.9% |
| Flaky tests | 3 | 3 |
| Total Runs | 57 | 54 |
| Cost per result | 4.064 | 2.101 |
| Total Cost | $0.488 | $0.253 |
| Input Price | $1.000 / 1M | $0.400 / 1M |
| Output Price | $2.000 / 1M | $2.000 / 1M |
| Output Tokens | 1,947 | 2,821 |
| Reasoning Tokens | 223,372 | 116,207 |
| Response Time (avg) | 22.28s | 14.40s |
| Response Time (max) | 88.28s | 86.93s |
| Response Time (total) | 423.30s | 259.20s |
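
The token and price rows above allow a rough estimate of how much of each model's Total Cost comes from generated tokens. The sketch below is illustrative only: it assumes that reasoning tokens are billed at the listed Output Price, which the page does not state, and the function name `output_side_cost` is hypothetical.

```python
def output_side_cost(output_tokens: int, reasoning_tokens: int,
                     price_per_million: float) -> float:
    """Estimate the cost of generated tokens in dollars.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same per-token rate as ordinary output tokens.
    """
    billed_tokens = output_tokens + reasoning_tokens
    return billed_tokens * price_per_million / 1_000_000


# Figures copied from the comparison table.
grok = output_side_cost(1_947, 223_372, 2.000)
mimo = output_side_cost(2_821, 116_207, 2.000)

print(f"Grok Build 0.1: ${grok:.3f} of $0.488 total")
print(f"MiMo-V2.5:      ${mimo:.3f} of $0.253 total")
```

Under that assumption, generated tokens would account for most of the Total Cost of both models, and the remainder would be input tokens. Treat this as a plausibility check, not as the site's actual cost formula.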

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | | 5.46s | 195 | 9,825 |
| Anti-AI Tricks | MiMo-V2.5 | 10.0 | 10.0 | 100.0% | 0 | | 4.14s | 281 | 1,739 |
| Coding | Grok Build 0.1 | 7.3 | 3.7 | 66.7% | 1 | | 30.98s | 354 | 17,734 |
| Coding | MiMo-V2.5 | 10.0 | 10.0 | 100.0% | 0 | | 31.48s | 488 | 14,813 |
| Combined | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | | 30.81s | 231 | 18,779 |
| Combined | MiMo-V2.5 | 10.0 | 10.0 | 100.0% | 0 | | 16.86s | 363 | 7,609 |
| Data parsing and extraction | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | | 7.76s | 180 | 10,343 |
| Data parsing and extraction | MiMo-V2.5 | 2.7 | 5.7 | 16.7% | 1 | | 6.33s | 306 | 5,714 |
| Domain specific | Grok Build 0.1 | 5.3 | 10.0 | 33.3% | 0 | | 77.75s | 501 | 111,807 |
| Domain specific | MiMo-V2.5 | 5.3 | 10.0 | 33.3% | 0 | | 34.53s | 507 | 49,478 |
| General Intelligence | Grok Build 0.1 | 3.8 | 2.5 | 33.3% | 1 | | 10.14s | 78 | 5,386 |
| General Intelligence | MiMo-V2.5 | 5.4 | 2.5 | 66.7% | 1 | | 5.37s | 121 | 418 |
| Instructions following | Grok Build 0.1 | 9.8 | 10.0 | 100.0% | 0 | | 9.62s | 57 | 12,436 |
| Instructions following | MiMo-V2.5 | 9.9 | 10.0 | 100.0% | 0 | | 1.80s | 88 | 801 |
| Puzzle Solving | Grok Build 0.1 | 6.2 | 7.5 | 55.6% | 1 | | 8.67s | 161 | 15,476 |
| Puzzle Solving | MiMo-V2.5 | 8.2 | 7.2 | 88.9% | 1 | | 20.60s | 364 | 33,211 |
| Tool Calling | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | | 9.40s | 180 | 5,319 |
| Tool Calling | MiMo-V2.5 | 10.0 | 10.0 | 100.0% | 0 | | 7.29s | 303 | 2,424 |
| Trivia | Grok Build 0.1 | 3.0 | 10.0 | 0.0% | 0 | | 26.07s | 10 | 16,267 |
| Trivia | MiMo-V2.5 | - | - | - | - | - | - | - | - |
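
As a consistency check, the per-category token counts can be summed and compared with the Output Tokens and Reasoning Tokens totals reported near the top of the page. The sketch below is a minimal check with figures copied from the page; the variable names are illustrative, and MiMo-V2.5 has no Trivia data, so it contributes nine categories instead of ten.

```python
# Per-category (output tokens, reasoning tokens), in page order.
grok_build = [
    (195, 9_825), (354, 17_734), (231, 18_779), (180, 10_343),
    (501, 111_807), (78, 5_386), (57, 12_436), (161, 15_476),
    (180, 5_319), (10, 16_267),
]
mimo_v25 = [
    (281, 1_739), (488, 14_813), (363, 7_609), (306, 5_714),
    (507, 49_478), (121, 418), (88, 801), (364, 33_211),
    (303, 2_424),  # no Trivia row: the page shows "-" for MiMo-V2.5
]


def totals(rows):
    """Return (total output tokens, total reasoning tokens)."""
    return sum(o for o, _ in rows), sum(r for _, r in rows)


grok_totals = totals(grok_build)
mimo_totals = totals(mimo_v25)

print("Grok Build 0.1:", grok_totals)  # (1947, 223372)
print("MiMo-V2.5:", mimo_totals)       # (2821, 116207)
```

Both sums match the reported totals (1,947 / 223,372 and 2,821 / 116,207), which suggests that the seven values in each category row are laid out as shown, with the last two being Output Tokens and Reasoning Tokens.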
