AI BENCHY Compare

StepFun: Step 3.7 Flash vs Xiaomi: MiMo-V2-Flash

Last updated at: 2026-05-29

| Metric | Step 3.7 Flash (low) | MiMo-V2-Flash (medium) |
| --- | --- | --- |
| Release | 2026-05-29 | 2025-12-16 |
| Score | 7.4 | 7.1 |
| Rank | #60 | #77 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.7 | 8.7 |
| Tests Correct | | |
| Attempt pass rate | 68.3% | 63.3% |
| Flaky tests | 3 | 3 |
| Total Runs | 60 | 60 |
| Cost per result | 2.796 | 0.345 |
| Total Cost | $0.336 | $0.038 |
| Input Price | $0.200 / 1M | $0.100 / 1M |
| Output Price | $1.150 / 1M | $0.300 / 1M |
| Output Tokens | 285,209 | 12,458 |
| Reasoning Tokens | 0 | 115,182 |
| Response Time (avg) | 16.06s | 20.28s |
| Response Time (max) | 124.75s | 96.01s |
| Response Time (total) | 321.11s | 283.87s |
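
As a rough cross-check on the Total Cost row, the output-side spend can be estimated from the listed output prices and token counts. The sketch below is only an illustration, not AI BENCHY's own costing method: it assumes reasoning tokens are billed at the output rate, and it leaves out input tokens because the page does not list input token counts. The function name `estimate_output_cost` is hypothetical.

```python
def estimate_output_cost(output_tokens, reasoning_tokens, output_price_per_million):
    """Estimate output-side cost in USD.

    Assumption: reasoning tokens are billed at the same rate as output tokens.
    Input-token cost is not included (input token counts are not listed).
    """
    billed_tokens = output_tokens + reasoning_tokens
    return billed_tokens * output_price_per_million / 1_000_000


# Figures copied from the comparison table.
step_cost = estimate_output_cost(285_209, 0, 1.150)
mimo_cost = estimate_output_cost(12_458, 115_182, 0.300)

print(f"Step 3.7 Flash: ${step_cost:.3f}")
print(f"MiMo-V2-Flash:  ${mimo_cost:.3f}")
```

Under these assumptions the output-side estimates come to about $0.328 and $0.038, against listed totals of $0.336 and $0.038. The small remainder for Step 3.7 Flash would be consistent with input-token cost, though the page does not confirm how the total is derived.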

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Step 3.7 Flash | 8.7 | 7.9 | 91.7% | 1 | | 4.02s | 10,896 | 0 |
| Anti-AI Tricks | MiMo-V2-Flash | 8.1 | 7.9 | 83.3% | 1 | | 15.85s | 1,674 | 23,559 |
| Coding | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 9.43s | 14,569 | 0 |
| Coding | MiMo-V2-Flash | 4.1 | 5.8 | 33.3% | 1 | | 7.20s | 456 | 3,648 |
| Combined | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 7.98s | 6,426 | 0 |
| Combined | MiMo-V2-Flash | 9.8 | 10.0 | 100.0% | 0 | | 75.68s | 442 | 26,859 |
| Data parsing and extraction | Step 3.7 Flash | 7.3 | 5.8 | 83.3% | 1 | | 2.29s | 2,667 | 0 |
| Data parsing and extraction | MiMo-V2-Flash | 6.5 | 10.0 | 50.0% | 0 | | 0ms | 153 | 0 |
| Domain specific | Step 3.7 Flash | 5.3 | 7.2 | 44.4% | 1 | | 43.31s | 104,487 | 0 |
| Domain specific | MiMo-V2-Flash | 5.9 | 7.2 | 55.6% | 1 | | 96.01s | 8,374 | 42,461 |
| General Intelligence | Step 3.7 Flash | 3.4 | 9.3 | 0.0% | 0 | | 7.00s | 4,604 | 0 |
| General Intelligence | MiMo-V2-Flash | 4.0 | 10.0 | 0.0% | 0 | | 4.20s | 87 | 488 |
| Instructions following | Step 3.7 Flash | 9.8 | 10.0 | 100.0% | 0 | | 1.58s | 1,857 | 0 |
| Instructions following | MiMo-V2-Flash | 10.0 | 10.0 | 100.0% | 0 | | 4.28s | 75 | 3,504 |
| Puzzle Solving | Step 3.7 Flash | 5.5 | 9.9 | 33.3% | 0 | | 1.84s | 3,564 | 0 |
| Puzzle Solving | MiMo-V2-Flash | 7.7 | 10.0 | 66.7% | 0 | | 3.87s | 864 | 1,948 |
| Tool Calling | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.25s | 1,360 | 0 |
| Tool Calling | MiMo-V2-Flash | 10.0 | 10.0 | 100.0% | 0 | | 27.78s | 321 | 12,715 |
| Trivia | Step 3.7 Flash | 3.0 | 10.0 | 0.0% | 0 | | 124.75s | 134,779 | 0 |
| Trivia | MiMo-V2-Flash | 3.0 | 10.0 | 0.0% | 0 | | 1.96s | 12 | 0 |
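
The per-category token counts can be checked against the overall Output Tokens and Reasoning Tokens figures. The following is a minimal sketch of that check, with the numbers copied from the Category Breakdown; the dictionary and function names are hypothetical, and whether the site computes its overall totals as a plain sum over categories is an assumption.

```python
# Per-category (output_tokens, reasoning_tokens), copied from the Category Breakdown.
step_categories = {
    "Anti-AI Tricks": (10_896, 0),
    "Coding": (14_569, 0),
    "Combined": (6_426, 0),
    "Data parsing and extraction": (2_667, 0),
    "Domain specific": (104_487, 0),
    "General Intelligence": (4_604, 0),
    "Instructions following": (1_857, 0),
    "Puzzle Solving": (3_564, 0),
    "Tool Calling": (1_360, 0),
    "Trivia": (134_779, 0),
}
mimo_categories = {
    "Anti-AI Tricks": (1_674, 23_559),
    "Coding": (456, 3_648),
    "Combined": (442, 26_859),
    "Data parsing and extraction": (153, 0),
    "Domain specific": (8_374, 42_461),
    "General Intelligence": (87, 488),
    "Instructions following": (75, 3_504),
    "Puzzle Solving": (864, 1_948),
    "Tool Calling": (321, 12_715),
    "Trivia": (12, 0),
}


def token_totals(categories):
    """Sum output and reasoning tokens across all categories."""
    output = sum(out for out, _ in categories.values())
    reasoning = sum(reason for _, reason in categories.values())
    return output, reasoning


print("Step 3.7 Flash:", token_totals(step_categories))
print("MiMo-V2-Flash: ", token_totals(mimo_categories))
```

The sums match the overall figures: 285,209 output tokens and 0 reasoning tokens for Step 3.7 Flash, and 12,458 output tokens and 115,182 reasoning tokens for MiMo-V2-Flash. Two categories, Domain specific and Trivia, account for most of Step 3.7 Flash's output.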