AI BENCHY Compare

Qwen: Qwen3.5-9B vs Xiaomi: MiMo-V2.5

Summary

Qwen3.5-9B vs MiMo-V2.5 benchmark comparison: MiMo-V2.5 leads on average score, 4.9 vs 4.2, and has the lower total benchmark cost, $0.007 vs $0.035. It is also much faster, averaging 2.20s per response vs 82.24s. The two models tie on attempt pass rate at 27.0%.

Recommended model: MiMo-V2.5. It has the higher score here (4.9) while costing about 5.3x less than Qwen3.5-9B.
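
The ratios behind the summary can be rechecked from the figures shown on this page. This is a minimal sketch: the helper name `times_more` is hypothetical, and because the displayed totals are rounded to three decimals, the cost ratio computed here (about 5.0x) differs slightly from the quoted 5.3x, which presumably comes from unrounded costs.

```python
def times_more(larger: float, smaller: float) -> float:
    """Return how many times larger `larger` is than `smaller`."""
    if smaller <= 0:
        raise ValueError("smaller must be positive")
    return larger / smaller


# Displayed (rounded) figures taken from the summary.
cost_ratio = times_more(0.035, 0.007)   # total benchmark cost, USD
speed_ratio = times_more(82.24, 2.20)   # average response time, seconds

print(f"Qwen3.5-9B costs about {cost_ratio:.1f}x more than MiMo-V2.5")
print(f"Qwen3.5-9B is about {speed_ratio:.1f}x slower on average")
```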

Last updated at: 2026-06-04

Metric                  Qwen3.5-9B (medium)   MiMo-V2.5 (none)
Release                 2026-03-02            2026-04-22
Score                   4.2                   4.9
Rank                    #161                  #143
Reliability             6.7                   10.0
Consistency             8.0                   9.6
Tests Correct           -                     -
Attempt pass rate       27.0%                 27.0%
Flaky tests             5                     1
Total Runs              63                    63
Cost per result         1.187                 0.413
Total Cost              $0.035                $0.007
Input Price             $0.040 / 1M           $0.140 / 1M
Output Price            $0.150 / 1M           $0.280 / 1M
Total Input Tokens      17,070                41,985
Output Tokens           29,045                2,267
Reasoning Tokens        209,516               0
Response Time (avg)     82.24s                2.20s
Response Time (max)     226.38s               6.86s
Response Time (total)   1315.88s              46.21s
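
The listed total costs can be roughly cross-checked against the token counts and per-million prices in the table. The sketch below is an estimate under one loud assumption that the page does not state: reasoning tokens are billed at the output price. The `Usage` class and `estimate_cost` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimate_cost(u: Usage) -> float:
    """Estimate total cost in USD from token counts and prices."""
    # Assumption (not stated on the page): reasoning tokens are
    # billed at the same rate as output tokens.
    billed_output = u.output_tokens + u.reasoning_tokens
    total = (u.input_tokens * u.input_price_per_m
             + billed_output * u.output_price_per_m)
    return total / 1_000_000


# Token counts and prices copied from the comparison table.
qwen = Usage(17_070, 29_045, 209_516, 0.040, 0.150)
mimo = Usage(41_985, 2_267, 0, 0.140, 0.280)

qwen_cost = estimate_cost(qwen)
mimo_cost = estimate_cost(mimo)
print(f"Qwen3.5-9B estimate: ${qwen_cost:.4f}")
print(f"MiMo-V2.5 estimate:  ${mimo_cost:.4f}")
```

Under that assumption the estimates land near, but not exactly on, the listed totals of $0.035 and $0.007, so actual billing may differ in details not shown here.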

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#161 Qwen3.5-9B (medium): Cost $0.001, Time 35.9s, Tokens 3,030 tok

#143 MiMo-V2.5 (none): Cost $0.007, Time 267.4s, Tokens 25,283 tok

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Category                     Model       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Anti-AI Tricks               Qwen3.5-9B  5.1    5.8          50.0%              2            -              34.44s               369           2,621          12,411
Anti-AI Tricks               MiMo-V2.5   3.5    8.0          16.7%              1            -              2.19s                645           282            0
Coding                       Qwen3.5-9B  2.9    10.0         0.0%               0            -              100.88s              2,396         7,890          41,129
Coding                       MiMo-V2.5   5.5    10.0         33.3%              0            -              3.24s                7,440         696            0
Combined                     Qwen3.5-9B  3.0    10.0         0.0%               0            -              0ms                  0             0              0
Combined                     MiMo-V2.5   3.0    10.0         0.0%               0            -              2.36s                15,075        330            0
Data parsing and extraction  Qwen3.5-9B  3.6    5.6          33.3%              1            -              87.31s               4,722         1,383          32,113
Data parsing and extraction  MiMo-V2.5   6.5    10.0         50.0%              0            -              1.01s                7,758         366            0
Domain specific              Qwen3.5-9B  3.6    7.2          22.2%              1            -              137.75s              295           11,549         48,475
Domain specific              MiMo-V2.5   3.0    10.0         0.0%               0            -              756ms                753           27             0
General Intelligence         Qwen3.5-9B  2.8    1.6          33.3%              1            -              226.38s              180           0              30,695
General Intelligence         MiMo-V2.5   4.4    9.9          0.0%               0            -              6.86s                498           81             0
Instructions following       Qwen3.5-9B  6.5    10.0         50.0%              0            -              5.75s                381           491            1,824
Instructions following       MiMo-V2.5   6.5    10.0         50.0%              0            -              751ms                684           72             0
Puzzle Solving               Qwen3.5-9B  3.0    10.0         0.0%               0            -              32.27s               376           1,593          12,026
Puzzle Solving               MiMo-V2.5   5.4    10.0         33.3%              0            -              2.13s                678           166            0
Tool Calling                 Qwen3.5-9B  10.0   10.0         100.0%             0            -              4.31s                8,283         444            1,149
Tool Calling                 MiMo-V2.5   10.0   10.0         100.0%             0            -              2.43s                8,238         231            0
Trivia                       Qwen3.5-9B  3.0    10.0         0.0%               0            -              177.02s              68            3,074          29,694
Trivia                       MiMo-V2.5   3.0    10.0         0.0%               0            -              3.89s                216           16             0
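
The per-category scores can be tallied to see where each model leads. The following is a small sketch, with the scores copied by hand from the breakdown; the `tally_wins` helper is a hypothetical name, and treating equal scores as ties is a choice made here, not something the page defines.

```python
from collections import Counter

# (category, Qwen3.5-9B score, MiMo-V2.5 score), copied from the breakdown.
SCORES = [
    ("Anti-AI Tricks", 5.1, 3.5),
    ("Coding", 2.9, 5.5),
    ("Combined", 3.0, 3.0),
    ("Data parsing and extraction", 3.6, 6.5),
    ("Domain specific", 3.6, 3.0),
    ("General Intelligence", 2.8, 4.4),
    ("Instructions following", 6.5, 6.5),
    ("Puzzle Solving", 3.0, 5.4),
    ("Tool Calling", 10.0, 10.0),
    ("Trivia", 3.0, 3.0),
]


def tally_wins(scores):
    """Count categories led by each model; equal scores count as a tie."""
    wins = Counter()
    for _category, qwen, mimo in scores:
        if qwen > mimo:
            wins["Qwen3.5-9B"] += 1
        elif mimo > qwen:
            wins["MiMo-V2.5"] += 1
        else:
            wins["tie"] += 1
    return wins


wins = tally_wins(SCORES)
for name, count in wins.most_common():
    print(f"{name}: {count}")
```

On these figures MiMo-V2.5 leads in four categories, Qwen3.5-9B in two, and four are tied.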
