AI BENCHY Compare

MiniMax: MiniMax M3 vs MoonshotAI: Kimi K2.5

Summary

MiniMax M3 vs Kimi K2.5 benchmark comparison: MiniMax M3 leads narrowly on average score (7.6 vs 7.5), has the lower total benchmark cost ($0.131 vs $0.348), and is faster on average response time (68.17s vs 98.43s). Kimi K2.5 has the higher attempt pass rate (68.3% vs 65.1%).

Recommended model: MiniMax M3. It has the higher score of the two (7.6), and its total benchmark cost is about 2.7x lower than Kimi K2.5's ($0.131 vs $0.348).
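The headline figures can be checked with a few lines of arithmetic. The Python sketch below recomputes the cost ratio behind the recommendation and, assuming that "attempt pass rate" means passing attempts divided by the 63 total runs listed in the metrics table (the page does not define it), recovers the whole-number pass counts behind each percentage. Variable names are illustrative.

```python
# Figures copied from the summary and the metrics table on this page.
total_cost = {"MiniMax M3": 0.131, "Kimi K2.5": 0.348}
pass_rate = {"MiniMax M3": 0.651, "Kimi K2.5": 0.683}
TOTAL_RUNS = 63  # "Total Runs" row; identical for both models

# How many times more Kimi K2.5 cost over the full benchmark.
cost_ratio = total_cost["Kimi K2.5"] / total_cost["MiniMax M3"]

# Assumption: attempt pass rate = passing attempts / total runs.
# Rounding recovers the whole number of passing attempts behind each percentage.
implied_passes = {m: round(r * TOTAL_RUNS) for m, r in pass_rate.items()}

print(f"cost ratio: {cost_ratio:.2f}x")
for model, passes in implied_passes.items():
    print(f"{model}: {passes}/{TOTAL_RUNS} attempts = {passes / TOTAL_RUNS:.1%}")
```

The ratio comes out at 2.66x, which rounds to the "about 2.7x" quoted above. Under the stated assumption, the pass-rate gap amounts to two attempts (41 vs 43 out of 63).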

Last updated: 2026-07-02

Metric | MiniMax M3 (medium, released 2026-06-01) | Kimi K2.5 (medium, released 2026-01-27)
Score | 7.6 | 7.5
Rank | #42 | #45
Reliability | 9.6 | 10.0
Consistency | 7.9 | 6.9
Attempt pass rate | 65.1% | 68.3%
Flaky tests | 5 | 8
Total Runs | 63 | 63
Cost per result | 1.187 | 3.704
Total Cost | $0.131 | $0.348
Input Price | $0.300 / 1M tokens | $0.375 / 1M tokens
Output Price | $1.200 / 1M tokens | $2.025 / 1M tokens
Total Input Tokens | 46,546 | 34,312
Output Tokens | 49,036 | 48,379
Reasoning Tokens | 92,543 | 157,747
Response Time (avg) | 68.17s | 98.43s
Response Time (max) | 431.03s | 281.00s
Response Time (total) | 1363.38s | 1378.03s
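The Input Price and Output Price rows are quoted per million tokens, so spend scales linearly with token volume. The sketch below estimates list-price cost for a hypothetical workload; `estimate_cost` and the workload sizes are illustrative, and treating reasoning tokens as already counted in the output volume is an assumption. It is not meant to reproduce the Total Cost row, whose exact accounting the page does not spell out.

```python
# Hypothetical helper built on the per-1M-token list prices in the table above.
PRICES_PER_M = {
    # model: (input USD per 1M tokens, output USD per 1M tokens)
    "MiniMax M3": (0.300, 1.200),
    "Kimi K2.5": (0.375, 2.025),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for the given token volume.

    Assumption: any reasoning tokens are included in output_tokens and
    billed at the output rate.
    """
    in_price, out_price = PRICES_PER_M[model]
    return input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES_PER_M:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.4f}")
```

Because Kimi K2.5's output price is about 1.7x MiniMax M3's, output-heavy workloads widen the cost gap more than input-heavy ones.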

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#42 MiniMax M3 (medium) | Cost $0.012 | Time 154.4s | Tokens 10,018

#45 MoonshotAI: Kimi K2.5 (medium) | Cost $0.030 | Time 58.6s | Tokens 8,683

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens
Anti-AI Tricks | MiniMax M3 | 5.5 | 3.7 | 66.7% | 3 | 14.95s | 2,526 | 874 | 3,414
Anti-AI Tricks | Kimi K2.5 | 7.3 | 5.8 | 83.3% | 2 | 51.38s | 634 | 2,789 | 8,880
Coding | MiniMax M3 | 6.1 | 6.5 | 55.6% | 1 | 144.74s | 5,804 | 6,223 | 32,667
Coding | Kimi K2.5 | 6.1 | 4.6 | 66.7% | 2 | 217.49s | 6,935 | 5,705 | 74,693
Combined | MiniMax M3 | 10.0 | 10.0 | 100.0% | 0 | 65.30s | 14,760 | 1,306 | 6,253
Combined | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 71.37s | 11,280 | 703 | 3,713
Data parsing and extraction | MiniMax M3 | 10.0 | 10.0 | 100.0% | 0 | 14.92s | 8,088 | 514 | 3,164
Data parsing and extraction | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 49.78s | 7,020 | 563 | 7,940
Domain specific | MiniMax M3 | 5.5 | 9.3 | 33.3% | 0 | 233.13s | 869 | 16,254 | 19,070
Domain specific | Kimi K2.5 | 3.5 | 4.4 | 33.3% | 2 | 137.29s | 485 | 20,753 | 30,564
General Intelligence | MiniMax M3 | 5.1 | 3.4 | 33.3% | 1 | 33.25s | 954 | 2,487 | 2,523
General Intelligence | Kimi K2.5 | 6.5 | 3.4 | 66.7% | 1 | 69.73s | 480 | 3,815 | 4,262
Instructions following | MiniMax M3 | 9.8 | 10.0 | 100.0% | 0 | 6.14s | 1,623 | 103 | 920
Instructions following | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 92.47s | 675 | 5,371 | 6,547
Puzzle Solving | MiniMax M3 | 7.9 | 9.9 | 66.7% | 0 | 49.91s | 2,079 | 11,946 | 13,761
Puzzle Solving | Kimi K2.5 | 5.3 | 7.3 | 44.4% | 1 | 43.23s | 659 | 8,426 | 12,692
Tool Calling | MiniMax M3 | 10.0 | 10.0 | 100.0% | 0 | 11.91s | 9,168 | 281 | 555
Tool Calling | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 31.74s | 5,933 | 242 | 812
Trivia | MiniMax M3 | 3.0 | 10.0 | 0.0% | 0 | 100.80s | 675 | 9,048 | 10,216
Trivia | Kimi K2.5 | 3.0 | 10.0 | 0.0% | 0 | 83.95s | 211 | 12 | 7,644
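The category rows can be cross-checked against the headline table. This Python sketch, using values copied from the two tables (the dictionary names are illustrative), confirms that per-category flaky-test and token counts add up to the model totals, then tallies which model scores higher in each category, counting equal scores as ties.

```python
# Per-category rows from the Category Breakdown table:
# (score, flaky tests, input tokens, output tokens, reasoning tokens)
CATEGORIES = {
    "MiniMax M3": {
        "Anti-AI Tricks": (5.5, 3, 2526, 874, 3414),
        "Coding": (6.1, 1, 5804, 6223, 32667),
        "Combined": (10.0, 0, 14760, 1306, 6253),
        "Data parsing and extraction": (10.0, 0, 8088, 514, 3164),
        "Domain specific": (5.5, 0, 869, 16254, 19070),
        "General Intelligence": (5.1, 1, 954, 2487, 2523),
        "Instructions following": (9.8, 0, 1623, 103, 920),
        "Puzzle Solving": (7.9, 0, 2079, 11946, 13761),
        "Tool Calling": (10.0, 0, 9168, 281, 555),
        "Trivia": (3.0, 0, 675, 9048, 10216),
    },
    "Kimi K2.5": {
        "Anti-AI Tricks": (7.3, 2, 634, 2789, 8880),
        "Coding": (6.1, 2, 6935, 5705, 74693),
        "Combined": (10.0, 0, 11280, 703, 3713),
        "Data parsing and extraction": (10.0, 0, 7020, 563, 7940),
        "Domain specific": (3.5, 2, 485, 20753, 30564),
        "General Intelligence": (6.5, 1, 480, 3815, 4262),
        "Instructions following": (10.0, 0, 675, 5371, 6547),
        "Puzzle Solving": (5.3, 1, 659, 8426, 12692),
        "Tool Calling": (10.0, 0, 5933, 242, 812),
        "Trivia": (3.0, 0, 211, 12, 7644),
    },
}

# Headline totals: (flaky tests, input tokens, output tokens, reasoning tokens)
TOTALS = {
    "MiniMax M3": (5, 46546, 49036, 92543),
    "Kimi K2.5": (8, 34312, 48379, 157747),
}

# Sum columns 1..4 (everything except score) across categories and compare.
for model, rows in CATEGORIES.items():
    sums = tuple(sum(row[i] for row in rows.values()) for i in range(1, 5))
    print(model, "category sums match totals:", sums == TOTALS[model])

# Tally per-category score leads; equal scores count as ties.
wins = {"MiniMax M3": 0, "Kimi K2.5": 0, "tie": 0}
for category, row in CATEGORIES["MiniMax M3"].items():
    a, b = row[0], CATEGORIES["Kimi K2.5"][category][0]
    winner = "tie" if a == b else ("MiniMax M3" if a > b else "Kimi K2.5")
    wins[winner] += 1
print(wins)
```

Both models' category counts sum exactly to the headline totals. Kimi K2.5 leads in three categories (Anti-AI Tricks, General Intelligence, Instructions following), MiniMax M3 in two (Domain specific, Puzzle Solving), and five are tied. The overall 7.6 vs 7.5 score therefore is not a count of category wins, nor a plain mean of the ten category scores (about 7.29 vs 7.17); how the page weights categories is not stated.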
