
AI BENCHY Compare

Mistral: Mistral Small 4 vs MoonshotAI: Kimi K2.5

Last updated: 2026-03-17

Metric                 Mistral Small 4 (medium)   Kimi K2.5 (none)
Release date           2026-03-16                 2026-01-27
Rank                   #55                        #59
Score                  5.6                        5.3
Consistency            7.0                        8.7
Cost per result        0.502                      0.297
Total Cost             $0.026                     $0.015
Tests Correct
Attempt pass rate      49.0%                      37.3%
Flaky tests            6                          3
Total Runs             51                         51
Output Tokens          12,288                     2,010
Reasoning Tokens       28,112                     0
Response Time (avg)    4.18s                      10.83s
Response Time (max)    25.25s                     42.13s
Response Time (total)  71.03s                     108.27s
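The summary reports pass rates and run counts but not raw pass counts. Assuming (the page does not define it) that "Attempt pass rate" is passing attempts divided by total runs, the counts can be recovered with a little arithmetic. The sketch below is illustrative only; `implied_passes` is a hypothetical helper, not part of any benchmark tooling.

```python
# Assumption: "Attempt pass rate" = passing attempts / total runs.
# Figures copied from the summary table above.
summary = {
    "Mistral Small 4": {"pass_rate": 0.490, "runs": 51},
    "Kimi K2.5": {"pass_rate": 0.373, "runs": 51},
}


def implied_passes(pass_rate: float, runs: int) -> int:
    """Nearest integer count of passing attempts implied by a rounded rate."""
    return round(pass_rate * runs)


for model, row in summary.items():
    passes = implied_passes(row["pass_rate"], row["runs"])
    # Recompute the rate from the integer count to confirm it rounds back.
    recomputed = 100 * passes / row["runs"]
    print(f"{model}: {passes}/{row['runs']} passing attempts ({recomputed:.1f}%)")
```

Under that assumption, Mistral Small 4 passed 25 of 51 attempts and Kimi K2.5 passed 19 of 51; both counts round back to the published percentages.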

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Category / Model   Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Anti-AI Tricks
  Mistral Small 4  5.6    3.8          66.7%              3                           2.67s                4,055          4,778
  Kimi K2.5        3.6    8.4          8.3%               1                           6.24s                373            0
Combined
  Mistral Small 4  3.0    10.0         0.0%               0                           25.25s               2,612          10,700
  Kimi K2.5        2.8    2.1          33.3%              1                           19.16s               748            0
Data parsing and extraction
  Mistral Small 4  7.3    5.9          83.3%              1                           1.23s                335            723
  Kimi K2.5        7.3    5.8          83.3%              1                           42.13s               187            0
Domain specific
  Mistral Small 4  5.3    7.2          44.4%              1                           6.11s                2,621          6,904
  Kimi K2.5        5.3    10.0         33.3%              0                           4.38s                29             0
General Intelligence
  Mistral Small 4  4.8    10.0         0.0%               0                           2.05s                821            828
  Kimi K2.5        10.0   10.0         100.0%             0                           4.00s                76             0
Instructions following
  Mistral Small 4  7.3    5.8          83.3%              1                           1.38s                540            1,031
  Kimi K2.5        6.5    10.0         50.0%              0                           2.67s                60             0
Puzzle Solving
  Mistral Small 4  3.4    9.7          0.0%               0                           2.00s                983            2,338
  Kimi K2.5        3.1    10.0         0.0%               0                           4.73s                317            0
Tool Calling
  Mistral Small 4  10.0   10.0         100.0%             0                           3.50s                321            810
  Kimi K2.5        10.0   10.0         100.0%             0                           13.99s               220            0
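The per-category flaky-test, output-token and reasoning-token counts add up exactly to the summary totals, so the categories appear to partition the full run. Scores and response times do not sum this way; the page does not say how those are aggregated. The sketch below checks the additive columns. The dictionary layout and the `column_sums` helper are hypothetical, chosen only for illustration.

```python
# Per-category figures copied from the Category Breakdown table:
# (flaky tests, output tokens, reasoning tokens) for each category.
categories = {
    "Mistral Small 4": {
        "Anti-AI Tricks": (3, 4055, 4778),
        "Combined": (0, 2612, 10700),
        "Data parsing and extraction": (1, 335, 723),
        "Domain specific": (1, 2621, 6904),
        "General Intelligence": (0, 821, 828),
        "Instructions following": (1, 540, 1031),
        "Puzzle Solving": (0, 983, 2338),
        "Tool Calling": (0, 321, 810),
    },
    "Kimi K2.5": {
        "Anti-AI Tricks": (1, 373, 0),
        "Combined": (1, 748, 0),
        "Data parsing and extraction": (1, 187, 0),
        "Domain specific": (0, 29, 0),
        "General Intelligence": (0, 76, 0),
        "Instructions following": (0, 60, 0),
        "Puzzle Solving": (0, 317, 0),
        "Tool Calling": (0, 220, 0),
    },
}

# Overall totals from the summary table, in the same column order.
overall = {
    "Mistral Small 4": (6, 12288, 28112),
    "Kimi K2.5": (3, 2010, 0),
}


def column_sums(rows: dict[str, tuple[int, int, int]]) -> tuple[int, ...]:
    """Sum each column across all per-category tuples."""
    return tuple(sum(col) for col in zip(*rows.values()))


for model, rows in categories.items():
    sums = column_sums(rows)
    status = "matches" if sums == overall[model] else "differs from"
    print(f"{model}: {sums} {status} summary {overall[model]}")
```

Both models' column sums match their summary rows, which is consistent with every one of the 51 runs falling into exactly one category.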
