AI BENCHY Compare

Google: Gemma 4 31B vs MoonshotAI: Kimi K2.5

Last updated at: 2026-04-02

Metric                  Gemma 4 31B (none)  Kimi K2.5 (medium)
Release                 2026-04-02          2026-01-27
Score                   6.7                 7.2
Rank                    #47                 #39
Consistency             10.0                7.2
Tests Correct           -                   -
Attempt pass rate       52.9%               72.6%
Flaky tests             0                   6
Total Runs              51                  51
Cost per result         0.023               2.232
Total Cost              $0.002              $0.201
Input Price             $0.140 / 1M         $0.383 / 1M
Output Price            $0.400 / 1M         $1.909 / 1M
Output Tokens           660                 40,907
Reasoning Tokens        0                   75,121
Response Time (avg)     2.55s               64.59s
Response Time (max)     4.68s               137.29s
Response Time (total)   38.20s              645.93s
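
As a rough illustration of how the headline figures relate, the Python sketch below computes Kimi K2.5 to Gemma 4 31B ratios from the values listed above. The dictionary keys and the `ratio` helper are hypothetical names chosen for this example; they are not part of any AI BENCHY API, and the numbers are simply copied from this page.

```python
# Headline figures copied from the comparison above.
# Key names are illustrative assumptions, not an official schema.
gemma = {
    "avg_response_s": 2.55,
    "output_tokens": 660,
    "total_cost_usd": 0.002,
    "attempt_pass_rate_pct": 52.9,
}
kimi = {
    "avg_response_s": 64.59,
    "output_tokens": 40_907,
    "total_cost_usd": 0.201,
    "attempt_pass_rate_pct": 72.6,
}


def ratio(metric: str) -> float:
    """Return how many times larger Kimi's value is than Gemma's."""
    return kimi[metric] / gemma[metric]


def pass_rate_gap() -> float:
    """Return Kimi's attempt pass rate minus Gemma's, in percentage points."""
    return kimi["attempt_pass_rate_pct"] - gemma["attempt_pass_rate_pct"]


if __name__ == "__main__":
    for metric in ("avg_response_s", "output_tokens", "total_cost_usd"):
        print(f"{metric}: Kimi / Gemma = {ratio(metric):.1f}x")
    print(f"attempt pass rate gap: {pass_rate_gap():.1f} points")
```

On these figures, Kimi K2.5 passes more attempts but takes roughly 25 times longer per response and emits roughly 62 times as many output tokens.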

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Category                     Model        Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Anti-AI Tricks               Gemma 4 31B  6.5    10.0         50.0%              0            -              1.85s                45             0
Anti-AI Tricks               Kimi K2.5    7.3    5.8          83.3%              2            -              51.38s               2,789          8,880
Combined                     Gemma 4 31B  3.0    10.0         0.0%               0            -              0ms                  0              0
Combined                     Kimi K2.5    10.0   10.0         100.0%             0            -              71.37s               703            3,713
Data parsing and extraction  Gemma 4 31B  10.0   10.0         100.0%             0            -              2.25s                285            0
Data parsing and extraction  Kimi K2.5    10.0   10.0         100.0%             0            -              49.78s               563            7,940
Domain specific              Gemma 4 31B  7.7    10.0         66.7%              0            -              3.22s                27             0
Domain specific              Kimi K2.5    3.5    4.4          33.3%              2            -              137.29s              20,753         30,564
General Intelligence         Gemma 4 31B  10.0   10.0         100.0%             0            -              2.09s                117            0
General Intelligence         Kimi K2.5    6.5    3.4          66.7%              1            -              69.73s               3,815          4,262
Instructions following       Gemma 4 31B  6.5    10.0         50.0%              0            -              2.84s                78             0
Instructions following       Kimi K2.5    10.0   10.0         100.0%             0            -              92.47s               5,371          6,547
Puzzle Solving               Gemma 4 31B  5.5    10.0         33.3%              0            -              2.95s                108            0
Puzzle Solving               Kimi K2.5    5.3    7.3          44.4%              1            -              45.40s               6,671          12,403
Tool Calling                 Gemma 4 31B  3.0    10.0         0.0%               0            -              0ms                  0              0
Tool Calling                 Kimi K2.5    10.0   10.0         100.0%             0            -              31.74s               242            812
